Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

4-2013

Abstract

Label propagation has been studied for many years, starting from a set of nodes with labels and then propagating to those without labels. In social networks, building complete user profiles like interests and affiliations contributes to the systems like link prediction, personalized feeding, etc. Since the labels for each user are mostly not filled, we often employ some people to label these users. And therefore, the cost of human labeling is high if the data set is large. To reduce the expense, we need to select the optimal data set for labeling, which produces the best propagation result. In this paper, we proposed two algorithms for the selection of the optimal data set for labeling, which is the greedy and greedyMax algorithms according to different user input. We select the data set according to two scenarios, which are 1) finding top-K nodes for labeling and then propagating as much nodes as possible, and 2) finding a minimal set of nodes for labeling and then propagating the whole network with at least one label. Furthermore, we analyze the network structure that affects the selection and propagation results. Our algorithms are suitable for most propagation algorithms. In the experiment part, we evaluate our algorithms based on 500 networks extracted from the film-actor table in freebase according to the two different scenarios. The performance including input percentage, time cost, precision and f1-score were present in the results. And from the results, the greedyMax could achieve higher performance with a balance of precision and time cost than the greedy algorithm. In addition, our algorithm could be adaptive to the user input in a quick response.

Discipline

Databases and Information Systems | Numerical Analysis and Scientific Computing | Social Media

Publication

Database Systems for Advanced Applications: 18th International Conference, DASFAA 2013, Wuhan, China, April 22-25, 2013. Proceedings, Part II

Volume

7826

First Page

194

Last Page

209

ISBN

9783642374500

Identifier

10.1007/978-3-642-37450-0_14

Publisher

Springer Verlag

City or Country

Cham

Copyright Owner and License

LARC

Additional URL

http://doi.org/10.1007/978-3-642-37450-0_14

Share

COinS