Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

6-2018

Abstract

As an important research topic with well-recognized practical values, classification of social streams has been identified with increasing popularity with social data, such as the tweet stream generated by Twitter users in chronological order. A salient, and perhaps also the most interesting, feature of such user-generated content is its never-failing novelty, which, unfortunately, would challenge most traditional pre-trained classification models as they are built based on fixed label set and would therefore fail to identify new labels as they emerge. In this paper, we study the problem of classification of social streams with emerging new labels, and propose a novel ensemble framework, integrating an instance-based learner and a label-based learner by completely-random trees. The proposed framework can not only classify known labels in the multi-label scenario, but also detect emerging new labels and update itself in the data stream. Extensive experiments on real-world stream data set from Weibo, a Chinese micro-blogging platform, demonstrate the superiority of our approach over the state-of-the-art methods.

Keywords

Stream classification, Emerging new labels, Model update

Discipline

Databases and Information Systems | Numerical Analysis and Scientific Computing

Research Areas

Data Science and Engineering

Publication

Advances in knowledge discovery and data mining: 22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, Australia, June 3-6, 2018: Proceedings

Volume

10937

First Page

16

Last Page

28

ISBN

9783319930336

Identifier

10.1007/978-3-319-93034-3_2

Publisher

Springer

City or Country

Cham

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1007/978-3-319-93034-3_2

Share

COinS