Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
7-2008
Abstract
With the explosive growth of blogs, information seeking in the blogosphere is becoming increasingly challenging. One example task is to find the topical blogs most relevant to a given query or an existing blog. Such a task requires a concise representation of blogs for effective and efficient searching and matching. In this paper, we investigate a new problem of profiling a blog by choosing a set of m most representative entries from the blog, where m is a predefined, application-dependent number. With the set of selected representative entries, applications on blogs avoid handling the hundreds or even thousands of entries (or posts) associated with each blog, which are updated frequently and are often noisy in nature. To guide the process of selecting the most representative entries, we propose three principles: anomaly, representativeness, and diversity. Based on these principles, a greedy yet very efficient entry selection algorithm is proposed. To evaluate the entry selection algorithms, an extrinsic evaluation methodology from document summarization research is adapted. Specifically, we evaluate the proposed entry selection algorithms by examining their blog classification accuracies. Our empirical results, obtained with a number of different classification methods, show that using fewer than 20 representative entries for each blog achieves classification accuracy comparable to that of using all entries.
Keywords
Blog profiling, Entry selection, Blog classification
Discipline
Computer Sciences | Social Media
Publication
AND '08: Proceedings of the Second Workshop on Analytics for Noisy Unstructured Text Data: July 2008, Singapore
First Page
55
Last Page
62
ISBN
9781605581965
Identifier
10.1145/1390749.1390759
Publisher
ACM
City or Country
New York
Citation
ZHUANG, Jinfeng; HOI, Steven C. H.; and SUN, Aixin.
On Profiling Blogs with Representative Entries. (2008). AND '08: Proceedings of the Second Workshop on Analytics for Noisy Unstructured Text Data: July 2008, Singapore. 55-62.
Available at: https://ink.library.smu.edu.sg/sis_research/2405
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
http://dx.doi.org/10.1145/1390749.1390759