Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

10-2008

Abstract

Applications for blog search and mining often face the challenge of handling huge volumes of blog data, in which a single blog may contain hundreds or even thousands of entries. We investigate novel techniques for profiling blogs by selecting a subset of representative entries for each blog. We propose two principles to guide the entry selection task: representativeness and diversity. We then formulate entry selection as a combinatorial optimization problem and propose a greedy yet effective algorithm that finds a good approximate solution by exploiting the theory of submodular functions. We suggest blog classification as a means of judging the proposed entry selection techniques and evaluate them on a real blog dataset, obtaining encouraging results.
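
The greedy selection idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's objective function, so the code below assumes a standard facility-location score (a well-known monotone submodular function) over a hypothetical pairwise entry-similarity matrix `sim`; the names `coverage` and `greedy_select` are illustrative and not taken from the paper.

```python
from typing import List, Sequence


def coverage(selected: Sequence[int], sim: List[List[float]]) -> float:
    """Assumed objective: each entry is credited with its best match in `selected`."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_select(sim: List[List[float]], k: int) -> List[int]:
    """Pick up to k entry indices, adding the largest marginal gain each round."""
    selected: List[int] = []
    candidates = set(range(len(sim)))
    current = 0.0
    while candidates and len(selected) < k:
        best, best_gain = None, 0.0
        # Iterate in index order so that ties go to the lowest index.
        for j in sorted(candidates):
            gain = coverage(selected + [j], sim) - current
            if gain > best_gain:
                best, best_gain = j, gain
        if best is None:  # no candidate improves the score
            break
        selected.append(best)
        candidates.remove(best)
        current += best_gain
    return selected


# Hypothetical similarities for a blog with five entries:
# entries 0-1 share one topic, entries 2-4 share another.
sim = [
    [1.00, 0.75, 0.25, 0.00, 0.00],
    [0.75, 1.00, 0.00, 0.00, 0.00],
    [0.25, 0.00, 1.00, 0.50, 0.50],
    [0.00, 0.00, 0.50, 1.00, 0.25],
    [0.00, 0.00, 0.50, 0.25, 1.00],
]
profile = greedy_select(sim, 2)
```

Under this assumed objective, an entry that is similar to many others scores well (representativeness), while a second entry from an already covered topic adds little marginal gain (diversity). For monotone submodular objectives with a size limit, greedy selection of this kind is known to achieve at least a (1 - 1/e) fraction of the optimal score.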

Keywords

Blog profiling, Blog classification, Entry selection

Discipline

Computer Sciences | Databases and Information Systems

Research Areas

Data Science and Engineering

Publication

CIKM '08: Proceedings of the ACM 17th Conference on Information and Knowledge Management: Napa Valley, CA, October 26-30

First Page

1387

Last Page

1388

ISBN

9781595939913

Identifier

10.1145/1458082.1458293

Publisher

ACM

City or Country

New York

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1145/1458082.1458293
