Publication Type
Journal Article
Version
publishedVersion
Publication Date
7-2017
Abstract
Automatic discovery of newsworthy themes from sequenced data can relieve journalists from manually poring over a large amount of data in order to find interesting news. In this paper, we propose a novel k-Sketch query that aims to find k striking streaks to best summarize a subject. Our scoring function takes into account streak strikingness and streak coverage at the same time. We study the k-Sketch query processing in both offline and online scenarios, and propose various streak-level pruning techniques to find striking candidates. Among those candidates, we then develop approximate methods to discover the k most representative streaks with theoretical bounds. We conduct experiments on four real datasets, and the results demonstrate the efficiency and effectiveness of our proposed algorithms: the running time achieves up to 500 times speedup and the quality of the generated summaries is endorsed by the anonymous users from Amazon Mechanical Turk.
Keywords
Computational journalism, news theme discovery, sequenced data, approximate algorithms
Discipline
Databases and Information Systems | Data Storage Systems
Publication
IEEE Transactions on Knowledge and Data Engineering
Volume
29
Issue
7
First Page
1398
Last Page
1411
ISSN
1041-4347
Identifier
10.1109/TKDE.2017.2685587
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Citation
FAN, Qi; LI, Yuchen; ZHANG, Dongxiang; and TAN, Kian-Lee.
Discovering newsworthy themes from sequenced data: A step towards computational journalism. (2017). IEEE Transactions on Knowledge and Data Engineering. 29, (7), 1398-1411.
Available at: https://ink.library.smu.edu.sg/sis_research/3996
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1109/TKDE.2017.2685587