Publication Type

Journal Article

Version

publishedVersion

Publication Date

7-2017

Abstract

Automatic discovery of newsworthy themes from sequenced data can relieve journalists of manually poring over large amounts of data to find interesting news. In this paper, we propose a novel k-Sketch query that aims to find k striking streaks that best summarize a subject. Our scoring function takes into account both streak strikingness and streak coverage. We study k-Sketch query processing in both offline and online scenarios and propose various streak-level pruning techniques to find striking candidates. From those candidates, we then develop approximate methods, with theoretical bounds, to discover the k most representative streaks. We conduct experiments on four real datasets, and the results demonstrate the efficiency and effectiveness of our proposed algorithms: they achieve up to a 500-fold speedup in running time, and the quality of the generated summaries is endorsed by anonymous users on Amazon Mechanical Turk.

Keywords

Computational journalism, news theme discovery, sequenced data, approximate algorithms

Discipline

Databases and Information Systems | Data Storage Systems

Publication

IEEE Transactions on Knowledge and Data Engineering

Volume

29

Issue

7

First Page

1398

Last Page

1411

ISSN

1041-4347

Identifier

10.1109/TKDE.2017.2685587

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Additional URL

https://doi.org/10.1109/TKDE.2017.2685587
