The efficient processing of document streams plays an important role in many information filtering systems. Emerging applications, such as news update filtering and social network notifications, demand presenting end-users with the most relevant content to their preferences. In this work, user preferences are indicated by a set of keywords. A central server monitors the document stream and continuously reports to each user the top-k documents that are most relevant to her keywords. Our objective is to support large numbers of users and high stream rates, while refreshing the top-k results almost instantaneously. Our solution abandons the traditional frequency-ordered indexing approach. Instead, it follows an identifier-ordering paradigm that suits better the nature of the problem. When complemented with a novel, locally adaptive technique, our method offers (i) proven optimality w.r.t. the number of considered queries per stream event, and (ii) an order of magnitude shorter response time (i.e., time to refresh the query results) than the current state-of-the-art.
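To make the problem setting concrete, the following is a minimal sketch of a naive baseline for continuous top-k monitoring. It is not the identifier-ordered method proposed in the paper. The class name `NaiveTopKMonitor`, its methods, and the scoring function (a plain count of keyword occurrences) are assumptions made for illustration only, and the sketch ignores document expiry.

```python
import heapq
from collections import Counter, defaultdict


class NaiveTopKMonitor:
    """Hypothetical naive baseline, NOT the paper's identifier-ordered method."""

    def __init__(self, k):
        self.k = k
        self.keywords = {}                      # user -> set of keywords
        self.term_to_users = defaultdict(set)   # keyword -> users registered on it
        self.results = {}                       # user -> min-heap of (score, doc_id)

    def register(self, user, keywords):
        """Register a user's keyword query."""
        terms = {t.lower() for t in keywords}
        self.keywords[user] = terms
        self.results[user] = []
        for term in terms:
            self.term_to_users[term].add(user)

    def on_document(self, doc_id, text):
        """Process one stream event and refresh the affected top-k results."""
        counts = Counter(text.lower().split())
        # Consider only users who share at least one keyword with the document.
        candidates = set()
        for term in counts:
            candidates |= self.term_to_users.get(term, set())
        for user in candidates:
            # Assumed score: total occurrences of the user's keywords.
            score = sum(counts[t] for t in self.keywords[user])
            heap = self.results[user]
            if len(heap) < self.k:
                heapq.heappush(heap, (score, doc_id))
            elif (score, doc_id) > heap[0]:
                # The new document beats the current k-th best result.
                heapq.heapreplace(heap, (score, doc_id))

    def top_k(self, user):
        """Return the user's current results, best first."""
        return sorted(self.results.get(user, []), reverse=True)


monitor = NaiveTopKMonitor(k=2)
monitor.register("alice", ["stream", "query"])
monitor.register("bob", ["football"])
monitor.on_document(1, "stream stream query")
monitor.on_document(2, "football news")
monitor.on_document(3, "query processing")
monitor.on_document(4, "stream query")
print(monitor.top_k("alice"))
print(monitor.top_k("bob"))
```

In this baseline every query that shares a term with an incoming document is rescored. The abstract's optimality claim concerns reducing exactly this number of considered queries per stream event.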
Top-k query, Continuous query, Document stream
Databases and Information Systems | Management Information Systems
Data Management and Analytics
IEEE Transactions on Knowledge and Data Engineering
Institute of Electrical and Electronics Engineers (IEEE)
U, Leong Hou; ZHANG, Junjie; MOURATIDIS, Kyriakos; and LI, Ye.
Continuous Top-k monitoring on document streams. (2017). IEEE Transactions on Knowledge and Data Engineering. 29, (5), 991-1003. Research Collection School Of Information Systems.
Available at: http://ink.library.smu.edu.sg/sis_research/3643
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.