Processing long queries against short text: Top-k advertisement matching in news stream applications
Publication Type
Journal Article
Version
publishedVersion
Publication Date
6-2017
Abstract
Many real applications in real-time news stream advertising call for efficient processing of long queriesagainst short text. In such applications, dynamic news feeds are regarded as queries to match against anadvertisement (ad) database for retrieving the k most relevant ads. The existing approaches to keywordretrieval cannot work well in this search scenario when queries are triggered at a very high frequency.To address the problem, we introduce new techniques to significantly improve search performance. First,we devise a two-level partitioning for tight upper bound estimation and a lazy evaluation scheme to delayfull evaluation of unpromising candidates, which can bring three to four times performance boosting in adatabase with 7 million ads. Second, we propose a novel rank-aware block-oriented inverted index to furtherimprove performance. In this index scheme, each entry in an inverted list is assigned a rank according toits importance in the ad. Then, we introduce a block-at-a-time search strategy based on the index scheme tosupport a much tighter upper bound estimation and a very early termination. We have conducted experimentswith real datasets, and the results show that the rank-aware method can further improve performance byan order of magnitude.
Keywords
Long queries, short text, top-k retrieval, inverted index, rank-aware partitioning
Discipline
Databases and Information Systems | Software Engineering
Research Areas
Data Science and Engineering
Publication
ACM Transactions on Information Systems
Volume
35
Issue
3
First Page
28: 1
Last Page
27
ISSN
1046-8188
Identifier
10.1145/3052772
Publisher
Association for Computing Machinery (ACM)
Citation
ZHANG, Dongxiang; LI, Yuchen; FAN, Ju; GAO, Lianli; SHEN, Fumin; and SHEN, Heng Tao.
Processing long queries against short text: Top-k advertisement matching in news stream applications. (2017). ACM Transactions on Information Systems. 35, (3), 28: 1-27.
Available at: https://ink.library.smu.edu.sg/sis_research/3995
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1145/3052772