Research Collection School Of Computing and Information Systems

Multiple continuous top-K queries over data stream

Rui ZHU, Shenyang Aerospace University
Yujin JIA, Northeastern University
Xiaochun YANG, Northeastern University
Baihua ZHENG, Singapore Management UniversityFollow
Bin WANG, Northeastern University
Chuanyu ZONG, Shenyang Aerospace University

Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

5-2024

Abstract

Continuous top-k" role="presentation" style="box-sizing: border-box; display: inline-block; line-height: 0; font-size: 18.72px; font-size-adjust: none; overflow-wrap: normal; word-spacing: normal; text-wrap-mode: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; margin: 0px; padding: 1px 0px; position: relative;">kk query over sliding window is a fundamental challenge in the domain of streaming data management. Specifically, a continuous top-k query q" role="presentation" style="box-sizing: border-box; display: inline-block; line-height: 0; font-size: 18.72px; font-size-adjust: none; overflow-wrap: normal; word-spacing: normal; text-wrap-mode: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; margin: 0px; padding: 1px 0px; position: relative;">qq monitors the window W" role="presentation" style="box-sizing: border-box; display: inline-block; line-height: 0; font-size: 18.72px; font-size-adjust: none; overflow-wrap: normal; word-spacing: normal; text-wrap-mode: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; margin: 0px; padding: 1px 0px; position: relative;">WW, returning the k" role="presentation" style="box-sizing: border-box; display: inline-block; line-height: 0; font-size: 18.72px; font-size-adjust: none; overflow-wrap: normal; word-spacing: normal; text-wrap-mode: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; margin: 0px; padding: 1px 0px; position: relative;">kk objects with the highest scores to the system with each slide of the window. This paper delves into one of its important variants, referred to as multiple continuous top. k" role="presentation" style="box-sizing: border-box; display: inline-block; line-height: 0; font-size: 18.72px; font-size-adjust: none; overflow-wrap: normal; word-spacing: normal; text-wrap-mode: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; margin: 0px; padding: 1px 0px; position: relative;">kk queries over data stream, which holds significant applications. While various efforts have been made to support continuous top-k query, few have addressed the complexities of multiple continuous top-k queries. The prevailing approach involves selecting a minimal number of objects in the window as candidates, incrementally maintaining them, and using them to support query processing as efficiently as possible. However, these endeavors exhibit sensitivity to the query workload scale or query parameters such as k" role= "presentation" style="box-sizing: border-box; display: inline-block; line-height: 0; font-size: 18.72px; font-size-adjust: none; overflow-wrap: normal; word-spacing: normal; text-wrap-mode: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; margin: 0px; padding: 1px 0px; position: relative;">kk, the window length n" role="presentation" style="box-sizing: border-box; display: inline-block; line-height: 0; font-size: 18.72px; font-size-adjust: none; overflow-wrap: normal; word-spacing: normal; text-wrap-mode: no wrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; margin: 0px; padding: 1px 0px; position: relative;">nn, and others. Consequently, they incur high running/space cost in updating the candidate set. In this paper, we propose a novel index PH-Tree (Partition and Heap-based Binary Tree), designed to facilitate multiple continuous top-k queries. We partition the query window into a group of disjoint partitions and use PH-Tree to organize these partitions. Additionally, the PH-Tree allows for flexible candidate selection based on the size of each partition, parameter distribution of queries and score distribution of objects. We further develop a group of efficient algorithms to support candidate set incremental maintenance and query processing. The effectiveness and efficiency of the proposed algorithms are validated through extensive theoretical analysis and experiments detailed in this paper.

Keywords

Sensitivity, Costs, Query processing, Binary trees, Data engineering, Partitioning algorithms, Maintenance

Discipline

Databases and Information Systems

Research Areas

Data Science and Engineering

Publication

2024 IEEE 40th International Conference on Data Engineering (ICDE), Utrecht, Netherlands, May 13-16: Proceedings

First Page

1575

Last Page

1588

ISBN

9798350317169

Identifier

10.1109/ICDE60146.2024.00129

Publisher

IEEE Computer Society

City or Country

Los Alamitos, CA

Citation

ZHU, Rui; JIA, Yujin; YANG, Xiaochun; ZHENG, Baihua; WANG, Bin; and ZONG, Chuanyu. Multiple continuous top-K queries over data stream. (2024). 2024 IEEE 40th International Conference on Data Engineering (ICDE), Utrecht, Netherlands, May 13-16: Proceedings. 1575-1588.
Available at: https://ink.library.smu.edu.sg/sis_research/9284

Copyright Owner and License

Authors

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1109/ICDE60146.2024.00129

Download

Included in

Databases and Information Systems Commons

COinS

Research Collection School Of Computing and Information Systems

Multiple continuous top-K queries over data stream

Publication Type

Version

Publication Date

Abstract

Keywords

Discipline

Research Areas

Publication

First Page

Last Page

ISBN

Identifier

Publisher

City or Country

Citation

Copyright Owner and License

Creative Commons License

Additional URL

Included in

Search

Links

Browse

Links

Research Collection School Of Computing and Information Systems

Multiple continuous top-K queries over data stream

Author

Publication Type

Version

Publication Date

Abstract

Keywords

Discipline

Research Areas

Publication

First Page

Last Page

ISBN

Identifier

Publisher

City or Country

Citation

Copyright Owner and License

Creative Commons License

Additional URL

Included in

Share

Search

Links

Browse

Links