Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

6-2024

Abstract

��-closest pair (KCP for short) search is a fundamental problem in database research. Given a set of��-dimensional streaming data S, KCP search aims to retrieve �� pairs with the shortest distances between them. While existing works have studied continuous 1-closest pair query (i.e., �� = 1) over dynamic data environments, which allow for object insertions/deletions, they require high computational costs and cannot easily support KCP search with �� > 1. This paper investigates the problem of KCP search over data stream, aiming to incrementally maintain as few pairs as possible to support KCP search with arbitrarily ��. To achieve this, we introduce the concept of NNS (short for Nearest Neighbour pair-Set), which consists of all the nearest neighbour pairs and allows us to support KCP search via only accessing O (��) objects. We further observe that in most cases, we only need to use a small portion of NNS to answer KCP search as typically �� ≪ ��. Based on this observation, we propose TNNS (short for Threshold-based NN pair Set), which contains a small number of high-quality NN pairs, and a partition named ��-DLBP (short for ��-Distance Lower-Bound based Partition) to organize objects, with �� being an integer significantly smaller than ��. ��-DLBP organizes objects using up to O (log �� �� ) partitions and is able to support the construction and update of TNNS efficiently.

Keywords

Streaming Data, ��-Closest Pair Search, Partition, Cube

Discipline

Databases and Information Systems

Research Areas

Data Science and Engineering

Areas of Excellence

Digital transformation

Publication

Proceedings of the ACM on Management of Data, Santiago, Chile, June 9-15

Volume

1

First Page

1

Last Page

26

Identifier

10.1145/3617326

Publisher

ACM

City or Country

New York

Copyright Owner and License

Authors

Additional URL

https://dl.acm.org/doi/10.1145/3617326

Share

COinS