Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

8-2017

Abstract

The general notion of a metric space encompasses a diverse range of data types and accompanying similarity measures. Hence, metric search plays an important role in a wide range of settings, including multimedia retrieval, data mining, and data integration. With the aim of accelerating metric search, a collection of pivot-based indexing techniques for metric data has been proposed, which reduces the number of potentially expensive similarity comparisons by exploiting the triangle inequality for pruning and validation. However, no comprehensive empirical study of those techniques exists. Existing studies each offers only a narrower coverage, and they use different pivot selection strategies that affect performance substantially and thus render cross-study comparisons difficult or impossible. We offer a survey of existing pivot-based indexing techniques, and report a comprehensive empirical comparison of their construction costs, update efficiency, storage sizes, and similarity search performance. As part of the study, we provide modifications for two existing indexing techniques to make them more competitive. The findings and insights obtained from the study reveal different strengths and weaknesses of different indexing techniques, and offer guidance on selecting an appropriate indexing technique for a given setting.

Keywords

Similarity search, tree, space, queries

Discipline

Databases and Information Systems | Data Storage Systems

Research Areas

Data Science and Engineering

Publication

Proceedings of the VLDB Endowment: 43rd International conference, Munich Germany, 2017 August 28 -September 1

First Page

1058

Last Page

1069

Identifier

10.14778/3115404.3115411

Publisher

VLDB Endowment

City or Country

Stanford, CA

Copyright Owner and License

Authors

Additional URL

http://www.vldb.org/pvldb/vol10/p1058-gao.pdf

Share

COinS