Publication Type
Journal Article
Version
acceptedVersion
Publication Date
10-2023
Abstract
A fundamental problem in many scenarios is to match entities across two data sources. Prior work frequently presumes that the entities to be matched are of comparable granularity. In this work, we address one-to-many matching, or poly-matching, in the scenario where entities have varying granularity. A distinctive feature of our problem is its bidirectional nature, where the 'one' or the 'many' could come from either source arbitrarily. Moreover, to deal with diverse entity representations that give rise to noisy similarity values, we incorporate novel notions of receptivity and reclusivity into a robust matching objective. As finding the optimal solution to the resulting formulation is proven computationally intractable, we propose more scalable yet still performant heuristics. Experiments on multiple real-life datasets show that our proposed algorithms are effective and outperform the baselines.
Keywords
Lenses, Cameras, Noise measurement, Matched filters, Soft sensors, Mathematical models, Linear programming, Entity resolution, matching, one-to-many
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing
Research Areas
Data Science and Engineering
Publication
IEEE Transactions on Knowledge and Data Engineering
Volume
35
Issue
10
First Page
10762
Last Page
10774
ISSN
1041-4347
Identifier
10.1109/TKDE.2023.3266480
Publisher
Institute of Electrical and Electronics Engineers
Citation
LEE, Ween Jiann; TKACHENKO, Maksim; and LAUW, Hady Wirawan.
Robust Bidirectional Poly-Matching. (2023). IEEE Transactions on Knowledge and Data Engineering. 35, (10), 10762-10774.
Available at: https://ink.library.smu.edu.sg/sis_research/8277
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1109/TKDE.2023.3266480
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons