Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

12-2021

Abstract

Entity matching across two data sources is a prevalent need in many domains, including e-commerce. Of interest is the scenario where entities have varying granularity, e.g., a coarse product category may match multiple finer categories. Previous work on one-to-many matching generally presumes that the 'one' necessarily comes from a designated source and the 'many' from the other source. In contrast, we propose a novel formulation that allows concurrent bidirectional one-to-many matching, where the 'one' may come from either source. Beyond flexibility, we also seek matching that is more robust to noisy similarity values arising from diverse entity descriptions, by introducing the notions of receptivity and reclusivity. In addition to an optimal formulation, we also propose an efficient and performant heuristic. Experiments on multiple real-life datasets from e-commerce sources show that our proposed algorithms are effective and outperform the baselines.

Keywords

entity resolution, matching, one-to-many, poly, bipoly

Discipline

Databases and Information Systems | Data Science

Research Areas

Data Science and Engineering

Publication

2021 IEEE International Conference on Data Mining ICDM: Auckland, Virtual, December 7-10: Proceedings

First Page

1192

Last Page

1197

ISBN

9781665423984

Identifier

10.1109/ICDM51629.2021.00143

Publisher

IEEE

City or Country

Piscataway, NJ

Embargo Period

12-13-2021

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1109/ICDM51629.2021.00143
