Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
8-2023
Abstract
Entity Resolution (ER) is a fundamental problem in data preparation. Standard deep ER methods have achieved state-of-the-art effectiveness, assuming that relations from different organizations are centrally stored. However, due to privacy concerns, it can be difficult to centralize data in practice, rendering standard deep ER solutions inapplicable. Despite efforts to develop rule-based privacy-preserving ER methods, they often neglect subtle matching mechanisms and have poor effectiveness as a result. To bridge effectiveness and privacy, in this paper, we propose CampER, an effective framework for privacy-aware deep entity resolution. Specifically, we first design a training pair self-generation strategy to overcome the absence of manually labeled data in privacy-aware scenarios. Based on the self-constructed training pairs, we present a collaborative fine-tuning approach to learn the match-aware and uni-space individual tuple embeddings for accurate matching decisions. During the matching decision-making process, we first introduce a cryptographically secure approach to determine matches. Furthermore, we propose an order-preserving perturbation strategy to significantly accelerate the matching computation while guaranteeing the consistency of ER results. Extensive experiments on eight widely-used benchmark datasets demonstrate that CampER not only is comparable with the state-of-the-art standard deep ER solutions in effectiveness, but also preserves privacy.
Keywords
entity resolution, representation learning, similarity measurement
Discipline
Databases and Information Systems
Research Areas
Data Science and Engineering
Publication
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Long Beach, CA, August 6-10
First Page
626
Last Page
637
ISBN
9798400701030
Identifier
10.1145/3580305.3599266
Publisher
ACM
City or Country
New York
Citation
GUO, Yuxiang; CHEN, Lu; ZHOU, Zhengjie; ZHENG, Baihua; FANG, Ziquan; ZHANG, Zhikun; MAO, Yuren; and GAO, Yunjun.
CampER: An effective framework for privacy-aware deep entity resolution. (2023). KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Long Beach, CA, August 6-10. 626-637.
Available at: https://ink.library.smu.edu.sg/sis_research/8106
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1145/3580305.3599266