Publication Type
Journal Article
Version
acceptedVersion
Publication Date
10-2020
Abstract
General-purpose topic models have widespread industrial applications. Yet high-quality topic modeling is becoming increasingly challenging because accurate models require large amounts of training data, which are typically owned by multiple parties who are often unwilling to share sensitive data for collaborative training without privacy guarantees. To enable effective privacy-preserving multiparty topic modeling, we propose a novel federated general-purpose topic model named private and consistent topic discovery (PC-TD). On the one hand, PC-TD seamlessly integrates differential privacy into topic modeling to provide privacy guarantees for the sensitive data of each party. On the other hand, PC-TD exploits multiple sources of semantic consistency information to retain the accuracy of topic modeling while protecting data privacy. We verify the effectiveness of PC-TD on real-life datasets. Experimental results demonstrate its superiority over state-of-the-art general-purpose topic models.
Keywords
Topic discovery, topic models, private data
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing
Research Areas
Software and Cyber-Physical Systems
Publication
IEEE Intelligent Systems
Volume
36
Issue
5
First Page
96
Last Page
103
ISSN
1541-1672
Identifier
10.1109/MIS.2020.3033459
Publisher
Institute of Electrical and Electronics Engineers
Citation
1
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1109/MIS.2020.3033459
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons