Publication Type

Journal Article

Version

acceptedVersion

Publication Date

10-2020

Abstract

General-purpose topic models have widespread industrial applications. Yet high-quality topic modeling is becoming increasingly challenging because accurate models require large amounts of training data typically owned by multiple parties, who are often unwilling to share their sensitive data for collaborative training without guarantees on their data privacy. To enable effective privacy-preserving multiparty topic modeling, we propose a novel federated general-purpose topic model named private and consistent topic discovery (PC-TD). On the one hand, PC-TD seamlessly integrates differential privacy in topic modeling to provide privacy guarantees on sensitive data of different parties. On the other hand, PC-TD exploits multiple sources of semantic consistency information to retain the accuracy of topic modeling while protecting data privacy. We verify the effectiveness of PC-TD on real-life datasets. Experimental results demonstrate its superiority over the state-of-the-art general-purpose topic models.

Keywords

Topic discovery, topic models, private data

Discipline

Databases and Information Systems | Numerical Analysis and Scientific Computing

Research Areas

Software and Cyber-Physical Systems

Publication

IEEE Intelligent Systems

Volume

36

Issue

5

First Page

96

Last Page

103

ISSN

1541-1672

Identifier

10.1109/MIS.2020.3033459

Publisher

Institute of Electrical and Electronics Engineers

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1109/MIS.2020.3033459
