Publication Type
Conference Proceeding Article
Version
submittedVersion
Publication Date
4-2012
Abstract
There is a significant body of empirical work on statistical de-anonymization attacks against databases containing micro-data about individuals, e.g., their preferences, movie ratings, or transaction data. Our goal is to analytically explain why such attacks work. Specifically, we analyze a variant of the Narayanan-Shmatikov algorithm that was used to effectively de-anonymize the Netflix database of movie ratings. We prove theorems characterizing mathematical properties of the database and the auxiliary information available to the adversary that enable two classes of privacy attacks. In the first attack, the adversary successfully identifies the individual about whom she possesses auxiliary information (an isolation attack). In the second attack, the adversary learns additional information about the individual, although she may not be able to uniquely identify him (an information amplification attack). We demonstrate the applicability of the analytical results by empirically verifying that the mathematical properties assumed of the database are actually true for a significant fraction of the records in the Netflix movie ratings database, which contains ratings from about 500,000 users.
Keywords
Privacy, database, de-anonymization
Discipline
Artificial Intelligence and Robotics | Computer Engineering
Research Areas
Data Science and Engineering
Publication
Proceedings of the First International Conference on Principles of Security and Trust, Tallinn, Estonia, 2012 March 24-April 1
ISBN
9783642286407
Identifier
10.1007/978-3-642-28641-4_13
City or Country
Tallinn, Estonia
Citation
DATTA, Anupam; SHARMA, Divya; and SINHA, Arunesh.
Provable de-anonymization of large datasets with sparse dimensions. (2012). Proceedings of the First International Conference on Principles of Security and Trust, Tallinn, Estonia, 2012 March 24-April 1.
Available at: https://ink.library.smu.edu.sg/sis_research/4471
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1007/978-3-642-28641-4_13