Publication Type

Conference Proceeding Article

Version

submittedVersion

Publication Date

4-2012

Abstract

There is a significant body of empirical work on statistical de-anonymization attacks against databases containing micro-data about individuals, e.g., their preferences, movie ratings, or transaction data. Our goal is to analytically explain why such attacks work. Specifically, we analyze a variant of the Narayanan-Shmatikov algorithm that was used to effectively de-anonymize the Netflix database of movie ratings. We prove theorems characterizing mathematical properties of the database and the auxiliary information available to the adversary that enable two classes of privacy attacks. In the first attack, the adversary successfully identifies the individual about whom she possesses auxiliary information (an isolation attack). In the second attack, the adversary learns additional information about the individual, although she may not be able to uniquely identify him (an information amplification attack). We demonstrate the applicability of the analytical results by empirically verifying that the mathematical properties assumed of the database are actually true for a significant fraction of the records in the Netflix movie ratings database, which contains ratings from about 500,000 users.

Keywords

Privacy, database, de-anonymization

Discipline

Artificial Intelligence and Robotics | Computer Engineering

Research Areas

Data Science and Engineering

Publication

Proceedings of the First International Conference on Principles of Security and Trust, Tallinn, Estonia, 2012 March 24-April 1

ISBN

9783642286407

Identifier

10.1007/978-3-642-28641-4_13

City or Country

Tallinn, Estonia

Additional URL

https://doi.org/10.1007/978-3-642-28641-4_13
