User Data: The End of Anonymity, the Beginning of Privacy
Presenter
May 9, 2012
Keywords:
- Measures of information
MSC:
- 94A17
Abstract
"We do not collect personally identifiable information"... "This dataset
has been de-identified prior to release"... From advertisers tracking Web
clicks to biomedical researchers sharing clinical records, anonymization
is the main privacy protection mechanism used for sensitive user data
today.
I will argue that the distinction between "personally identifiable" and
"non-personally identifiable" information is fallacious by showing how to
infer private information from fully anonymized data in three settings:
(1) records of individual transactions and preferences, illustrated by the
Netflix Prize dataset, (2) social networks, and (3) recommender systems,
where temporal changes in aggregate statistics allow accurate inference
of hidden individual transactions.
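As a rough illustration of setting (1), the toy sketch below links a small amount of auxiliary knowledge about one person to a record in a "de-identified" table of item ratings. It is a minimal sketch under stated assumptions, not the method presented in the talk: the records, the names `score` and `best_match`, and the `min_gap` threshold are all hypothetical, and the scoring rule is a plain count of exact agreements.

```python
def score(aux, record):
    """Count the (item, rating) pairs in aux that also appear in record."""
    return sum(1 for item, rating in aux.items() if record.get(item) == rating)


def best_match(aux, records, min_gap=2):
    """Return the id of the record that best matches aux.

    Returns None when the best candidate does not beat the runner-up by
    at least min_gap agreements, i.e. when the match is ambiguous.
    min_gap is an illustrative threshold, not a recommended value.
    """
    scored = sorted(
        ((score(aux, rec), rid) for rid, rec in records.items()),
        reverse=True,
    )
    if not scored:
        return None
    top_score, top_id = scored[0]
    runner_up_score = scored[1][0] if len(scored) > 1 else 0
    return top_id if top_score - runner_up_score >= min_gap else None


# Hypothetical "anonymized" release: record ids carry no names,
# only item -> rating pairs.
records = {
    "r1": {"A": 5, "B": 3, "C": 4, "D": 1},
    "r2": {"A": 5, "E": 2, "F": 4},
    "r3": {"B": 1, "C": 2, "G": 5},
}

# Hypothetical auxiliary knowledge about one individual,
# e.g. ratings they posted publicly elsewhere.
aux = {"A": 5, "B": 3, "C": 4}

match = best_match(aux, records)
print(match)
```

Once a record is linked, every other entry in it (here the rating of item "D") is revealed about that individual, even though the release contained no names. With too little auxiliary knowledge, such as a single rating shared by several records, the sketch declines to pick a candidate.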
I will then outline a program for data privacy research. It includes
several challenging problems in the design and implementation of
privacy-preserving systems, domain-specific algorithmic research,
and policy and economic issues.