Felix Wu, Privacy and Utility in Data Sets

Comment by: Jane Yakowitz

PLSC 2011

Published version available here: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2031808

Workshop draft abstract:

Privacy and utility are inherently in tension with one another. Information is useful exactly when it allows someone to have knowledge that he would not otherwise have, and to make inferences that he would not otherwise be able to make. The goal of information privacy is precisely to prevent others from acquiring particular information or from being able to make particular inferences. Moreover, as others have demonstrated recently, we cannot divide the world into “personal” information to be withheld, and “non-personal” information to be disclosed. There is potential social value to be gained from disclosing even “personal” information. And the revelation of even “non-personal” information might provide the final link in a chain of inferences that leads to information we would like to withhold.

Thus, the disclosure of data involves an inherent tradeoff between privacy and utility. More disclosure is both more useful and less private. Less disclosure is both less useful and more private. This does not mean, however, that all disclosures are equivalent. Some disclosures may be relatively more privacy-invading and less socially useful, or vice versa. The question is how to identify the privacy and utility characteristics of data, so as to maximize the utility of the data disclosed while minimizing privacy loss.

Thus far, at least two different academic communities have studied how to analyze privacy and utility. In the legal community, this question has come to the fore with recent work on the re-identification of individuals in supposedly anonymized data sets, as well as with questions raised by the behavioral advertising industry’s collection and analysis of consumer data. In the computer science community, this question has been studied in the context of formal models of privacy, particularly that of “differential privacy.” This paper seeks to bridge the two communities, to help policy makers understand the implications of the results obtained by formal modeling, and to suggest to computer scientists additional formal approaches that might capture more of the features of the policy questions currently being debated. We can and should bring both the qualitative analysis of law and the quantitative analysis of computer science to bear on this increasingly salient question of privacy-utility tradeoffs.