Paul Ohm, Big Data & Privacy

Comment by: Susan Freiwald

PLSC 2011

Workshop draft abstract:

We are witnessing a sea change in the way we threaten and protect information privacy. The rise of Big Data—meaning powerful new methods of data analytics directed at massive, highly interconnected databases of information—will exacerbate privacy problems and put particular pressure on privacy regulation. The laws, regulations, and enforcement mechanisms we have developed in the first century of information privacy law are fundamentally hampered by the special features of Big Data. Big Data will force us to rethink how we regulate privacy.

To do that, we first need to understand what has changed, by surveying Big Data and cataloging what is new. Big Data includes powerful techniques for reidentification, the focus of my last Article, but it encompasses much more. Two features of Big Data, in particular, interfere with the way we regulate privacy. First, Big Data produces results that defy human intuition and resist prediction. The paradigmatic output of Big Data is the surprising correlation. Second, the underlying mechanisms that make Big Data work are often inscrutable to human understanding. Big Data reveals patterns and correlations, not mental models. B is correlated with A, Big Data reveals, but it cannot tell us why, and given the counter-intuitiveness of the result, we are sometimes left unable even to guess.

Big Data’s surprising correlations and inscrutability will break the two predominant methods we use to regulate privacy today, what I call the “bad data list” approach and the Fair Information Practice Principles approach. Both approaches rely on transparency and predictability, two things that Big Data fundamentally resists. Neither regulatory method can survive Big Data, and we cannot salvage either using only small tweaks and extensions. We need to start over.