Deven Desai, Data Hoarding: Privacy in the Age of Artificial Intelligence
Comment by: Kirsten Martin
Work draft abstract:
We live in an age of data hoarding. Those who have data never wish to release it. Those who don’t have data want to grab it and increase their stores. In both cases (refusing to release data and gathering data), the mosaic theory, which accepts that “seemingly insignificant information may become significant when combined with other information,”1 seems to explain the result. Discussions of mosaic theory focus on executive power. In national security cases, the government refuses to share data lest it reveal secrets. Yet recent Fourth Amendment cases limit the state’s ability to gather location data, because under the mosaic theory the data in aggregate could reveal more than isolated surveillance would.2 The theory describes a problem but yields wildly different results. Worse, it does not explain what to do about data collection, retention, and release in different contexts. Furthermore, if data hoarding is a problem for the state, it is one for the private sector too. Private companies, such as Amazon, Google, Facebook, and Wal-Mart, gather and keep as much data as possible, because they wish to learn more about consumers and how to sell to them. Researchers gather and mine data to open new doors in almost every scientific discipline. Like the government, neither group is likely to share the data it collects or to increase transparency, for data is power.
I argue that just as we have started to look at the implications of mosaic theory for the state, we must do so for the private sector. So far, privacy scholarship has separated government and private sector data practices. That division is less tenable today. Not only governments but also companies and scientists assemble digital dossiers. The digital dossiers of just ten years ago now emerge faster than ever and hold deeper information about us. Individualized data sets matter, but they are now part of something bigger. Large, networked data sets (so-called Big Data) and data mining techniques simultaneously allow someone to study large groups, to know what an individual has done in the past, and to predict certain future outcomes.3 In all sectors, the vast wave of automatically gathered data points is no longer a barrier to such analysis. Instead, it fuels and improves the analysis, because new systems learn from data sets. Thanks to artificial intelligence, the fantasy of a few data points connecting to and revealing a larger picture may be a reality.
Put differently, discussions about privacy and technology in all contexts miss a simple yet fundamental point: artificial intelligence changes everything about privacy. Given that large data sets are here to stay and artificial intelligence techniques promise to revolutionize what we learn from those data sets, the law must understand the rules for these new avenues of information. To address this challenge, I draw on computer science literature to test claims about the harms or benefits of data collection and use. By showing the parallels between state and private sector claims about data and mapping the boundaries of those claims, this Article offers a way to understand and manage what is at stake in the age of pervasive data hoarding and the automatic analysis made possible by artificial intelligence.
1 Jameel Jaffer, The Mosaic Theory, 77 SOCIAL RESEARCH 873, 873 (2010).
2 See, e.g., Orin Kerr, The Mosaic Theory of the Fourth Amendment, 110 MICH. L. REV. __ (2012) (forthcoming) (criticizing application of mosaic theory to analysis of when collective surveillance steps constitute a search).
3 See, e.g., Hyunyoung Choi and Hal Varian, Predicting the Present with Google Trends, Google, Inc. (April 2009), available at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1659302