Steven M. Bellovin, Renée M. Hutchins, Tony Jebara, Sebastian Zimmeck, When Enough is Enough: Location Tracking, Mosaic Theory and Machine Learning

Comment by: Orin Kerr

PLSC 2013

Published version available here:

Workshop draft abstract:

Since 1967, the Supreme Court has tied our right to be free of unwanted government scrutiny to the concept of reasonable expectations of privacy.5 Reasonable expectations include, among other things, an assessment of the intrusiveness of government action. When making such assessments historically, the Court has considered police conduct with clear temporal, geographic, or substantive limits. However, in an era where new technologies permit the storage and compilation of vast amounts of personal data, those assessments have become more complicated. A school of thought known as “Mosaic Theory” has stepped into the void, sounding the alarm that our old tools for assessing the intrusiveness of government conduct potentially undervalue our privacy rights.

Mosaic theorists advocate a cumulative approach to the evaluation of data collection.

Under the theory, searches are “analyzed as a collective sequence of steps rather than as individual steps.”6 The approach is based on the recognition that comprehensive aggregation of even seemingly innocuous data reveals greater insight than consideration of each piece of information in isolation. Over time, discrete units of surveillance data can be processed to create a mosaic of habits, relationships, and much more. Consequently, a Fourth Amendment analysis that focuses only on the government’s collection of discrete units of trivial data fails to appreciate the true harm of long-term surveillance—the composite.

In the context of location tracking, the Court has previously suggested that the Fourth Amendment may (at some theoretical threshold) be concerned with the accumulated information revealed by surveillance.7 Similarly, in the Court’s recent decision in United States v. Jones, a majority of concurring justices indicated willingness to explore such an approach. However, in the main, the Court has rejected any notion that technological enhancement matters to the constitutional treatment of location tracking.8 Rather, the Court has found that such surveillance in public spaces, which does not require physical trespass, is equivalent to a human tail and thus not regulated by the Fourth Amendment. In this way, the Court has avoided quantitative analysis of the amendment’s protections.

The Court’s reticence is built on the enticingly direct assertion that objectivity under the mosaic theory is impossible. This is true in large part because there has been no rationale yet offered to objectively distinguish relatively short-term monitoring from its counterpart of greater duration. As Justice Scalia, writing for the majority in United States v. Jones, recently observed: “it remains unexplained why a 4-week investigation is ‘surely’ too long.”9 This article answers that question for the first time by combining the lessons of machine learning with mosaic theory and applying the pairing to the Fourth Amendment.

Machine learning is the branch of computer science concerning systems that can draw inferences from collections of data, generally by means of mathematical algorithms. In a recent competition called “The Nokia Mobile Data Challenge,”10 researchers evaluated machine learning’s applicability to GPS and mobile phone data. From a user’s location history alone, the researchers were able to estimate the user’s gender, marital status, occupation and age.11

Algorithms developed for the competition were also able to predict a user’s likely future position by observing past location history. Indeed, a user’s future location could even be inferred with a relative degree of accuracy using the location data of friends and social contacts.12

Machine learning of the sort on display during the Nokia Challenge seeks to use artificial intelligence to harness the data deluge of today’s information society by efficiently organizing data, finding statistical regularities and other patterns in it, and making predictions therefrom. It deduces information—including information that has no obvious linkage to the input data—that may otherwise have remained private due to the natural limitations of manual, human-driven investigation. Analysts have also begun to “train” machine learning programs on one dataset and then use them to find similar characteristics in new datasets. When applied to the digital “bread crumbs” of data generated by people, machine learning algorithms can make targeted personal predictions. The greater the number of data points evaluated, the greater the accuracy of the algorithm’s results.
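The train-then-predict loop described above can be illustrated with a deliberately minimal sketch: a one-nearest-neighbor rule that labels a new location trace with the attribute of the most similar previously labeled trace. All coordinates, labels, and the choice of a nearest-neighbor rule are invented for illustration; real demographic inference uses far richer features and models.

```python
import math

# Hypothetical training data: (avg_latitude, avg_longitude) summarizing
# a user's location history, paired with a known attribute label.
train = [
    ((40.81, -73.96), "student"),
    ((40.80, -73.95), "student"),
    ((38.90, -77.04), "professional"),
    ((38.89, -77.03), "professional"),
]

def predict(point):
    """One-nearest-neighbor: return the label of the training trace
    whose average location is closest to the new trace."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    _, label = min(train, key=lambda t: dist(t[0], point))
    return label

print(predict((40.82, -73.97)))  # "student"
print(predict((38.91, -77.05)))  # "professional"
```

Even this toy rule captures the abstract’s point: once a model is trained on labeled traces, every new trace it sees yields an inferred personal attribute, and accuracy generally improves as more data points feed the distance comparison.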

As this article explains, technology giveth and technology taketh away. The objective understanding of data compilation that is revealed by machine learning provides important Fourth Amendment insights. We should begin to consider these insights more closely.

In four parts, this article advances the conclusion that the duration of investigations is relevant to their substantive Fourth Amendment treatment because duration affects the accuracy of the generated composite. Though it was previously difficult to explain why an investigation of four weeks was substantively different from an investigation of four hours, we now can. As machine learning algorithms reveal, composites (and predictions) of startling accuracy can be generated with remarkably few data points. Furthermore, in some situations accuracy can increase dramatically above certain thresholds. For example, a 2012 study found the ability to deduce ethnicity improved slowly through five weeks of phone data monitoring, jumped sharply to a new plateau at that point, and then increased sharply again after twenty-eight weeks. More remarkably, the accuracy of identification of a target’s significant other improved dramatically after five days’ worth of data inputs.14 Experiments like these support the notion of a threshold, a point at which it makes sense to draw a line.

The results of machine learning algorithms can be combined with quantitative privacy definitions. For example, when viewed through the lens of k-anonymity, we now have an objective basis for distinguishing between law enforcement activities of differing duration. While reasonable minds may dispute the appropriate value of k or may differ regarding the most suitable minimum accuracy threshold, this article makes the case that the collection of data points allowing composites or predictions that exceed selected thresholds should be deemed unreasonable searches in the absence of a warrant.15 Moreover, any new rules should take into account not only the data being collected but also the foreseeable improvement in the machine learning technology that will ultimately be brought to bear on it; this includes using future algorithms on older data.
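Operationally, a k-anonymity test of the kind invoked above reduces to asking whether every combination of quasi-identifier values is shared by at least k records. The sketch below (records, attribute names, and the choice of quasi-identifiers are all hypothetical) shows that check:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k records, i.e. no individual can be singled
    out within an equivalence class smaller than k."""
    groups = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    return all(count >= k for count in groups.values())

# Hypothetical composites inferred from location-tracking data.
records = [
    {"zip": "10027", "age_band": "30-39", "commute": "subway"},
    {"zip": "10027", "age_band": "30-39", "commute": "subway"},
    {"zip": "10027", "age_band": "30-39", "commute": "subway"},
    {"zip": "21201", "age_band": "40-49", "commute": "car"},
]

qi = ["zip", "age_band", "commute"]
print(is_k_anonymous(records, qi, 3))  # False: the last record forms a group of 1
print(is_k_anonymous(records, qi, 1))  # True
```

On this framing, the legal line-drawing question becomes a parameter choice: once surveillance yields composites precise enough that the target’s equivalence class falls below the chosen k, the collection would be presumptively unreasonable without a warrant.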

In 2001, the Supreme Court asked “what limits there are upon the power of technology to shrink the realm of guaranteed privacy.”16 In this piece, we explore what lessons there are in the power of technology to protect the realm of guaranteed privacy. The time has come for the Fourth Amendment to embrace what technology already tells us—a four-week investigation is surely too long because the amount of data collected during such an investigation creates a highly intrusive view of a person that, without a warrant, fails to comport with our constitutional limits on government.


1  Professor, Columbia University, Department of Computer Science.

2  Associate Professor, University of Maryland Carey School of Law.

3  Associate Professor, Columbia University, Department of Computer Science.

4 Ph.D. candidate, Columbia University, Department of Computer Science.

5  Katz v. United States, 389 U.S. 347, 361 (1967) (Harlan, J., concurring).

6  Orin Kerr, The Mosaic Theory of the Fourth Amendment, 111 Mich. L. Rev. 311, 312 (2012).

7 United States v. Knotts, 460 U.S. 276, 284 (1983).

8  Compare Knotts, 460 U.S. at 276 (rejecting the contention that an electronic beeper should be treated differently than a human tail) and Smith v. Maryland, 442 U.S. 735, 744 (1979) (approving the warrantless use of a pen register in part because the justices were “not inclined to hold that a different constitutional result is required because the telephone company has decided to automate.”) with Kyllo v. United States, 533 U.S. 27, 33 (2001) (recognizing that advances in technology affect the degree of privacy secured by the Fourth Amendment).

9  United States v. Jones, 132 S.Ct. 945 (2012); see also Kerr, 111 Mich. L. Rev. at 329-330.

10  See

11  Sanja Brdar, Dubravko Culibrk, and Vladimir Crnojevic, Demographic Attributes Prediction on the Real-World Mobile Data, Nokia Mobile Data Challenge Workshop 2012.

12  Manlio de Domenico, Antonio Lima, and Mirco Musolesi, Interdependence and Predictability of Human Mobility and Social Interactions, Nokia Mobile Data Challenge Workshop 2012.

14  See, e.g., Yaniv Altshuler, Nadav Aharony, Michael Fire, Yuval Elovici, Alex Pentland, Incremental Learning with Accuracy Prediction of Social and Individual Properties from Mobile-Phone Data, WS3P, IEEE Social Computing (2012), especially Figures 9 and 10.

15 Admittedly, there are differing views on sources of authority beyond the Constitution that might justify location tracking. See, e.g., Stephanie K. Pell and Christopher Soghoian, Can You See Me Now? Toward Reasonable Standards for Law Enforcement Access to Location Data That Congress Could Enact, 27 Berkeley Tech. L.J. 117 (2012).

16  Kyllo, 533 U.S. at 34.

Peter Swire, Backdoors

Comment by: Orin Kerr

PLSC 2012

Workshop draft abstract:

This article, which hopefully will be the core of a forthcoming book, uses the idea of “backdoors” to unify previously disparate privacy and security issues in a networked and globalized world.  Backdoors can provide government law enforcement and national security agencies with lawful (or unlawful) access to communications and data.  The same, or other, backdoors can also provide private actors, including criminals, with access to communications and data.

Four areas illustrate the importance of the law, policy, and technology of backdoors:

(1) Encryption.  As discussed in my recent article on “Encryption and Globalization,” countries including India and China are seeking to regulate encryption in ways that would give governments access to encrypted communications.  An example is the Chinese insistence that hardware and software built there use non-standard cryptosystems developed in China, rather than globally-tested systems.  These types of limits on encryption, where implemented, give governments a pipeline, or backdoor, into the stream of communications.

(2) CALEA.  Since 1994, the U.S. statute CALEA has required telephone networks to make communications “wiretap ready.”  CALEA requires holes, or backdoors, in communications security in order to assure that the FBI and other agencies have a way into communications flowing through the network.  The FBI is now seeking to expand CALEA-style requirements to a wide range of Internet communications that are not covered by the 1994 statute.

(3) Cloud computing.  We are in the midst of a massive transition to storage in the cloud of companies’ and individuals’ data.  Cloud providers promise strong security for the stored data. However, government agencies increasingly are seeking to build automated ways to gain access to the data, potentially creating backdoors for large and sensitive databases.

(4) Telecommunications equipment.  A newly important issue for defense and other government agencies is the “secure supply chain.”  The concern here arises from reports that major manufacturers, including the Chinese company Huawei, are building equipment that has the capability to “phone home” about data that moves through the network.  The Huawei facts (assuming they are true) illustrate the possibility that backdoors can be created systematically by non-government actors on a large scale in the global communications system.

These four areas show key similarities with the more familiar software setting for the term “backdoor” – a programmer with access to a system during development leaves a way to re-enter it after manufacturing is complete.  White-hat and black-hat hackers have often exploited backdoors to gain access to supposedly secure communications and data.  Lacking to date has been any general theory, or comparative discussion, of the desirability of backdoors across these settings.  There are, of course, strongly-supported arguments for government agencies to have lawful access to data in appropriate settings, and these arguments gained great political support in the wake of September 11.  The arguments for cybersecurity and privacy, on the other hand, counsel strongly against pervasive backdoors throughout our computing systems.
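The software sense of the term can be made concrete with a toy sketch (all credentials and names here are invented): an authentication routine that quietly accepts a developer’s hidden password for any account, bypassing the normal check.

```python
import hmac

# Hypothetical user database for illustration only.
REAL_USERS = {"alice": "s3cret"}

# The backdoor: a maintenance credential left in by the developer.
_BACKDOOR_PASSWORD = "letmein-dev"

def authenticate(user, password):
    """Normal login check, plus a hidden secondary path that lets the
    developer's password open any account."""
    expected = REAL_USERS.get(user)
    if expected is not None and hmac.compare_digest(password, expected):
        return True  # legitimate authentication
    # Backdoor path: bypasses per-user credentials entirely.
    return hmac.compare_digest(password, _BACKDOOR_PASSWORD)

print(authenticate("alice", "s3cret"))        # True: normal login
print(authenticate("mallory", "letmein-dev"))  # True: backdoor access
print(authenticate("mallory", "guess"))        # False
```

The sketch illustrates why backdoors resist containment: the secondary path is indistinguishable from the legitimate one to any caller who knows the secret, whether that caller is the developer, a government agency, or an attacker.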

Government agencies, in the U.S. and globally, have pushed for more backdoors in multiple settings, for encryption, CALEA, and the cloud.  There has been little or no discussion to date, however, about what overall system of backdoors should exist to meet government goals while also maintaining security and privacy.  The unifying theme of backdoors will highlight the architectural and legal decisions that we face in our pervasively networked and globalized computing world.

Orin Kerr, A Substitution-Effects Theory of the Fourth Amendment

Comment by: Deven Desai

PLSC 2011

Workshop draft abstract:

Fourth Amendment law is often considered a theoretical embarrassment. The law consists of dozens of rules for very specific situations that seem to lack a coherent explanation. Constitutional protection varies dramatically based on seemingly arcane distinctions.

This Article introduces a new theory that explains and justifies both the structure and content of Fourth Amendment rules: The theory of equilibrium-adjustment. The theory of equilibrium-adjustment posits that the Supreme Court adjusts the scope of protection in response to new facts in order to restore the status quo level of protection.  When changing technology or social practice expands government power, the Supreme Court tightens Fourth Amendment protection; when it threatens government power, the Supreme Court loosens constitutional protection.  Existing Fourth Amendment law therefore reflects many decades of equilibrium-adjustment as facts change.  This simple argument explains a wide range of puzzling Fourth Amendment doctrines including the automobile exception; rules on using sense-enhancing devices; the decline of the “mere evidence” rule; how the Fourth Amendment applies to the telephone network; undercover investigations; the law of aerial surveillance; rules for subpoenas; and the special Fourth Amendment protection for the home.

The Article then offers a normative defense of equilibrium-adjustment. Equilibrium-adjustment maintains interpretive fidelity while permitting Fourth Amendment law to respond to changing facts.  Its wide appeal and focus on deviations from the status quo facilitate coherent decisionmaking amidst empirical uncertainty and yet also give Fourth Amendment law significant stability.  The Article concludes by arguing that judicial delay is an important precondition to successful equilibrium-adjustment.

Peter Winn, Katz and the Origins of the “Reasonable Expectation of Privacy” Test

Comment by: Orin Kerr

PLSC 2009

Published version available here:

Workshop draft abstract:

The “reasonable expectation of privacy” test, formulated in the 1967 case of Katz v. United States, represents a great touchstone in the law of privacy.  Katz is important not only because the test is used to determine when a governmental intrusion constitutes a “search” under the Fourth Amendment, but also because the test has found its way into state common law, statutes, and even the laws of other nations.

This article addresses the historical background of the framing of that decision and argues that credit for the development of the famous test belongs to counsel for Charles Katz, Harvey (now Judge) Schneider, who presented the test for the first time at oral argument, not in the briefs.  The majority opinion’s failure to mention the test is explained by the fact that the law clerk responsible for drafting Justice Stewart’s majority opinion missed the oral argument.  The test, of course, was articulated in Justice Harlan’s short concurring opinion—establishing him as not only a great jurist, but someone who knew how to listen.  Finally, the article argues that Justice Harlan intended the famous test as an evolutionary modification of the previous trespass standard, not a revolutionary new approach to the law—which is, in fact, exactly how subsequent courts understood and applied it.