Archives

Paul Ohm, What is Sensitive Information?

Comment by: Peter Swire

PLSC 2013

Workshop draft abstract:

The diverse and dizzying variety of regulations, laws, standards, and corporate practices in place to deal with information privacy around the world share at their core at least one unifying concept: sensitive information. Some categories of information—health, financial, education, and child-related, to name only a few—are deemed different from others, and data custodians owe special duties and face many more constraints when it comes to the maintenance of these categories of information.

Sensitive information is a show stopper. Otherwise lax regulations become stringent when applied to sensitive information. Permissive laws set stricter rules for the sensitive. The label plays a prominent role in rhetoric and debate, as even the most ardent believers in free markets and unfettered trade in information will bow to the ethical edict that they never sell sensitive information, regardless of the cost.

Despite the importance and prevalence of sensitive information, very little legal scholarship has systematically studied this important category. Sensitive information is deeply undertheorized. What makes a type of information sensitive? Are the sensitive categories set in stone, or do they vary with time and technological advances? What are the political and rhetorical mechanisms that lead a type of information into or out of the designation? Why does the designation serve as such a powerful trump card? This Article seeks to answer these questions and more.

The Article begins by surveying the landscape of sensitive information. It identifies dozens of examples of special treatment for sensitive information in rules, laws, policy statements, academic writing, and corporate practices from a wide number of jurisdictions, in the United States and beyond.

Building on this survey, the Article reverse engineers the rules of decision that define sensitive information. From this, it develops a multi-factor test that may be applied to explain, ex post, the types of information that have been deemed sensitive in the past and also predict, ex ante, types of information that may be identified as sensitive soon. First, sensitive information can lead to significant forms of widely-recognized harm. Second, sensitive information is the kind that exposes the data subject to a high probability of such harm. By focusing in particular on these two factors, this Article sits alongside the work of many other privacy scholars who have in recent years shifted their focus to privacy harm, a long-neglected topic. Third, sensitive information is often governed by norms of limited sharing. Fourth, sensitive information is rare and tends not to exist in many databases. Fifth, sensitive information tends to focus on harms that apply to the majority—often the ruling majority—of data subjects, while information leading to harms that affect only a minority less readily secures the label.
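
A rough way to picture the multi-factor test is as a simple checklist that is scored and compared against a cutoff. The sketch below is hypothetical Python, not the Article's method: the factor names, the equal weighting, the threshold, and the example scores for geolocation data are all invented for illustration.

    from dataclasses import dataclass

    @dataclass
    class FactorScores:
        """Hypothetical 0.0-1.0 scores for the five factors described above."""
        harm_severity: float          # 1: can lead to widely recognized harm
        harm_probability: float       # 2: high probability of that harm
        limited_sharing_norms: float  # 3: governed by norms of limited sharing
        rarity: float                 # 4: rare; absent from most databases
        majoritarian_reach: float     # 5: harm reaches the (ruling) majority

    def looks_sensitive(s: FactorScores, threshold: float = 0.6) -> bool:
        """Naive unweighted average; any real determination would be far more nuanced."""
        total = (s.harm_severity + s.harm_probability + s.limited_sharing_norms
                 + s.rarity + s.majoritarian_reach)
        return total / 5 >= threshold

    # Made-up assessment of geolocation data, one of the candidates discussed below.
    print(looks_sensitive(FactorScores(0.8, 0.7, 0.6, 0.4, 0.9)))  # True under these scores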

To test the predictive worth of these factors, the Article applies them to assess whether two forms of data that have been hotly debated by information privacy experts in recent years are poised to join the ranks of the sensitive: geolocation data and remote biometric data. Neither has yet been widely accepted as sensitive in privacy law, yet both trigger many of the factors listed above. Of the two, geolocation data is further down the path, already recognized by laws, regulations, and company practices worldwide. By identifying and justifying the treatment of geolocation and remote biometric data as sensitive, this Article hopes to spur privacy law reform in many jurisdictions.

Turning from the rules of decision used in the classification of sensitive information to the public choice mechanisms that lead particular types of information to be classified, the Article tries to explain why new forms of sensitive information often fail to be recognized until years after they satisfy most of the factors listed above. It argues that this stems from the way political institutions incorporate new learning from technology slowly and haphazardly. To improve this situation, the Article suggests new administrative mechanisms to identify new forms of sensitive information on a much more accelerated timeframe. It specifically proposes that in the United States, the FTC undertake a periodic—perhaps biennial or triennial—review of potential categories of sensitive information, suggested by members of the public. The FTC would be empowered to classify particular types of information as sensitive, or to remove the designation from types that are no longer sensitive, because of changes in technology or society. It would base these decisions on rigorous empirical review of the factors listed above, focusing in particular on the harms inherent in the data and the probability of harm, given likely threat models. The Article illustrates the idea by considering a type of information that has rarely been considered sensitive: calendar information. Calendar information tends to reveal location, associations, and other forms of closely held, confidential information, yet very few recognize the status of this potentially new class of sensitive information. We might consider asking the FTC whether it deserves to be categorized as sensitive.

Finally, the Article tackles the vexing and underanalyzed problem of idiosyncratically sensitive information. Because traditional conceptions of sensitive information cover primarily majoritarian concerns, they do little to protect data that feels sensitive only to smaller groups. This is a significant gap in the information privacy landscape, as every person cares about idiosyncratic forms of information that worry only a few. It may be that traditional forms of information privacy law are ill-equipped to deal with idiosyncratically sensitive information. Regulating idiosyncratically sensitive information will require more aggressive forms of regulation, for example premising new laws on the amount of information held, not only on the type of information held, on the theory that larger databases are likelier to hold idiosyncratically sensitive information than smaller databases.

Paul Ohm, Branding Privacy

Comment by: Deven Desai

PLSC 2012

Workshop draft abstract:

This Article focuses on the problem of what James Grimmelmann has called the “privacy lurch,”[1] which I define as an abrupt change made to the way a company handles data about individuals. Two prominent examples include Google’s decision in early 2012 to tear down the walls that once separated data about users collected from its different services and Facebook’s decisions in 2009 and 2010 to expose more user profile information to the public web by default than it had in the past. Privacy lurches disrupt long-settled user expectations and undermine claims that companies protect privacy by providing adequate notice and choice. They expose users to far more privacy risk than they might have anticipated or desired, assuming they are paying attention at all. Given the special and significant problems associated with privacy lurches, this Article calls on regulators to seek creative solutions to address them.

But even though privacy lurches lead to significant risks of harm, some might argue we should do nothing to limit them. Privacy lurches are the product of a dynamic marketplace for online goods and services.  What I call a lurch, the media instead tends to mythologize as a “pivot,” a welcome shift in a company’s business model, celebrated as an example of the nimble dynamism of entrepreneurs that has become a hallmark of our information economy.  Before we intervene to tamp down the harms of privacy lurches, we need to consider what we might give up in return.

Weighing the advantages of the dynamic marketplace against the harms of privacy lurches, this Article prescribes a new form of mandatory notice and choice. To breathe a little life into the usually denigrated options of notice and choice, this Article looks to the scholarship of trademark law, representing a novel integration of two important but until now almost never connected areas of information law.  This bridge is long overdue, as the theory of trademark law centers on the very same information quality and consumer protection concerns that animate notice and choice debates in privacy law. These theories describe the important informational power of trademarks (and service marks and, more generally, brands) to signal quality and goodwill to consumers concisely and efficiently.  Trademark scholars also describe how brands can serve to punish and warn, helping consumers recognize a company with a track record of shoddy practices or weak attention to consumer protection.

The central recommendation of this Article is that lawmakers and regulators should force almost every company that handles customer information to associate its brand name with a specified set of core privacy commitments.  The name, “Facebook,” for example, should be inextricably bound to that company’s specific, fundamental promises about the amount of information it collects and the uses to which it puts that information. If the company chooses someday to depart from these initial core privacy commitments, it must be required to use a new name with its modified service, albeit perhaps one associated with the old name, such as “Facebook Plus” or “Facebook Enhanced.”
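
To make the mechanism concrete, here is a minimal sketch under a simplified model in which a brand name is bound to a small set of core commitments; the field names, example commitments, and comparison rule are invented for illustration and are not drawn from the Article.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class CoreCommitments:
        """Hypothetical core privacy promises tied to a brand name."""
        data_collected: tuple   # categories of data the service collects
        data_uses: tuple        # purposes the data may be used for

    def requires_rebrand(old: CoreCommitments, new: CoreCommitments) -> bool:
        """Under the proposed rule, any departure from the core commitments
        obliges the company to adopt a new, possibly derivative, brand name."""
        return old != new

    before = CoreCommitments(("profile", "friend list"), ("social networking",))
    after = CoreCommitments(("profile", "friend list", "browsing"), ("social networking", "advertising"))
    print(requires_rebrand(before, after))  # True -> e.g. "Facebook Plus"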

Although this solution is novel, it is far from radical when one considers how well it is supported by the theoretical underpinnings of both privacy law and trademark law. It builds on the work of privacy scholars who have looked to consumer protection law for guidance, representing another important intradisciplinary bridge, this one between privacy law and product safety law.  Just as companies selling inherently dangerous products are obligated to attach warning labels, so too should companies shifting to inherently dangerous privacy practices be required to display warning labels. And the spot at the top of every Internet web page listing the brand name is arguably the only space available for an effective online warning label. A “branded privacy” solution is also well-supported by trademark theory, which focuses on giving consumers the tools they need to accurately and efficiently associate trademarks with the consistent qualities of a service in ways that privacy lurches disregard.

At the same time, because this solution sets the conditions of privacy lurches rather than prohibiting them outright, and because it restricts mandatory rebranding to situations involving a narrow class of privacy promises, it leaves room for market actors to innovate, striking a proper balance between the positive aspects of dynamism and the negative harms of privacy lurches. Companies will be free to evolve and adapt their practices in any way that does not tread upon the set of core privacy commitments, but they can change a core commitment only by changing their brand. This rule will act like a brake, forcing companies to engage more in internal deliberation than they do today about the class of choices consumers care about most, without preventing dynamism when it is unrelated to those choices or when the value of dynamism is high. And when a company does choose to modify a core privacy commitment, its new brand will send a clear, unambiguous signal to consumers and privacy watchers that something important has changed, directly addressing the information quality problems that plague notice-and-choice regimes in ways that improve upon prior suggestions.


[1] James Grimmelmann, Saving Facebook, 94 Iowa L. Rev. 1137 (2009).

Paul Ohm, Big Data & Privacy

Comment by: Susan Freiwald

PLSC 2011

Workshop draft abstract:

We are witnessing a sea change in the way we threaten and protect information privacy. The rise of Big Data—meaning powerful new methods of data analytics directed at massive, highly interconnected databases of information—will exacerbate privacy problems and put particular pressure on privacy regulation. The laws, regulations, and enforcement mechanisms we have developed in the first century of information privacy law are fundamentally hampered by the special features of Big Data. Big Data will force us to rethink how we regulate privacy.

To do that, we first need to understand what has changed, by surveying Big Data and cataloging what is new. Big Data includes powerful techniques for reidentification, the focus of my last Article, but it encompasses much more. Two features of Big Data, in particular, interfere with the way we regulate privacy. First, Big Data produces results that defy human intuition and resist prediction. The paradigmatic output of Big Data is the surprising correlation. Second, the underlying mechanisms that make Big Data work are often inscrutable to human understanding. Big Data reveals patterns and correlations, not mental models. B is correlated with A, Big Data reveals, but it cannot tell us why, and given the counter-intuitiveness of the result, we are sometimes left unable even to guess.
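
As a toy illustration of the point (the data, column names, and threshold below are invented), brute-force correlation mining reports which attributes move together but offers no causal story about why they do:

    from itertools import combinations
    import pandas as pd

    def strong_correlations(df: pd.DataFrame, threshold: float = 0.8):
        """Return attribute pairs whose correlation exceeds the threshold.
        The output says only that B tracks A; it cannot say why."""
        corr = df.corr()
        return [(a, b, round(corr.loc[a, b], 2))
                for a, b in combinations(df.columns, 2)
                if abs(corr.loc[a, b]) >= threshold]

    # Made-up data: the result flags a surprising pairing with no explanation attached.
    df = pd.DataFrame({"late_night_logins": [1, 3, 5, 7, 9],
                       "grocery_spend": [10, 12, 15, 13, 11],
                       "loan_defaults": [0, 1, 2, 3, 4]})
    print(strong_correlations(df))  # [('late_night_logins', 'loan_defaults', 1.0)]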

Big Data’s surprising correlations and inscrutability will break the two predominant methods we use to regulate privacy today, what I call the “bad data list” approach and the Fair Information Practice Principles approach. Both approaches rely on transparency and predictability, two things that Big Data fundamentally resists. Neither regulatory method can survive Big Data, and we cannot salvage either using only small tweaks and extensions. We need to start over.

Christopher Soghoian, An End to Privacy Theater: Exposing and Discouraging Corporate Disclosure of User Data to the Government

Comment by: Paul Ohm

PLSC 2010

Published version available here: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1656494

Workshop draft abstract

Today, when consumers evaluate potential mobile phone carriers, they are likely to consider several differentiating factors: the available handsets, the cost of service, and the firm’s reputation for network quality and customer service. The carriers’ divergent approaches to privacy, and their policies regarding government access to customers’ private data, are not considered in the purchasing process – perhaps because it is practically impossible for consumers to discover this information when they are choosing their carrier.

The differences in the privacy practices of the major players in the telecommunications and Internet applications market are quite significant – some firms retain identifying data for years, while others retain no data at all. For a mobile phone user investigated by the government, this difference in logging practices can significantly impact their freedom.

A naïve reader might simply assume that the law gives companies very little wiggle room: when they are required to provide data, they must do so. However, this is not the case. Companies have a huge amount of flexibility in the way they design their networks, in the amount of data they retain by default, in the exigent circumstances in which they share data without a court order, and in the degree to which they fight unreasonable requests.

This article will outline the numerous ways in which telecommunications carriers and Internet services currently assist the government, providing easy access to their customers’ private communications and documents. Relying on several case studies, this article will analyze the specific product design decisions that firms can make that either protect their customers’ private data by default, or make it trivial for the government to engage in large scale surveillance. This article will also examine the flow of money between the government and carriers, who are statutorily permitted to demand reasonable compensation for their assistance, and will discuss the public policy advantages of surveillance as either a corporate profit center or a corporate tax.

Overall, this article will attempt to deliver some of the transparency that is currently missing from the privacy market, and will outline a path to an eventual scenario in which consumers evaluate privacy approaches in advance, and firms can effectively compete for consumers on their willingness to disclose data to the government. Such transparency will permit the market to punish (or potentially reward) firms that put the government’s needs first.

Paul Ohm, The Benefits of the Old Privacy: Restoring the Focus to Traditional Harm

Comment by: Bruce Boyden

PLSC 2010

Workshop draft abstract:

The rise of the Internet stoked so many new privacy fears that it inspired a wave of legal scholars to give birth to a new specialty of legal scholarship, Information Privacy law. We should both recognize this young specialty’s great successes and wonder about its frustrating shortcomings. On the one hand, it has provided a rich structure of useful and intricate taxonomies with which to analyze new privacy problems and upon which to build sweeping prescriptions for law and policy.

But why has this important structural work had so little impact on concrete law and policy reform? Has any significant new or amended law depended heavily on this impressive body of scholarship? I submit that none has, which is particularly curious given the way privacy has dominated policy debates in recent years.

In this Article, I propose a theory for why the Information Privacy law agenda has failed to provoke meaningful reform. Building on Ann Bartow’s “dead bodies” thesis,  I argue that Information Privacy scholars gave up too soon on the prospect of relying on traditional privacy harms, the kind of harms embodied in the laws of harassment, blackmail, discrimination, and the traditional four privacy torts. Instead, these scholars have proposed broader theories of harm, arguing that we should worry about small incursions of privacy that aggregate across society, focusing on threats to individual autonomy, deliberative democracy, and human development, among many other values. As the symbol of these types of privacy harms, these scholars have pointed to Bentham’s and Foucault’s Panopticon.

Unfortunately, fear of the Panopticon is unlikely to drive contemporary law and policy for two reasons. First, as a matter of public choice, Panoptic fears are not the kind that spurs legislators to act. Lawmakers want to point to poster children suffering concrete, tangible harm—to Bartow’s dead bodies—before they will be motivated to act. The Panopticon provides none. Second, privacy is a relative, contingent, contextualized, and malleable value. It is weighed against other values, such as security and economic efficiency, so any theory of privacy must be presented in a commensurable way. But the Panopticon is an incommensurable fear. Even if you agree that it represents something dangerous that society must work to avoid, when you place this amorphous fear against any concrete, countervailing value, the concrete will always outweigh the vague.

I argue that we should shift our focus away from the Panopticon and back to traditional privacy harm. We should point to people who suffer tangible, measurable harm; we should spotlight privacy’s dead bodies.

But this isn’t a call to return meekly to the types of narrow concerns that gave rise to the traditional privacy torts. Theories of privacy harm should include not only the stories of people who already have been harmed but also rigorous predictions of new privacy harms that people will suffer because of changes in technology.

Ironically, information privacy law scholars who make these kinds of predictions will often propose prescriptions that are as broad and sweeping as some of those made by their Panopticon-driven counterparts. Traditional-harm theories of information privacy aren’t necessarily regressive forms of privacy scholarship, and this Article points to the work of a new wave of information privacy law scholars who are situated in traditional harm but at the same time offer aggressive new prescriptions. From my own work, I revisit the “database of ruin” theory, a prediction that leads to aggressive prescriptions for new privacy protections.

Finally, I argue that this predictive-traditional-harm approach is more likely than the Panoptic approach to lead to political action, recasting prescriptions from some of the classic recent works of Information Privacy into more politically saleable forms by translating them through the traditional harm lens.

Paul Ohm, The Thin Line Between Reasonable Network Management and Illegal Wiretapping

Comment by: Paul Ohm

PLSC 2008

Workshop draft abstract:

AT&T made headlines when it publicly discussed aggressive plans to monitor subscriber communications on an unprecedented scale and for novel purposes.  Comcast has examined packets on its network, in order to identify and throttle Bittorrent users.  Charter Communications informed thousands of its customers that it would track the websites they visited in order to serve them targeted ads.  These may be precursors to a storm of unprecedented, invasive Internet Service Provider (ISP) monitoring of the Internet.

Many consumer advocates have characterized these techniques as violations of network neutrality—the principle that providers should treat all network traffic the same.  Trumpeting these examples, these advocates have urged Congress to mandate network neutrality.

Until now, nobody has recognized that we already enjoy mandatory network neutrality.  Two forces—one technological, one legal—deliver this mandate.  First, up until the recent past, the best network monitoring devices could not keep up with the fastest network connections; inferior monitoring tools have prevented providers from engaging in aggressive network traffic discrimination.  These technological limitations have forced an implicit network neutrality mandate.

Second, legislatures have passed expansive wiretapping laws.  Under these provisions, so-called network management techniques like those described above may be illegal.  By limiting network management, the wiretapping laws mandate a sort of network neutrality.  Historically, however, few Internet Service Providers (ISPs) have had to defend themselves against wiretapping charges, but as the implicit, technological network neutrality mandate fades and as ISPs respond by expanding their monitoring programs, the wiretapping laws will soon emerge as significant constraints on ISP activities.

Network neutrality has been debated for years and nearly to death, but the recognition that we already have mandatory network neutrality inverts the debate.  ISPs are unable to do some things with their networks, unless and until they can convince Congress and state legislatures to change the wiretapping laws.  More importantly, focusing on the wiretap laws freshens the debate, which has always been mostly about innovation, by injecting questions of privacy, surveillance, and freedom.

Paul Ohm, The Probability of Privacy

Comment by: Michael Froomkin

PLSC 2009

Workshop draft abstract:

Data collectors and aggregators defend themselves against claims that they are invading privacy by invoking a verb of relatively recent vintage—“to anonymize.” By anonymizing the data—by removing or replacing all of the names or other personal identifiers—they argue that they are negating the risk of any privacy harm. Thus, Google anonymizes data in its search query database after nine months; proxy email and web browsing services promise Internet anonymity; and network researchers trade sensitive data only after anonymizing them first.
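
A minimal sketch of what this kind of "anonymization" typically amounts to in practice follows; the field names and the choice of hashing are assumptions for illustration, not any particular company's method. Direct identifiers are dropped or replaced with opaque tokens, while every other attribute is left intact.

    import hashlib

    def naively_anonymize(record: dict, identifiers=("name", "email", "ip_address")) -> dict:
        """Replace direct identifiers with opaque tokens; leave everything else untouched.
        The remaining attributes may still single a person out."""
        cleaned = {}
        for key, value in record.items():
            if key in identifiers:
                cleaned[key] = hashlib.sha256(str(value).encode()).hexdigest()[:12]
            else:
                cleaned[key] = value
        return cleaned

    record = {"name": "Jane Roe", "ip_address": "203.0.113.7",
              "zip": "80302", "birth_date": "1975-04-02", "sex": "F",
              "search_query": "rare disease specialists near boulder"}
    print(naively_anonymize(record))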

Recently, two splashy news stories revealed that anonymization is not all it is cracked up to be. First, America Online released twenty million search queries from 650,000 users. Next, Netflix released a database containing one hundred million movie ratings from nearly 500,000 users. In both cases, the personal identifiers in the databases were anonymized, and in both cases, researchers were able to “deanonymize” or “reidentify” at least some of the people in the database.

Even before these results, computer scientists had begun to theorize deanonymization. According to this research, none of which has yet been rigorously imported into legal scholarship, the utility and anonymity of data are linked. The only way to anonymize a database perfectly is to strip all of the information from it; any database that is useful is also imperfectly anonymous; and the more useful a database, the easier it is to reidentify the personal information in the database.
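
A minimal sketch of the resulting reidentification risk follows, using two made-up tables: the "anonymized" release keeps quasi-identifiers such as ZIP code, birth date, and sex precisely because they make the data useful, and those same attributes allow a linkage attack against an identified source such as a public voter roll.

    def reidentify(anonymized_rows, identified_rows, quasi_ids=("zip", "birth_date", "sex")):
        """Link records that agree on every quasi-identifier (a classic linkage attack)."""
        matches = []
        for anon in anonymized_rows:
            for known in identified_rows:
                if all(anon[q] == known[q] for q in quasi_ids):
                    matches.append((known["name"], anon))
        return matches

    # Invented example data.
    released = [{"zip": "80302", "birth_date": "1975-04-02", "sex": "F",
                 "diagnosis": "condition X"}]
    voter_roll = [{"name": "Jane Roe", "zip": "80302",
                   "birth_date": "1975-04-02", "sex": "F"}]
    print(reidentify(released, voter_roll))  # [('Jane Roe', {... 'diagnosis': 'condition X'})]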

This Article takes a comprehensive look at both claims of anonymization and theories of reidentification, weaving them into law and policy. It compares online and data privacy with anonymization standards and practices in health policy, where these issues have been grappled with for decades.

The Article concludes that claims of anonymization should be viewed with great suspicion. Data is never “anonymized,” and it is better to speak of “the probability of privacy” of different practices. Finally, the Article surveys research into how to reduce the risk of reidentification, and it incorporates this research into a set of prescriptions for various data privacy laws.