Paul Ohm, What is Sensitive Information?
Comment by: Peter Swire
Workshop draft abstract:
The diverse and dizzying variety of regulations, laws, standards, and corporate practices in place to deal with information privacy around the world share at their core at least one unifying concept: sensitive information. Some categories of information (health, financial, education, and child-related, to name only a few) are deemed different from others, and data custodians owe special duties and face many more constraints when it comes to the maintenance of these categories of information.
Sensitive information is a show stopper. Otherwise lax regulations become stringent when applied to sensitive information. Permissive laws set stricter rules for the sensitive. The label plays a prominent role in rhetoric and debate, as even the most ardent believer in free markets and unfettered trade in information will bow low to the ethical edict that they never sell sensitive information regardless of the cost.
Despite the importance and prevalence of sensitive information, very little legal scholarship has systematically studied this important category. Sensitive information is deeply undertheorized. What makes a type of information sensitive? Are the sensitive categories set in stone, or do they vary with time and technological advances? What are the political and rhetorical mechanisms that lead a type of information into or out of the designation? Why does the designation serve as such a powerful trump card? This Article seeks to answer these questions and more.
The Article begins by surveying the landscape of sensitive information. It identifies dozens of examples of special treatment for sensitive information in rules, laws, policy statements, academic writing, and corporate practices from a wide range of jurisdictions, in the United States and beyond.
Building on this survey, the Article reverse engineers the rules of decision that define sensitive information. From this, it develops a multi-factor test that may be applied to explain, ex post, the types of information that have been deemed sensitive in the past and to predict, ex ante, the types of information that may be identified as sensitive soon. First, sensitive information can lead to significant forms of widely recognized harm. Second, sensitive information is the kind that exposes the data subject to a high probability of such harm. By focusing in particular on these two factors, this Article sits alongside the work of many other privacy scholars who have in recent years shifted their focus to privacy harm, a long-neglected topic. Third, sensitive information is often governed by norms of limited sharing. Fourth, sensitive information is rare and tends not to exist in many databases. Fifth, sensitive information tends to focus on harms that apply to the majority (often the ruling majority) of data subjects, while information leading to harms affecting only a minority less readily secures the label.
To test the predictive worth of these factors, the Article applies them to assess whether two forms of data that have been hotly debated by information privacy experts in recent years are poised to join the ranks of the sensitive: geolocation data and remote biometric data. Neither has so far been widely accepted as sensitive in privacy law, yet both trigger many of the factors listed above. Of the two, geolocation data is further down the path, already recognized by laws, regulations, and company practices worldwide. By identifying and justifying the treatment of geolocation and remote biometric data as sensitive, this Article hopes to spur privacy law reform in many jurisdictions.
Turning from the rules of decision used in the classification of sensitive information to the public choice mechanisms that lead particular types of information to be classified, the Article tries to explain why new forms of sensitive information often fail to be recognized until years after they satisfy most of the factors listed above. It argues that this stems from the way political institutions incorporate new learning from technology slowly and haphazardly. To improve this situation, the Article suggests new administrative mechanisms to identify new forms of sensitive information on a much more accelerated timeframe. It specifically proposes that in the United States, the FTC undertake a periodic (perhaps biennial or triennial) review of potential categories of sensitive information suggested by members of the public. The FTC would be empowered to classify particular types of information as sensitive, or to remove the designation from types that are no longer sensitive because of changes in technology or society. It would base these decisions on rigorous empirical review of the factors listed above, focusing in particular on the harms inherent in the data and the probability of harm, given likely threat models. The Article illustrates the idea by considering a type of information that has rarely been considered sensitive: calendar information. Calendar information tends to reveal location, associations, and other forms of closely held, confidential information, yet very few recognize the status of this potentially new class of sensitive information. We might consider asking the FTC whether this deserves to be categorized as sensitive.
Finally, the Article tackles the vexing and underanalyzed problem of idiosyncratically sensitive information. Since traditional conceptions of sensitive information cover primarily majoritarian concerns, they do little to protect the data that feel sensitive only to smaller groups. This is a significant gap in the information privacy landscape, as every person cares about idiosyncratic forms of information that worry only a few. It may be that traditional forms of information privacy law are ill-equipped to deal with idiosyncratically sensitive information. Regulating idiosyncratically sensitive information will require more aggressive forms of regulation, for example premising new laws on the amount of information held, not only on the type of information held, on the theory that larger databases are likelier to hold idiosyncratically sensitive information than smaller databases.