Eloïse Gratton, Interpreting “personal information” in light of its underlying risk of harm

Comment by: Mark MacCarthy

PLSC 2013

Workshop draft abstract:

“Personal data are and will remain a valuable asset, but what counts as personal data? If one wants to protect X, one needs to know what X is.” (van den Hoven, 2008)

In the late sixties and early seventies, with the development of automated data banks and the growing use of computers in the private and public sector, privacy was conceptualized as having individuals “in control over their personal information” (Westin, 1967). The principles of Fair Information Practices were elaborated during this period and have been incorporated in data protection laws (“DPLs”) adopted in various jurisdictions around the world ever since.

Personal information is defined similarly in various DPLs (such as in Europe and Canada) as “information relating to an identifiable individual”. In the U.S., information is accorded special recognition through a series of sectoral privacy statutes focused on protecting “personally identifiable information” (or PII), a notion close to personal information. This definition of personal information (or a very similar one) has been included in transnational policy instruments, including the OECD Guidelines. Going back in time, identical or at least similar definitions of personal information were in fact used in the resolutions leading to the elaboration of Convention 108, dating back to the early seventies (Council of Europe, Resolutions (73) 2 and (74) 29). This illustrates that a similar definition of personal information had already been elaborated at that time and has not been modified since.

More recently, with the Internet and the circulation of new types of information, the effectiveness of this definition may be challenged. Individuals constantly give off personal information, and recent technological developments are triggering the emergence of new identification tools allowing for easier identification of individuals. Data-mining techniques and capabilities are reaching new levels of sophistication. Because it is now possible to interpret almost any data as personal information (any data can in one way or another be related to some individual), the question arises as to how much data should be considered personal information. I maintain that a literal interpretation of the definition of personal information may lead to several negative outcomes. First, DPLs may be protecting all personal information, regardless of whether the information is worthy of protection, encouraging a potentially over-inclusive and burdensome framework. This definition may also prove to be under-inclusive, as it may not govern certain profiles (falling outside of the scope of the definition), even if these profiles, although they may not “identify” an individual by name, may still be used against the individuals behind them. A literal interpretation of this definition may also create various uncertainties, especially in light of new types of data and collection tools which may identify a device or an object which may be used by one or more individuals.

In light of these issues, various authors have recently proposed guidance, mostly on the issue of what “identifiability” actually means. For example, Bercic and George (2009) examine how knowledge of relational database design principles can greatly help to understand what is and what is not personal data. Lundevall-Unger and Tranvik (2011) propose a different and practical method for deciding the legal status of IP addresses (with regard to the concept of personal data), which consists of a “likely reasonable” test, resolved by assessing the costs (in terms of time, money, expertise, etc.) associated with employing legal methods of identification. Schwartz and Solove (2011) also argue that the current approaches to PII are flawed and propose a new approach called “PII 2.0,” which accounts for PII’s malleability. Based upon a standard rather than a rule, PII 2.0 would rest on a continuum of “risk of identification” and would regulate information that relates to either an “identified” or “identifiable” individual (making a distinction between the two categories), establishing different requirements for each category.

My contribution to this guidance on the notion of “identifiability” is a new method for interpreting the notion of personal information, one that takes into account the ultimate purpose behind the adoption of DPLs, in order to ensure that only data that were meant to be covered by DPLs will in fact be covered. In proposing such an interpretation, the idea is to aim for a level of generality which corresponds with the highest-level goal that the lawmakers wished to achieve (Bennett Moses, 2007). I will demonstrate how the ultimate purpose of DPLs is broader than protecting the privacy rights of individuals, as it is to protect individuals against the risk of harm that may result from the collection, use or disclosure of their information. Accordingly, under the proposed approach, only data that may present such risk of harm to individuals would be protected.

I argue that in certain cases the harm will take place at the point of collection, while in other cases it will take place at the point where the data are used or disclosed. Instead of trying to determine exactly what an “identifiable” individual means, I maintain that a method of interpretation which is consistent with the original goals of DPLs should be favoured. Relying and building on Calo’s theory (Calo, 2011) and others, I will elaborate a taxonomy of criteria in the form of a decision tree which takes into account the fact that while the collection or disclosure of information may trigger a more subjective kind of harm (for collection, a feeling of being observed; for disclosure, embarrassment and humiliation), the use of information will trigger a more objective kind of harm (financial, physical, discrimination, etc.). The risk of harm approach which I propose, applied to the definition, will reflect this and protect data only at the time that it presents such risk, or in light of the importance or extent of such risk of objective or subjective harm. Accordingly, the interpretation of the notion of “identifiability” will vary in light of the data handling activity at stake. For instance, while I maintain that the notion of “identifiability” should be interpreted in light of the overall sensitivity of the information being disclosed (taking into account other criteria which are relevant in evaluating the risk of subjective harm), I am also of the view that this notion is irrelevant when evaluating information being used (only the presence of an objective harm being relevant).
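As a purely illustrative sketch of how such a decision tree might branch on the data handling activity, the Python below encodes only the distinctions stated in this abstract. It is not the test developed in the article: the class names, the boolean fields, and in particular the assumption that collection and disclosure each require identifiability plus a single subjective-harm indicator are hypothetical simplifications.

```python
from dataclasses import dataclass
from enum import Enum


class Activity(Enum):
    COLLECTION = "collection"
    USE = "use"
    DISCLOSURE = "disclosure"


@dataclass
class DataHandling:
    """Hypothetical description of one data handling activity."""
    activity: Activity
    identifiable: bool = False             # can the data be tied to an individual?
    sensitive: bool = False                # stand-in for "overall sensitivity"
    feeling_of_being_observed: bool = False
    objective_harm_possible: bool = False  # financial, physical, discrimination, etc.


def presents_risk_of_harm(d: DataHandling) -> bool:
    """Return True if, under this simplified reading, the data should be protected."""
    if d.activity is Activity.USE:
        # Use: identifiability is treated as irrelevant; only objective harm counts.
        return d.objective_harm_possible
    if d.activity is Activity.DISCLOSURE:
        # Disclosure: subjective harm (embarrassment, humiliation).
        # Assumption: modelled as identifiable AND sensitive.
        return d.identifiable and d.sensitive
    # Collection: subjective harm (a feeling of being observed).
    # Assumption: modelled as identifiable AND perceived observation.
    return d.identifiable and d.feeling_of_being_observed
```

For example, `presents_risk_of_harm(DataHandling(Activity.USE, objective_harm_possible=True))` evaluates to `True` even though `identifiable` is left at `False`, mirroring the claim that identifiability is irrelevant at the point of use.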

In the preliminary section of my article, I will provide an overview of the various conceptions of privacy, elaborate on the historical context leading to the adoption of DPLs and the elaboration of the definition of personal information, and discuss the changes which have recently taken place at the technological level. In the following section (section 2), I will first elaborate on how a literal interpretation of the definition of personal information is no longer workable. In light of this, I will present the proposed approach to interpreting the definition of personal information, under which the ultimate purpose behind DPLs should be taken into account. I will then demonstrate why the ultimate purpose of DPLs was to protect individuals against a risk of harm triggered by organizations collecting, using and disclosing their information. In section 3, I will demonstrate how this risk of harm can be subjective or objective, depending on the data handling activity at stake. I will offer a way forward, proposing a decision-tree test useful when deciding whether certain information should qualify as personal information. I will also demonstrate how the proposed test would work in practice, using practical business cases as examples.

The objective of my work is to come to a common understanding of the notion of personal information, the situations in which DPLs should be applied, and the way they should be applied. A corollary of this work is to provide guidance to lawmakers, policymakers, privacy commissioners, courts, organizations handling personal information and individuals assessing whether certain information is or should be governed by the relevant DPLs, depending on whether the data handling activity at stake creates a risk of harm for an individual. This will provide a useful framework under which DPLs remain effective in light of modern Internet technologies.

Mark MacCarthy, Social Networks: Privacy Externalities and Public Policy

Comment by: Anupam Chander

PLSC 2011

Workshop draft abstract:

In the apocryphal Fitzgerald – Hemingway anecdote, Fitzgerald says the rich are very different from you and me; Hemingway responds that the rich have more money. The exchange is relevant to social networks and privacy. Are social networks a disruptive technology that challenges existing thinking on privacy, or just more of the same?  The frame of this paper is that social networks are something radically new and that this difference should prompt us to rethink the framework we use to structure public policy toward privacy.

From this perspective, social networks represent another example of the connection between technical development and the evolution of privacy policy.  Warren and Brandeis reacted to the widespread use of the snap camera in journalism to develop a tort-based right of privacy as the right to be left alone.  The widespread use of mainframe computers in the 1960s by large private and public institutions to compile and process information about individuals led to the ex ante rules embodied in the fair information practices framework.

Social networks present a similar technological breakthrough that forces us to rethink privacy assumptions.  Unlike the Internet, they create and thrive upon a culture of identified sharing of information.  People who use social network sites want to be known by others.  Providing personal information on an ongoing basis to a limited group of other people is the whole point of a social network.  The privacy challenge is this: a technology that depends for its highest and best uses on the exchange of information is ill-suited to a privacy norm of standardized before-the-fact limitations on information exchange.

This paper sets the stage for a discussion of public policy on privacy toward social networks by examining several accounts in the literature of privacy in social networks. Nissenbaum’s contextualist theory holds that privacy is the right to the appropriate flow of information, where appropriate is defined by the context in which the information is created and exchanged.  One way to apply this approach to new contexts where information norms are not yet well developed is to assimilate new contexts to old ones.  This is the tack Nissenbaum takes when she denies that social networks are genuinely new phenomena, and tries to model them as a medium of information exchange like the telephone system.  Entrenched norms from this context of ordinary life apply, she says, and this explains the sense of outrage when information meant for a network of friends is used by recruiters to evaluate job candidates or by aggregation services such as Rapleaf that generate profiles from social network information and make them available in the context of marketing or eligibility decisions for insurance, credit or employment.

Strahilevitz approaches the public – private question in privacy tort with the apparatus of the sociological theory of social networks, which studies how information flows among groups of loosely or tightly connected individuals.  He concludes that a person should have a reasonable expectation of privacy when there is a low probability that information will flow beyond a limited subset of his friends.  If someone causes information to move beyond this group, Strahilevitz contends, then he should be liable for the privacy tort of public disclosure of private information.  This attempt to put some structure into the idea of privacy in public can be applied to social networks as new technologically-based institutions.  To the extent that there is a low probability that information disclosed by a social network user will travel beyond the network of friends for whom it was intended, these users have a legitimate expectation that others will not cause the information to cross these boundaries. Under this approach, uses of social network information in employment or data aggregation contexts would be surprising and would violate these legitimate expectations of privacy.

Lipford and Hull et al. focus on the need for users to have a sense of how visible their information really is on social network sites.  Once users can see, using a tool such as Audience View, how others see their information, they are in a position to determine how much information they would like to share.  Empowering user control in this way is the key to keeping information flows within the contextual norms for social networks, without users having to assume that all information they post will be public to everyone and can be used for every purpose.

These perspectives have limitations.  Nissenbaum misses the extent to which the new technology of sharing creates a genuinely new context, and that the simple extrapolation of old norms into the new context is insufficient to respond to its novelty.  Norms for information flow in social networks are under construction; they are contested terrain, not areas where privacy norms are completely specified and generally accepted.  What the information rules should be for social networks cannot be resolved by appeal to widely-shared intuitions about social norms; these questions require further normative debate.

Strahilevitz has an answer to the question of what rules should apply, but it misses a key dimension.  By focusing solely on the factual question of the actual probability of information flowing out of a social network context to other contexts, he mistakenly allows the normative privacy question to be decided by those who can create facts on the ground by appropriating social network information.  People might believe and expect that their information will stay in the social network context, but, despite what they think, the probability is really quite high that information made available on a social network site will migrate far beyond its original context. Under Strahilevitz’s theory, this fact would make the privacy expectations of social network users unreasonable. The normative dispute about privacy rules for social networks cannot be resolved by appeal to the facts of information flow.

The attempt by Lipford and Hull et al. to provide greater user control avoids the mistakes of appealing to old norms in a new context and the reduction of the normative question to a manipulable factual one.  It reaffirms the idea that consent is at the heart of privacy and urges the development of more visible and transparent user controls.  If people don’t want their information spreading beyond their immediate social network of friends, they should set their privacy controls to implement this preference.

This approach is probably the dominant public policy approach to privacy on social networks. Increased user control is often recommended as a way to address privacy issues on social networks.  Public policy demands from regulators, legislators and privacy advocates have focused primarily on giving users adequate control over social network information.

The limitations of individual user control as the primary regulator of privacy have been widely discussed.  As applied to social networks, these concerns can be summarized as follows.  A blizzard of privacy choices in a social network context simply encourages passivity.  The number and type of choices will inevitably be too granular for those who care only to adopt the most restrictive or the most open of controls.  Alternatively, the choices will not be granular enough for others – preventing them from selectively revealing information to some, while withholding it from others. Moreover, an extraordinary level of knowledge is needed to evaluate what information is available to whom.  App developers, aggregation services, and data brokers can obtain supposedly concealed social network information in ways that surprise both users and operators of social network sites.  The use of this information is opaque as well, so there is little guidance available to individuals as to whether it is a good idea or a bad idea to share this information. Inevitably, some discretion is retained by the social network operator over what information cannot be controlled by users.  Finally, users cannot expect to keep information hidden when they are under suspicion of activities that are of public concern, such as threats to national security, cyberbullying, or impersonation.  Surveillance of social network activity for these purposes has to be taken as a given.  These factors all mean that user control can never be complete, or completely effective at preventing privacy harms.

I approach the question of privacy in social networks through the lens of privacy externalities, and from this angle it is clear that social networks present a substantial challenge to the informed consent ideal of privacy regulation.  Information one person reveals often reveals information about others.  This is clearest in eligibility contexts, where, for instance, non-smokers can reveal the status of smokers by voluntarily answering optional insurance company questions on their smoking habits. Social networks are a major source of privacy externalities because networks of friends tend to have certain features in common.  People with similar sexual orientation, credit worthiness, political beliefs, and racial identities tend to group together.  As a result, researchers, marketers and others can predict some characteristics of people based on characteristics of their network friends.  If your social network friends are gay, you probably are too.  If they are deadbeats, likely you are as well.  These indirect inferences about people can then be used for a variety of purposes, most of them, so far, unregulated.  These externalities can sometimes help people and sometimes hurt them, but they have to be taken into account when assessing privacy in a social network context.
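The inference mechanism behind these externalities can be made concrete with a minimal sketch. The Python below guesses an undisclosed attribute of one user from the attributes that the user's friends have disclosed, using a simple majority vote. The function name, the data layout, and the majority-vote rule are illustrative assumptions, not a description of any technique used by a particular researcher, marketer, or service.

```python
from collections import Counter


def infer_attribute(user, friends, disclosed):
    """Guess a user's undisclosed attribute from friends' disclosed attributes.

    friends:   dict mapping a user to a list of that user's friends (hypothetical layout)
    disclosed: dict mapping users to the attribute value they chose to reveal
    Returns (most common value, share of disclosing friends holding it), or None.
    """
    labels = [disclosed[f] for f in friends.get(user, []) if f in disclosed]
    if not labels:
        return None  # no friend has disclosed anything, so nothing can be inferred
    value, count = Counter(labels).most_common(1)[0]
    return value, count / len(labels)


# Dana disclosed nothing, yet her friends' disclosures reveal a likely value.
friends = {"dana": ["ali", "bo", "cy", "eve"]}
disclosed = {"ali": "smoker", "bo": "smoker", "cy": "smoker", "eve": "non-smoker"}
guess = infer_attribute("dana", friends, disclosed)
```

The point of the sketch is that Dana's own privacy settings never enter the computation: the guess depends entirely on what her friends chose to reveal.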

Privacy externalities provide an additional reason why a focus on individual user control is misplaced.  Even when one person is perfectly willing to release information about himself, his decision has implications for others that, individualistically, he is not taking into account when deciding what is in his best interest to reveal. External privacy harms can be inflicted on people whose identity or essential information is revealed by the action of others.  In these cases, too much information has been released. Alternatively, external privacy benefits can be conferred on people who free ride on the information revealed by others.  In this case, too little information has been released.

The article draws together existing literature on these externalities in social networks with a view toward demonstrating that they are frequent and pervasive in this context.  Together with other concerns that have been expressed regarding an overreliance on user controls, they warrant a revision in the public policy perspective that privileges user control over other more effective ways to protect privacy in the context of social networks.

One implication of this approach is that standardized limitations on the collection of information through company policies, industry self-regulatory codes or legislation are the wrong way to go.  In the social network context, more information sharing often means greater benefits.  Many of these benefits are externalities in the sense that people other than the individuals who have revealed the information benefit from its disclosure.  Collection restrictions in the social network context mean default rules on the sharing of information that might unnecessarily restrict the growth of new innovative functions and benefits of social networks.


For example, sharing price and quality information of various goods and services among similarly situated network friends is a benefit to them. This information could be aggregated, analyzed, combined with other information and made available to other network users.  The value of this service to its consumers increases more than linearly as the information on which it is based increases. The problem with leaving the decision on sharing this information entirely to individuals is that the amount of available information will be too small.  People will not factor in the benefits to others of having their information available to outside parties for analysis, and so they will choose to withhold when the socially optimal choice would be to share.

This focus on the beneficial uses of information sharing in a social network context has to be balanced by an assessment of the harmful uses of information exchanged on social networks.  If these harmful uses are not recognized and controlled, then network users will refuse to share information as a way to protect themselves from these possible harms.  The quality and quantity of information traded in this context will shrink and the full value of these new innovative tools will not be realized.

The focus has to be on the use of the information, not on what information is collected. Public policy in the form of legislation or regulation could develop prohibitions or restrictions on the harmful use of social network information.  We need an open debate and consideration of the question of whether social network information should be used, for example, for eligibility decisions involving credit, employment, and insurance, or for setting individualized prices, terms or conditions for products or services.  This need not apply just to operators of social networks; scraping of social networks by outside parties or disclosure by application developers of information used for these purposes might be prohibited or restricted. Legislation of this type is already under consideration in Germany and California in the form of bills that would prohibit the use of Facebook information for employment screening.  Public policy might also develop the concept of publicly beneficial uses of information and restrict or prohibit user control for these uses.


As a background to this public policy approach to social network privacy, the paper develops and applies an unfairness model of privacy regulation under which uses of information fall into one of three categories: unfair, publicly beneficial and intermediate.  The unfair uses can be prohibited or subjected to strong opt-in defaults that disfavor them; the publicly beneficial uses should not be subject to easy-to-use user controls because the external benefits from sharing are so substantial and individual failure to participate would dissipate these benefits. The intermediate uses can be subjected to appropriate and well-tailored user controls.  The paper provides examples of each of these categories in a social network context.

In Part I, I review the literature on privacy and social networks, including discussions by Nissenbaum, Strahilevitz and Lipford and Hull et al., and explore the limitations in their approaches.  Part II sets out the concept of privacy externalities in the social network context, exploring the ways in which information revealed by some people can reveal information about others.  It then discusses examples of positive and negative privacy externalities, in which indirect inferences about people can help them in some cases and hurt them in others.  Part III sets out the unfairness framework for regulating privacy and suggests various specific prohibitions and restrictions that might be considered by privacy legislation.  It also discusses how user controls can fit into this framework as a way to approach intermediate uses that are neither unfair nor publicly beneficial. Part IV summarizes the discussion and concludes with specific suggestions for further research in this area.

Mark MacCarthy, New Directions in Privacy: Disclosure, Unfairness, and Externalities

Comment by: Lauren Willis

PLSC 2010

Workshop draft abstract:

Several recent developments underscore a return of public concerns about access to personal information by businesses and its possible misuse. The Administration is conducting an extensive interagency review of commercial privacy, Congress is considering legislation on online behavioral advertising, and in November an international conference of government officials will likely approve a global standard on privacy protection.

But what’s the best way to protect privacy? David Vladeck, the new head of consumer protection for the Federal Trade Commission, has said he is dissatisfied with the existing policy frameworks for thinking about the issue. He’s right. The traditional framework of fair information practices is severely limited by excessive reliance on informed consent.  Restrictions on disclosure are impractical in a digital world where information collection is ubiquitous, where apparently anonymous or de-identified information can be associated with a specific person and where one person’s decision to share information can adversely affect others who choose to remain silent.  The alternative “harm” framework, however, seems to allow all sorts of privacy violations except when specific, tangible harm results.  If an online marketer secretly tracks you on the Internet and serves you ads based on which web sites you visited, well, where’s the harm?  How are you hurt by getting targeted ads instead of generic ones?  And yet people feel that secret tracking is the essence of a privacy violation.

The traditional harms approach is clearly too limited.  It defines the notion of harm so narrowly that privacy itself is no longer at stake.  And yet its focus on outcomes and substantive protection rather than process is a step in the right direction.

Part I of this paper describes the limitations on the informed consent model, suggesting that informed consent is neither necessary nor sufficient for a legitimate information practice. Part II explores the idea of negative privacy externalities, illustrating several ways in which data can be leaky.  It also discusses the ways in which the indirect disclosure of information can harm individuals through invidious discrimination, inefficient product variety, restrictions on access, and price discrimination. Part III outlines the unfairness model, explores the three-part test for unfairness under the Federal Trade Commission Act, and compares the model to similar privacy frameworks that have been proposed as additions to (or replacements for) the informed consent model.  Part IV explores how to apply the unfairness framework to some current privacy issues involving online behavioral advertising and social networks.