Eloïse Gratton, Interpreting “personal information” in light of its underlying risk of harm
Comment by: Mark MacCarthy
Workshop draft abstract:
“Personal data are and will remain a valuable asset, but what counts as personal data? If one wants to protect X, one needs to know what X is.” (van den Hoven, 2008)
In the late sixties and early seventies, with the development of automated data banks and the growing use of computers in the private and public sectors, privacy was conceptualized as having individuals “in control over their personal information” (Westin, 1967). The principles of Fair Information Practices were elaborated during this period and have since been incorporated in data protection laws (“DPLs”) adopted in various jurisdictions around the world.
Personal information is defined similarly in various DPLs (such as those in Europe and Canada) as “information relating to an identifiable individual”. In the U.S., information is accorded special recognition through a series of sectoral privacy statutes focused on protecting “personally identifiable information” (or PII), a notion close to personal information. This definition of personal information (or a very similar one) has been included in transnational policy instruments, including the OECD Guidelines. Going back in time, identical or at least similar definitions of personal information were used in the resolutions leading to the elaboration of Convention 108, dating back to the early seventies (Council of Europe, Resolutions (73) 2 and (74) 29). This illustrates that a similar definition of personal information had already been elaborated at that time and has not been modified since.
In recent years, with the Internet and the circulation of new types of information, the effectiveness of this definition may be challenged. Individuals constantly give off personal information, and recent technological developments are triggering the emergence of new tools that make it easier to identify individuals. Data-mining techniques and capabilities are reaching new levels of sophistication. Because it is now possible to interpret almost any data as personal information (any data can in one way or another be related to some individual), the question arises as to how much data should be considered personal information. I maintain that a literal interpretation of the definition of personal information may lead to several negative outcomes. First, DPLs may end up protecting all personal information, regardless of whether the information is worthy of protection, encouraging a potentially over-inclusive and burdensome framework. Second, this definition may prove to be under-inclusive, as it may not govern certain profiles that fall outside the scope of the definition: although these profiles may not “identify” an individual by name, they may still be used against the individuals behind them. Third, a literal interpretation of this definition may create various uncertainties, especially in light of new types of data and collection tools that may identify a device or an object used by one or more individuals.
In light of these issues, various authors have recently proposed guidance, mostly on the issue of what “identifiability” actually means. For example, Bercic and George (2009) examine how knowledge of relational database design principles can greatly help in understanding what is and what is not personal data. Lundevall-Unger and Tranvik (2011) propose a different and practical method for deciding the legal status of IP addresses (with regard to the concept of personal data), which consists of a “likely reasonable” test, resolved by assessing the costs (in terms of time, money, expertise, etc.) associated with employing legal methods of identification. Schwartz and Solove (2011) also argue that the current approaches to PII are flawed and propose a new approach called “PII 2.0,” which accounts for PII’s malleability. Based upon a standard rather than a rule, PII 2.0 would rest on a continuum of “risk of identification” and would regulate information that relates to either an “identified” or an “identifiable” individual (making a distinction between the two categories), establishing different requirements for each category.
My contribution to the guidance on this notion of “identifiability” is a new method for interpreting the notion of personal information, one that takes into account the ultimate purpose behind the adoption of DPLs in order to ensure that only data that were meant to be covered by DPLs will in fact be covered. In proposing such an interpretation, the idea is to aim for a level of generality which corresponds with the highest-level goal that the lawmakers wished to achieve (Bennett Moses, 2007). I will demonstrate how the ultimate purpose of DPLs is broader than protecting the privacy rights of individuals, as it is to protect individuals against the risk of harm that may result from the collection, use or disclosure of their information. Accordingly, under the proposed approach, only data that may present such a risk of harm to individuals would be protected.
I argue that in certain cases the harm will take place at the point of collection, while in other cases it will take place at the point where the data is used or disclosed. Instead of trying to determine exactly what an “identifiable” individual means, I maintain that a method of interpretation which is consistent with the original goals of DPLs should be favoured. Relying and building on Calo’s theory (Calo, 2011) and others, I will elaborate a taxonomy of criteria in the form of a decision tree. This tree takes into account the fact that while the collection or disclosure of information may trigger a more subjective kind of harm (for collection, a feeling of being observed; for disclosure, embarrassment and humiliation), the use of information will trigger a more objective kind of harm (financial, physical, discrimination, etc.). The risk of harm approach which I propose, applied to the definition, will reflect this and protect data only at the time that it presents such a risk, and in light of the importance or extent of that risk of objective or subjective harm. Accordingly, the interpretation of the notion of “identifiability” will vary in light of the data handling activity at stake. For instance, while I maintain that the notion of “identifiability” should be interpreted in light of the overall sensitivity of the information being disclosed (taking into account other criteria which are relevant in evaluating the risk of subjective harm), I am also of the view that this notion is irrelevant when evaluating information being used (only the presence of an objective harm being relevant).
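The top-level branching described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the names `Activity`, `HarmTest` and `harm_test_for` are hypothetical, the full decision tree and its criteria are not set out in this abstract, and the relevance of “identifiability” at the point of collection is left unspecified because the abstract does not state it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Activity(Enum):
    """The three data handling activities named in the abstract."""
    COLLECTION = "collection"
    USE = "use"
    DISCLOSURE = "disclosure"


@dataclass(frozen=True)
class HarmTest:
    """Hypothetical record of which kind of harm an activity is assessed against."""
    harm_kind: str                              # "subjective" or "objective"
    examples: Tuple[str, ...]                   # examples of harm given in the abstract
    identifiability_relevant: Optional[bool]    # None = not stated in the abstract


def harm_test_for(activity: Activity) -> HarmTest:
    """Map a data handling activity to the kind of harm the abstract associates with it."""
    if activity is Activity.COLLECTION:
        # Collection: subjective harm; role of identifiability not stated here.
        return HarmTest("subjective", ("feeling of being observed",), None)
    if activity is Activity.DISCLOSURE:
        # Disclosure: subjective harm; identifiability is read in light of
        # the overall sensitivity of the information disclosed.
        return HarmTest("subjective", ("embarrassment", "humiliation"), True)
    if activity is Activity.USE:
        # Use: objective harm only; identifiability is irrelevant.
        return HarmTest("objective", ("financial", "physical", "discrimination"), False)
    raise ValueError(f"unknown activity: {activity!r}")


if __name__ == "__main__":
    for activity in Activity:
        print(activity.value, harm_test_for(activity))
```

The sketch captures only the first split of the proposed test, namely that the relevant kind of harm depends on the data handling activity; the further criteria for weighing the importance or extent of the risk would sit below each branch.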
In the preliminary section of my article, I will provide an overview of the various conceptions of privacy, elaborate on the historical context leading to the adoption of DPLs and the elaboration of the definition of personal information, and discuss the changes which have recently taken place at the technological level. In the following section (section 2), I will first show how a literal interpretation of the definition of personal information is no longer workable. In light of this, I will present the proposed approach to interpreting the definition of personal information, under which the ultimate purpose behind DPLs should be taken into account. I will then demonstrate why the ultimate purpose of DPLs was to protect individuals against a risk of harm triggered by organizations collecting, using and disclosing their information. In section 3, I will demonstrate how this risk of harm can be subjective or objective, depending on the data handling activity at stake. I will offer a way forward, proposing a decision-tree test to be used when deciding whether certain information should qualify as personal information. I will also demonstrate how the proposed test would work in practice, using practical business cases as examples.
The objective of my work is to come to a common understanding of the notion of personal information, the situations in which DPLs should be applied, and the way they should be applied. A corollary of this work is to provide guidance to lawmakers, policymakers, privacy commissioners, courts, organizations handling personal information and individuals assessing whether certain information is or should be governed by the relevant DPLs, depending on whether the data handling activity at stake creates a risk of harm for an individual. This will provide a useful framework under which DPLs remain effective in light of modern Internet technologies.