Tal Zarsky, Data Mining, Personal Information & Discrimination
Comment by: James Rule
Published version available here: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1983326
Workshop draft abstract:
Governments are extremely interested in predicting what their citizens will do. To meet this ambitious objective, governments have begun to apply predictive modeling analyses to the massive datasets of personal information at their disposal, drawn from both governmental and commercial sources. Such analyses are, in many cases, enabled by data mining, which allows for both automation and the revealing of previously unknown patterns. The outcomes of these analyses are rules and associations providing approximate predictions of the future actions and reactions of individuals. These individualized predictions can thereafter be used by government officials (or possibly solely as part of an automated system) for a variety of tasks. For instance, such input could inform decisions regarding the future allocation of resources and privileges. In other instances, these predictions can be used to establish the risks posed by specific individuals (in terms of security or law enforcement), allowing the state to take relevant precautions. The patterns and decision trees resulting from these analyses (and applied to the policy objectives mentioned) are very different from those broadly used today to differentiate among groups and individuals; they might include a great variety of factors and variables, or might be established by an algorithm using ever-changing rules which cannot be easily understood by a human observer.
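The kind of mined decision rule described above can be sketched, purely for illustration, as a toy decision procedure. Every feature name, weight, and threshold below is an invented assumption for this sketch, not anything drawn from the article or from any real government system; an actual data-mining system would induce such rules automatically from large datasets rather than have them written by hand.

```python
# Illustrative only: a toy "mined rule" of the kind the abstract describes.
# All features, weights, and the cutoff are invented for this sketch.

def predict_risk(record):
    """Apply a hypothetical mined decision rule to one person's record."""
    score = 0
    # Each branch mirrors a node in an induced decision tree.
    if record.get("late_payments", 0) > 2:
        score += 2
    if record.get("address_changes", 0) > 3:
        score += 1
    if record.get("age", 99) < 25:
        score += 1
    # The cutoff separating "flagged" from "not flagged" is likewise invented.
    return "flagged" if score >= 3 else "not flagged"

print(predict_risk({"late_payments": 4, "address_changes": 5, "age": 22}))
print(predict_risk({"late_payments": 0, "address_changes": 1, "age": 40}))
```

Even this toy rule illustrates the abstract's concern: the output label alone reveals nothing about which factor drove the decision, and a machine-induced tree with many more branches would be correspondingly harder for a human observer to interpret.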
As knowledge of such ventures unfolds, and technological advances enable their expansion, policymakers and scholars are quickly moving to delineate the limits of such practices. They are striving to establish which practices are legitimate and which go too far. This article joins that discussion in an attempt to resolve arguments and misunderstandings concerning a central claim raised within it – whether such practices amount to unfair discrimination. This claim is challenging, as governments often treat different individuals differently (and are indeed required to do so). The article sets out to examine why, and in which instances, these practices must be considered discriminatory and problematic when carried out by government.
To approach this concern, the article identifies and explores three arguments in the context of discrimination:
(1) These practices might resemble, enable, or serve as a proxy for various forms of discrimination currently prohibited by law: here the article briefly notes these forms of illegal discrimination and the theories underlying their prohibition. It then explains how these novel, allegedly “neutral” models might generate similar results and effects.
(2) These practices might generate stigma and stereotypes for those singled out by this process: this argument is challenging given that these forms of discrimination rely on elaborate factors and are at times opaque. In some contexts, however, these concerns might indeed persist, especially where we fear these practices will “creep” into other projects. In addition, these practices might signal a governmental “appetite” for discrimination and set an inappropriate example for the public.
(3) These practices “punish” individuals for things they did not do; or, are premised on what they did (actions) and how they are viewed, but not on who they really are: these loosely connected arguments are commonly raised, yet require several in-depth inquiries. Are these concerns sound policy considerations, or the utterances of a Luddite crowd? Do the practices addressed here actually generate these concerns? In this discussion, the article distinguishes between immutable and changeable attributes.
While examining these arguments, the article emphasizes that if individualized predictive modeling is banned, government will engage in other forms of analysis to distinguish among individuals, their needs, and the risks they pose. Thus, predictive modeling must constantly be compared to the common practice of treating individuals as members of predefined groups. Other dominant alternatives are allowing officials broad discretion, refraining from differentiated treatment altogether, or differentiating on a random basis.
After mapping out these concerns and taking note of the alternatives, the article offers concrete recommendations. Distinguishing among different contexts, it advises when these concerns should lead to abandoning the use of such prediction models. It further notes other steps which might allow these practices to persist – steps ranging from closely monitoring decision-making models and making the adjustments needed to eliminate some forms of discrimination, to greater governmental disclosure and transparency, and public education. It concludes by noting that in some cases, and with specific tinkering, predictive modeling premised upon personal information can lead to outcomes which promote fairness and equality.
It is duly noted that perhaps the most central arguments against predictive data mining practices are premised upon other theories and concerns. Often mentioned, for instance, are the inability to control personal information, the lack of transparency, and the existence and persistence of errors in the analysis and decision-making process. This article acknowledges these concerns, yet leaves them to be addressed in other segments of this broader research project. Furthermore, it sees great importance in addressing the “discrimination” element separately and directly: at times, the other concerns noted could be resolved; in other instances, the interests addressed here are conflated with other elements. For these reasons and others, a specific inquiry is warranted. In addition, the article sets aside the argument that such actions are problematic because they are premised upon decisions made by a machine, as opposed to a fellow individual. This powerful argument is also to be addressed elsewhere. Finally, the article acknowledges that these policy strategies should only be adopted if proven efficient and effective – a finding that must be established by experts on a case-by-case basis. The arguments set out here, though, can add additional factors to such analyses.
Addressing the issues at hand is challenging. They present a considerable paradigm shift in the way policymakers and scholars have thought about discrimination and decision-making in the past. In addition, there is the difficulty of applying existing legal paradigms and doctrines to these new concerns: is this a question of privacy, equality and discrimination, data protection, autonomy – something else, or none of the above? Rather than tackling this latter question, I apply methodologies from all of the above to address an issue which will no doubt constantly arise in today’s environment of ongoing flows of personal information.