Using sensitive personal data may be necessary for avoiding discrimination in data-driven decision models

Artificial Intelligence and Law 24 (2):183-201 (2016)

Abstract

An increasing number of decisions about everyday life are made using algorithms. By algorithms we mean predictive models learned from historical data using data mining. Such models often decide the prices we pay, select the ads we see and the news we read online, match job descriptions with candidate CVs, and decide who gets a loan, who goes through an extra airport security check, or who gets released on parole. Yet growing evidence suggests that decision making by algorithms may discriminate against people, even if the computing process is fair and well-intentioned. This happens due to biased or non-representative learning data in combination with inadvertent modeling procedures. From the regulatory perspective there are two tendencies in relation to this issue: to ensure that data-driven decision making is not discriminatory, and to restrict the collection and storage of private data to a necessary minimum. This paper shows that from the computing perspective these two goals are contradictory. We demonstrate empirically and theoretically with standard regression models that, in order to make sure that decision models are non-discriminatory, for instance with respect to race, the sensitive racial information needs to be used in the model building process. Of course, after the model is ready, race should not be required as an input variable for decision making. From the regulatory perspective this has an important implication: collecting sensitive personal data is necessary in order to guarantee the fairness of algorithms, and lawmaking needs to find sensible ways to allow such data to be used in the modeling process.
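The regression argument in the abstract can be illustrated with a small synthetic experiment. The sketch below is not the paper's experiment or code: the data-generating process, the coefficient values, and the names `simulate`, `fit_ols`, and `decide` are all assumptions made for illustration. It fits ordinary least squares twice, once without and once with a binary sensitive attribute `s`, on data where `s` is correlated with a legitimate predictor `x` and the historical outcome `y` carries a direct penalty on `s`.

```python
import numpy as np

def simulate(n=20000, seed=0):
    """Synthetic data (assumed, not from the paper): y = 2*x - 3*s + noise."""
    rng = np.random.default_rng(seed)
    s = rng.integers(0, 2, size=n).astype(float)      # sensitive attribute (0 or 1)
    x = rng.normal(loc=s, scale=1.0)                  # legitimate predictor, correlated with s
    y = 2.0 * x - 3.0 * s + rng.normal(scale=0.5, size=n)  # biased historical outcome
    return x, s, y

def fit_ols(columns, y):
    """Least-squares fit with an intercept; returns [intercept, coef_1, ...]."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

x, s, y = simulate()

# Model built WITHOUT the sensitive attribute: the slope on x absorbs
# part of the penalty on s, because x acts as a proxy for s.
naive = fit_ols([x], y)

# Model built WITH the sensitive attribute: the effect of x is
# estimated separately from the discriminatory effect of s.
aware = fit_ols([x, s], y)

def decide(x_new):
    """One possible race-blind scoring rule (an assumption of this sketch):
    keep the slope from the aware model and replace the s term with a
    constant, so s is not needed as an input at decision time."""
    return aware[0] + aware[1] * x_new + aware[2] * s.mean()

print("slope on x, s omitted :", round(naive[1], 2))
print("slope on x, s included:", round(aware[1], 2))
print("coefficient on s      :", round(aware[2], 2))
```

Under these assumed parameters the omitted-variable slope converges in theory to cov(x, y) / var(x) = 1.75 / 1.25 = 1.4 rather than the true value of 2, so the model that never sees `s` still scores people differently through the proxy `x`. Only the model trained with `s` recovers the unbiased slope, which is the sense in which the sensitive attribute is needed during model building but not as an input afterwards.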

