Research article · Public Access · DOI: 10.1145/3468264.3468537

Bias in machine learning software: why? how? what to do?

Published: 18 August 2021
Abstract

    Increasingly, software makes autonomous decisions in areas such as criminal sentencing, credit-card approval, and hiring. Some of these decisions are biased and adversely affect certain social groups (e.g., those defined by sex, race, age, or marital status). Many prior works on bias mitigation take the following form: change the data or the learners in multiple ways, then check whether any of those changes improves fairness. A better approach may be to postulate the root causes of bias and then apply a targeted resolution strategy. This paper postulates that the root causes of bias are the prior decisions that determined (a) what data was selected and (b) the labels assigned to those examples. Our Fair-SMOTE algorithm removes biased labels and rebalances internal distributions so that, for each sensitive attribute, examples are equally represented in both the positive and negative classes. In our tests, this method was just as effective at reducing bias as prior approaches. Further, models generated via Fair-SMOTE achieve higher performance (measured in terms of recall and F1) than other state-of-the-art fairness-improvement algorithms. To the best of our knowledge, measured by the number of learners and datasets analyzed, this is one of the largest studies of bias mitigation yet presented in the literature.
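    To make the rebalancing step concrete, the sketch below oversamples a dataset so that every (protected attribute, class label) subgroup is equally represented, using SMOTE-style interpolation within each subgroup. It is a minimal illustration under stated assumptions, not the paper's implementation: the function name `rebalance_subgroups` and the toy columns (age, hours, sex, label) are invented for the example, features are assumed numeric, and Fair-SMOTE's complementary step of removing biased labels is omitted.

```python
import numpy as np
import pandas as pd

def rebalance_subgroups(df, protected, label, seed=0):
    """Oversample until every (protected attribute, class label) subgroup
    has as many rows as the largest one. New rows are SMOTE-style
    interpolations between a subgroup member and its nearest neighbour
    inside the same subgroup, so synthetic points never cross group or
    class boundaries. Assumes all feature columns are numeric.
    Illustrative sketch only, not the published Fair-SMOTE code."""
    rng = np.random.default_rng(seed)
    features = [c for c in df.columns if c not in (protected, label)]
    groups = {key: g for key, g in df.groupby([protected, label])}
    target = max(len(g) for g in groups.values())
    parts = [df]
    for (p_val, l_val), g in groups.items():
        X = g[features].to_numpy(dtype=float)
        synthetic = []
        for _ in range(target - len(g)):
            i = rng.integers(len(X))
            if len(X) > 1:
                # nearest neighbour of X[i], excluding X[i] itself
                dists = np.linalg.norm(X - X[i], axis=1)
                j = int(np.argsort(dists)[1])
            else:
                j = i  # singleton subgroup: fall back to duplication
            lam = rng.random()  # interpolation factor in [0, 1)
            synthetic.append(X[i] + lam * (X[j] - X[i]))
        if synthetic:
            new = pd.DataFrame(synthetic, columns=features)
            new[protected] = p_val
            new[label] = l_val
            parts.append(new)
    return pd.concat(parts, ignore_index=True)

# Toy usage: a frame where the (sex=0, label=1) subgroup is under-represented.
df = pd.DataFrame({
    "age":   [25, 32, 47, 51, 62, 23, 36, 44],
    "hours": [40, 50, 40, 60, 40, 20, 40, 45],
    "sex":   [1, 1, 1, 1, 0, 0, 0, 0],
    "label": [1, 1, 0, 0, 1, 0, 0, 0],
})
balanced = rebalance_subgroups(df, protected="sex", label="label")
print(balanced.groupby(["sex", "label"]).size())  # all four subgroups equal
```

    Running this on the toy frame yields four equal-sized subgroups; a real pipeline would also need to handle categorical features (e.g., by encoding them before interpolation).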






        Published In

        ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
        August 2021
        1690 pages
        ISBN: 9781450385626
        DOI: 10.1145/3468264

        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Badges

        • Distinguished Paper

        Author Tags

        1. Bias Mitigation
        2. Fairness Metrics
        3. Software Fairness


        Funding Sources

        • Laboratory for Analytic Sciences
        • NSF

        Conference

        ESEC/FSE '21

        Acceptance Rates

        Overall acceptance rate: 112 of 543 submissions (21%)




        Article Metrics

        • Downloads (last 12 months): 3,746
        • Downloads (last 6 weeks): 385
        Reflects downloads up to 14 Aug 2024.


        Cited By
        • (2024) Plato's Shadows in the Digital Cave: Controlling Cultural Bias in Generative AI. Electronics 13(8), 1457. DOI: 10.3390/electronics13081457. Online publication date: 11-Apr-2024.
        • (2024) The Use of Facial Recognition in Sociological Research: A Comparison of ClarifAI and Kairos Classifications to Hand-Coded Images. Socius: Sociological Research for a Dynamic World 10. DOI: 10.1177/23780231241259659. Online publication date: 20-Jun-2024.
        • (2024) Predicting Fairness of ML Software Configurations. Proceedings of the 20th International Conference on Predictive Models and Data Analytics in Software Engineering, 56-65. DOI: 10.1145/3663533.3664040. Online publication date: 10-Jul-2024.
        • (2024) MirrorFair: Fixing Fairness Bugs in Machine Learning Software via Counterfactual Predictions. Proceedings of the ACM on Software Engineering 1(FSE), 2121-2143. DOI: 10.1145/3660801. Online publication date: 12-Jul-2024.
        • (2024) Fairness Testing: A Comprehensive Survey and Analysis of Trends. ACM Transactions on Software Engineering and Methodology 33(5), 1-59. DOI: 10.1145/3652155. Online publication date: 4-Jun-2024.
        • (2024) Bias Mitigation for Machine Learning Classifiers: A Comprehensive Survey. ACM Journal on Responsible Computing 1(2), 1-52. DOI: 10.1145/3631326. Online publication date: 20-Jun-2024.
        • (2024) A Post-training Framework for Improving the Performance of Deep Learning Models via Model Transformation. ACM Transactions on Software Engineering and Methodology 33(3), 1-41. DOI: 10.1145/3630011. Online publication date: 15-Mar-2024.
        • (2024) An Empirical Study on Correlations Between Deep Neural Network Fairness and Neuron Coverage Criteria. IEEE Transactions on Software Engineering 50(3), 391-412. DOI: 10.1109/TSE.2023.3349001. Online publication date: Mar-2024.
        • (2024) Requirements Verification Through the Analysis of Source Code by Large Language Models. SoutheastCon 2024, 75-80. DOI: 10.1109/SoutheastCon52093.2024.10500073. Online publication date: 15-Mar-2024.
        • (2024) Ethics: Why Software Engineers Can't Afford to Look Away. IEEE Software 41(1), 142-144. DOI: 10.1109/MS.2023.3319768. Online publication date: Jan-2024.
