research-article
Open access

MirrorFair: Fixing Fairness Bugs in Machine Learning Software via Counterfactual Predictions

Published: 12 July 2024

Abstract

With the increasing use of Machine Learning (ML) software in critical domains such as employee hiring, college admission, and credit evaluation, ensuring fairness in the decision-making processes of the underlying models has become a paramount ethical concern. Nonetheless, existing methods for rectifying fairness issues struggle to achieve a consistent trade-off between performance and fairness across diverse tasks and algorithms. Informed by the principles of counterfactual inference, this paper introduces MirrorFair, an adaptive ensemble approach designed to mitigate fairness concerns. MirrorFair first constructs a counterfactual dataset derived from the original data and trains two distinct models: one on the original dataset and the other on the counterfactual dataset. It then adaptively combines the two models' predictions to produce fairer final decisions. We conduct an extensive evaluation of MirrorFair, comparing it with 15 existing methods across a diverse range of decision-making scenarios. Our findings reveal that MirrorFair outperforms all baselines on every measurement (i.e., fairness improvement, performance preservation, and trade-off metrics). Specifically, in 93% of cases, MirrorFair surpasses the fairness-performance trade-off baseline constructed by the benchmarking tool Fairea, whereas the state-of-the-art method does so in only 88% of cases. Furthermore, MirrorFair consistently demonstrates its superiority across tasks and algorithms, ranking first in balancing model performance and fairness in 83% of scenarios. To foster replicability and future research, we have made our code, data, and results openly available to the research community.
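The abstract describes the mechanism only at a high level: build a counterfactual copy of the training data, train one model per dataset, and combine the two models' predictions. The following is a minimal illustrative sketch of that idea, not the authors' implementation: the toy data, the fixed 50/50 combination weight (MirrorFair chooses its weights adaptively), and the `spd` helper are all hypothetical, and the counterfactual dataset is approximated by simply flipping a binary protected attribute.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: column 0 is a binary protected attribute (e.g., sex),
# column 1 is a legitimate feature; labels correlate with both.
n = 1000
protected = rng.integers(0, 2, n)
feature = rng.normal(0, 1, n)
labels = ((feature + 0.8 * protected + rng.normal(0, 0.5, n)) > 0.5).astype(int)
X = np.column_stack([protected, feature])

# Counterfactual copy: flip the protected attribute for every sample.
X_cf = X.copy()
X_cf[:, 0] = 1 - X_cf[:, 0]

# One model per dataset.
m_orig = LogisticRegression().fit(X, labels)
m_cf = LogisticRegression().fit(X_cf, labels)

# Combine the two probability estimates with a fixed 50/50 weight
# (MirrorFair does this adaptively) and threshold at 0.5.
p = 0.5 * m_orig.predict_proba(X)[:, 1] + 0.5 * m_cf.predict_proba(X)[:, 1]
pred = (p >= 0.5).astype(int)

def spd(y_hat, a):
    """Statistical parity difference: gap in positive-prediction
    rates between the two protected groups (smaller is fairer)."""
    return abs(y_hat[a == 1].mean() - y_hat[a == 0].mean())

spd_orig = spd(m_orig.predict(X), protected)
spd_ens = spd(pred, protected)
print(round(spd_orig, 3), round(spd_ens, 3))
```

Because the counterfactual model learns roughly the opposite dependence on the protected attribute, averaging the two probability estimates largely cancels that attribute's influence, so the ensemble's parity gap shrinks relative to the original model's while the legitimate feature still drives the decision.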

References

[1]
1994. The German dataset. https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)
[2]
2014. The Bank dataset. https://archive.ics.uci.edu/ml/datasets/bank+marketing
[3]
2015. The Mep dataset. https://meps.ahrq.gov/mepsweb/data_stats/download_data_files.jsp
[4]
2016. The Compas dataset. https://github.com/propublica/compas-analysis
[5]
2017. The Adult Census Income dataset. https://archive.ics.uci.edu/ml/datasets/adult
[6]
2024. Replication package for MirrorFair. https://github.com/XY-Showing/FSE2024-MirrorFair
[7]
Saleema Amershi, Andrew Begel, Christian Bird, Robert DeLine, Harald Gall, Ece Kamar, Nachiappan Nagappan, Besmira Nushi, and Thomas Zimmermann. 2019. Software engineering for machine learning: A case study. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). 291–300.
[8]
Rachel KE Bellamy, Kuntal Dey, Michael Hind, Samuel C Hoffman, Stephanie Houde, Kalapriya Kannan, Pranay Lohia, Jacquelyn Martino, Sameep Mehta, and Aleksandra Mojsilović. 2019. AI Fairness 360: An extensible toolkit for detecting and mitigating algorithmic bias. IBM Journal of Research and Development, 63, 4/5 (2019), 4–1.
[9]
Alex Beutel, Jilin Chen, Tulsee Doshi, Hai Qian, Li Wei, Yi Wu, Lukasz Heldt, Zhe Zhao, Lichan Hong, and Ed H Chi. 2019. Fairness in recommendation ranking through pairwise comparisons. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2212–2220.
[10]
Sarah Bird, Miro Dudík, Richard Edgar, Brandon Horn, Roman Lutz, Vanessa Milan, Mehrnoosh Sameki, Hanna Wallach, and Kathleen Walker. 2020. Fairlearn: A toolkit for assessing and improving fairness in AI. Microsoft, Tech. Rep. MSR-TR-2020-32.
[11]
Sumon Biswas and Hridesh Rajan. 2023. Fairify: Fairness verification of neural networks. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). 1546–1558.
[12]
Leo Breiman. 2001. Random forests. Machine learning, 45 (2001), 5–32.
[13]
Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. 2013. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning. 108–122.
[14]
Flavio Calmon, Dennis Wei, Bhanukiran Vinzamuri, Karthikeyan Natesan Ramamurthy, and Kush R Varshney. 2017. Optimized pre-processing for discrimination prevention. Advances in neural information processing systems, 30 (2017).
[15]
L Elisa Celis, Lingxiao Huang, Vijay Keswani, and Nisheeth K Vishnoi. 2019. Classification with fairness constraints: A meta-algorithm with provable guarantees. In Proceedings of the conference on fairness, accountability, and transparency. 319–328.
[16]
Joymallya Chakraborty, Suvodeep Majumder, and Tim Menzies. 2021. Bias in machine learning software: why? how? what to do? In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 429–440.
[17]
Joymallya Chakraborty, Suvodeep Majumder, Zhe Yu, and Tim Menzies. 2020. Fairway: A way to build fair ml software. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 654–665.
[18]
Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. 2002. SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16 (2002), 321–357.
[19]
Zhenpeng Chen, Jie Zhang, Federica Sarro, and Mark Harman. 2022. MAAT: A Novel Ensemble Approach to Addressing Fairness and Performance Bugs for Machine Learning Software. In The ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
[20]
Zhenpeng Chen, Jie Zhang, Federica Sarro, and Mark Harman. 2023. Fairness Improvement with Multiple Protected Attributes: How Far Are We? In 46th International Conference on Software Engineering (ICSE 2024).
[21]
Zhenpeng Chen, Jie M Zhang, Max Hort, Federica Sarro, and Mark Harman. 2022. Fairness Testing: A Comprehensive Survey and Analysis of Trends. arXiv preprint arXiv:2207.10223.
[22]
Zhenpeng Chen, Jie M Zhang, Federica Sarro, and Mark Harman. 2023. A Comprehensive Empirical Study of Bias Mitigation Methods for Machine Learning Classifiers. ACM Transactions on Software Engineering and Methodology.
[23]
Jeffrey Dastin. 2018. Amazon scraps secret AI recruiting tool that showed bias against women — reuters.com. https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G [Accessed 13-Apr-2023]
[24]
Anupam Datta, Matt Fredrikson, Gihyuk Ko, Piotr Mardziel, and Shayak Sen. 2017. Proxy non-discrimination in data-driven systems. arXiv preprint arXiv:1707.08120.
[25]
Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. 2012. Fairness through awareness. In Proceedings of the 3rd innovations in theoretical computer science conference. 214–226.
[26]
Michael Feldman, Sorelle A Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. 2015. Certifying and removing disparate impact. In proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining. 259–268.
[27]
Grant R Fowles. 1989. Introduction to modern optics. Courier Corporation.
[28]
Xuanqi Gao, Juan Zhai, Shiqing Ma, Chao Shen, Yufei Chen, and Qian Wang. 2022. FairNeuron: improving deep neural network fairness with adversary games on selective neurons. In Proceedings of the 44th International Conference on Software Engineering. 921–933.
[29]
Usman Gohar, Sumon Biswas, and Hridesh Rajan. 2023. Towards understanding fairness and its composition in ensemble machine learning. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). 1533–1545.
[30]
Google. [n. d.]. Machine Learning Glossary: Fairness | Google Developers — developers.google.com. https://developers.google.com/machine-learning/glossary/fairness [Accessed 26-Mar-2023]
[31]
Moritz Hardt, Eric Price, and Nati Srebro. 2016. Equality of opportunity in supervised learning. Advances in neural information processing systems, 29 (2016).
[32]
Marti A. Hearst, Susan T Dumais, Edgar Osuna, John Platt, and Bernhard Scholkopf. 1998. Support vector machines. IEEE Intelligent Systems and their applications, 13, 4 (1998), 18–28.
[33]
Max Hort, Jie M Zhang, Federica Sarro, and Mark Harman. 2021. Fairea: A model behaviour mutation approach to benchmarking bias mitigation methods. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 994–1006.
[34]
Zhenlan Ji, Pingchuan Ma, Shuai Wang, and Yanhui Li. 2023. Causality-Aided Trade-off Analysis for Machine Learning Fairness. arXiv preprint arXiv:2305.13057.
[35]
Faisal Kamiran and Toon Calders. 2012. Data preprocessing techniques for classification without discrimination. Knowledge and information systems, 33, 1 (2012), 1–33.
[36]
Faisal Kamiran, Asim Karim, and Xiangliang Zhang. 2012. Decision theory for discrimination-aware classification. In 2012 IEEE 12th international conference on data mining. 924–929.
[37]
Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. 2012. Fairness-aware classifier with prejudice remover regularizer. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2012, Bristol, UK, September 24-28, 2012. Proceedings, Part II 23. 35–50.
[38]
David G Kleinbaum, K Dietz, M Gail, Mitchel Klein, and Mitchell Klein. 2002. Logistic regression. Springer.
[39]
Matt J Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. 2017. Counterfactual fairness. Advances in neural information processing systems, 30 (2017).
[40]
Preethi Lahoti, Alex Beutel, Jilin Chen, Kang Lee, Flavien Prost, Nithum Thain, Xuezhi Wang, and Ed Chi. 2020. Fairness without demographics through adversarially reweighted learning. Advances in neural information processing systems, 33 (2020), 728–740.
[41]
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature, 521, 7553 (2015), 436–444.
[42]
Tianlin Li, Xiaofei Xie, Jian Wang, Qing Guo, Aishan Liu, Lei Ma, and Yang Liu. 2023. Faire: Repairing Fairness of Neural Networks via Neuron Condition Synthesis. ACM Transactions on Software Engineering and Methodology.
[43]
Henry B Mann and Donald R Whitney. 1947. On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, 18, 1 (1947), 50–60.
[44]
Verya Monjezi, Ashutosh Trivedi, Gang Tan, and Saeid Tizpaz-Niari. 2023. Information-Theoretic Testing and Debugging of Fairness Defects in Deep Neural Networks. arXiv preprint arXiv:2304.04199.
[45]
Judea Pearl. 2009. Causality. Cambridge university press.
[46]
Judea Pearl and Dana Mackenzie. 2018. The book of why: the new science of cause and effect. Basic books.
[47]
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12 (2011), 2825–2830.
[48]
Kewen Peng, Joymallya Chakraborty, and Tim Menzies. 2022. FairMask: Better Fairness via Model-based Rebalancing of Protected Attributes. IEEE Transactions on Software Engineering.
[49]
Dana Pessach and Erez Shmueli. 2022. A review on fairness in machine learning. ACM Computing Surveys (CSUR), 55, 3 (2022), 1–44.
[50]
Long H Pham, Jiaying Li, and Jun Sun. 2020. SOCRATES: Towards a Unified Platform for Neural Network Analysis. arXiv preprint arXiv:2007.11206.
[51]
Geoff Pleiss, Manish Raghavan, Felix Wu, Jon Kleinberg, and Kilian Q Weinberger. 2017. On fairness and calibration. Advances in neural information processing systems, 30 (2017).
[52]
Alexander S. Poznyak. 2008. Chapter 14 - Sets, Functions and Metric Spaces. In Advanced Mathematical Tools for Automatic Control Engineers: Deterministic Techniques, Alexander S. Poznyak (Ed.). Elsevier, Oxford. 251–274. isbn:978-0-08-044674-5 https://doi.org/10.1016/B978-008044674-5.50017-1
[53]
Daniel Rodriguez, Israel Herraiz, Rachel Harrison, Javier Dolado, and José C Riquelme. 2014. Preliminary comparison of techniques for dealing with imbalance in software defect prediction. In Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering. 1–10.
[54]
Omer Sagi and Lior Rokach. 2018. Ensemble learning: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8, 4 (2018), e1249.
[55]
Bing Sun, Jun Sun, Long H Pham, and Jie Shi. 2022. Causality-based neural network repair. In Proceedings of the 44th International Conference on Software Engineering. 338–349.
[56]
TensorFlow Developers. 2018. TensorFlow. Official site.
[57]
Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. 2013. Learning fair representations. In International conference on machine learning. 325–333.
[58]
Brian Hu Zhang, Blake Lemoine, and Margaret Mitchell. 2018. Mitigating unwanted biases with adversarial learning. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society. 335–340.
[59]
Junzhe Zhang and Elias Bareinboim. 2018. Fairness in decision-making—the causal explanation formula. In Proceedings of the AAAI Conference on Artificial Intelligence. 32.
[60]
Jie M Zhang and Mark Harman. 2021. “Ignorance and Prejudice” in Software Fairness. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). 1436–1447.
[61]
Jie M Zhang, Mark Harman, Lei Ma, and Yang Liu. 2020. Machine learning testing: Survey, landscapes and horizons. IEEE Transactions on Software Engineering, 48, 1 (2020), 1–36.
[62]
Mengdi Zhang and Jun Sun. 2022. Adaptive fairness improvement based on causality analysis. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 6–17.
[63]
Indre Zliobaite. 2015. A survey on measuring indirect discrimination in machine learning. arXiv preprint arXiv:1511.00148.


Published In

Proceedings of the ACM on Software Engineering, Volume 1, Issue FSE
July 2024, 2770 pages
EISSN: 2994-970X
DOI: 10.1145/3554322
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published 12 July 2024 in PACMSE Volume 1, Issue FSE

      Author Tags

      1. Bias Mitigation
      2. Fairness Bugs
      3. Machine Learning
      4. Software Discrimination



