Research Article | Open Access
DOI: 10.1145/3663533.3664040

Predicting Fairness of ML Software Configurations

Published: 10 July 2024

Abstract

This paper investigates the relationships between the hyperparameters of machine learning (ML) algorithms and fairness. Data-driven solutions are increasingly used in critical socio-technical applications where ensuring fairness is important. Rather than explicitly encoding decision logic via control and data structures, ML developers provide input data, perform pre-processing, choose training algorithms, and tune hyperparameters (HPs) to infer a program that encodes the decision logic. Prior work reports that the selection of HPs can significantly influence fairness. However, tuning HPs to find an ideal trade-off between accuracy, precision, and fairness remains an expensive and tedious task. Can we predict the fairness of an HP configuration for a given dataset? Are such predictions robust to distribution shifts? We focus on group fairness notions and investigate the HP space of five training algorithms. We first find that tree regressors and XGBoost significantly outperform deep neural networks and support vector machines in accurately predicting the fairness of HPs. When predicting the fairness of ML hyperparameters under temporal distribution shift, tree regressors outperform the other algorithms with reasonable accuracy, but their precision depends on the ML training algorithm, the dataset, and the protected attributes. For example, the tree regressor model was robust to a training-data shift from 2014 to 2018 on logistic regression and discriminant analysis HPs with sex as the protected attribute, but not for race or other training algorithms. Our method provides a sound framework to efficiently fine-tune ML training algorithms and to understand the relationships between HPs and fairness.
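The core idea in the abstract is a surrogate model that maps a hyperparameter configuration to a group-fairness score, so that configurations can be screened without retraining the classifier each time. The sketch below illustrates that general idea with scikit-learn; it is not the paper's pipeline. The synthetic dataset, the choice of logistic-regression HPs (C, max_iter), and statistical parity difference as the fairness metric are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy stand-in for a tabular dataset with a binary protected attribute A.
# (The paper uses real census/credit datasets; this synthetic data is an
# assumption so the sketch stays self-contained.)
n = 2000
A = rng.integers(0, 2, n)                          # protected attribute
X = np.column_stack([rng.normal(size=n),
                     rng.normal(size=n) + 0.5 * A,
                     A.astype(float)])
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n) > 0.5).astype(int)
train = np.arange(n) < n // 2
test = ~train

def spd_of_config(C, max_iter):
    """Train one HP configuration and return its statistical parity
    difference, SPD = P(Y_hat=1 | A=1) - P(Y_hat=1 | A=0), on held-out data."""
    clf = LogisticRegression(C=C, max_iter=int(max_iter))
    clf.fit(X[train], y[train])
    pred = clf.predict(X[test])
    a = A[test]
    return pred[a == 1].mean() - pred[a == 0].mean()

# Sample HP configurations of the target algorithm and label each one with
# its measured fairness score.
configs = np.column_stack([10.0 ** rng.uniform(-3, 3, 100),   # C
                           rng.integers(100, 500, 100)])      # max_iter
labels = np.array([spd_of_config(C, m) for C, m in configs])

# Fit the fairness predictor: HP vector -> fairness score. New configurations
# can then be screened cheaply, without retraining the underlying classifier.
predictor = DecisionTreeRegressor(max_depth=5, random_state=0)
predictor.fit(configs, labels)
print(predictor.predict([[1.0, 200.0]]))
```

In this framing, evaluating a candidate configuration costs one regressor lookup instead of one full training run, which is what makes HP fine-tuning for fairness tractable; the paper's finding is that tree-based regressors fill this surrogate role more accurately than deep neural networks or support vector machines.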

References

[1]
Aniya Aggarwal, Pranay Lohia, Seema Nagar, Kuntal Dey, and Diptikalyan Saha. 2019. Black Box Fairness Testing of Machine Learning Models. ESEC/FSE 2019. 625–635. https://doi.org/10.1145/3338906.3338937
[2]
Andrea Arcuri and Lionel Briand. 2014. A Hitchhiker’s guide to statistical tests for assessing randomized algorithms in software engineering. Software Testing, Verification and Reliability, 219–250. https://doi.org/10.1002/stvr.1486
[3]
Rachel KE Bellamy, Kuntal Dey, Michael Hind, Samuel C Hoffman, Stephanie Houde, Kalapriya Kannan, Pranay Lohia, Jacquelyn Martino, Sameep Mehta, and Aleksandra Mojsilović. 2019. AI Fairness 360: An extensible toolkit for detecting and mitigating algorithmic bias. IBM Journal of Research and Development, 63, 4/5 (2019), 4–1.
[4]
Candice Bentéjac, Anna Csörgő, and Gonzalo Martínez-Muñoz. 2021. A comparative analysis of gradient boosting algorithms. Artificial Intelligence Review, 54 (2021), 1937–1967.
[5]
Sumon Biswas and Hridesh Rajan. 2021. Fair Preprocessing: Towards Understanding Compositional Fairness of Data Transformers in Machine Learning Pipeline. ESEC/FSE 2021. Association for Computing Machinery, New York, NY, USA. 981–993. isbn:9781450385626 https://doi.org/10.1145/3468264.3468536
[6]
Ruth G Blumrosen. 1978. Wage discrimination, job segregation, and the title vii of the civil rights act of 1964. U. Mich. JL Reform, 12 (1978), 397.
[7]
Joymallya Chakraborty, Suvodeep Majumder, and Tim Menzies. 2021. Bias in machine learning software: Why? how? what to do? 429–440.
[8]
Joymallya Chakraborty, Suvodeep Majumder, Zhe Yu, and Tim Menzies. 2020. Fairway: A Way to Build Fair ML Software. ESEC/FSE 2020. Association for Computing Machinery, New York, NY, USA. 654–665. isbn:9781450370431 https://doi.org/10.1145/3368089.3409697
[9]
Joymallya Chakraborty, Tianpei Xia, Fahmid M Fahid, and Tim Menzies. 2019. Software engineering for fairness: A case study with hyperparameter optimization. arXiv preprint arXiv:1905.05786.
[10]
Tianqi Chen and Carlos Guestrin. 2016. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. 785–794.
[11]
Harris Drucker, Christopher J Burges, Linda Kaufman, Alex Smola, and Vladimir Vapnik. 1996. Support vector regression machines. Advances in neural information processing systems, 9 (1996).
[12]
Dheeru Dua and Casey Graff. 2017. UCI Machine Learning Repository. https://archive.ics.uci.edu/ml/datasets/census+income
[13]
Dheeru Dua and Casey Graff. 2017. UCI Machine Learning Repository. https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)
[14]
Dheeru Dua and Casey Graff. 2017. UCI Machine Learning Repository. https://archive.ics.uci.edu/ml/datasets/bank+marketing
[15]
Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. 2012. Fairness through awareness. In Proceedings of the 3rd innovations in theoretical computer science conference. 214–226.
[16]
Ming Fan, Wenying Wei, Wuxia Jin, Zijiang Yang, and Ting Liu. 2022. Explanation-Guided Fairness Testing through Genetic Algorithm. ICSE ’22. Association for Computing Machinery, New York, NY, USA. 871–882. isbn:9781450392211 https://doi.org/10.1145/3510003.3510137
[17]
Jerome H Friedman. 2001. Greedy function approximation: a gradient boosting machine. Annals of statistics, 1189–1232.
[18]
Sainyam Galhotra, Yuriy Brun, and Alexandra Meliou. 2017. Fairness Testing: Testing Software for Discrimination. ESEC/FSE 2017. Association for Computing Machinery, New York, NY, USA. isbn:9781450351058 https://doi.org/10.1145/3106237.3106277
[19]
Usman Gohar, Sumon Biswas, and Hridesh Rajan. 2023. Towards Understanding Fairness and Its Composition in Ensemble Machine Learning. ICSE ’23. IEEE Press, 1533–1545. isbn:9781665457019 https://doi.org/10.1109/ICSE48619.2023.00133
[20]
Home Credit Group. 2018. Home Credit Default Risk. https://www.kaggle.com/competitions/home-credit-default-risk/overview
[21]
Moritz Hardt, Eric Price, and Nati Srebro. 2016. Equality of Opportunity in Supervised Learning. In NIPS.
[22]
David Ingold and Spencer Soper. 2016. Amazon Doesn’t Consider the Race of Its Customers. Should It? https://www.bloomberg.com/graphics/2016-amazon-same-day/ Online
[23]
Surya Mattu Julia Angwin, Jeff Larson and Lauren Kirchne. 2021. Machine Bias. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing Online
[24]
Arjun Kharpal. 2018. TECH Health care start-up says A.I. can diagnose patients better than humans can, doctors call that ‘dubious’. CNBC, https://www.cnbc.com/2018/06/28/babylon-claims-its-ai-can-diagnose-patients-better-than-doctors.html
[25]
Max Kuhn and Kjell Johnson. 2013. Applied predictive modeling. 26, Springer.
[26]
Meelis Kull and Peter Flach. 2014. Patterns of dataset shift. In First international workshop on learning over multiple contexts (LMCE) at ECML-PKDD. 5.
[27]
Wei-Yin Loh. 2011. Classification and regression trees. Wiley interdisciplinary reviews: data mining and knowledge discovery, 1, 1 (2011), 14–23.
[28]
Risto Miikkulainen, Jason Liang, Elliot Meyerson, Aditya Rawal, Daniel Fink, Olivier Francon, Bala Raju, Hormoz Shahrzad, Arshak Navruzyan, and Nigel Duffy. 2019. Evolving deep neural networks. In Artificial intelligence in the age of neural networks and brain computing. Elsevier, 293–312.
[29]
Verya Monjezi, Ashutosh Trivedi, Gang Tan, and Saeid Tizpaz-Niari. 2023. Information-theoretic testing and debugging of fairness defects in deep neural networks. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). 1571–1582.
[30]
Tho M Nguyen, Saeid Tizpaz-Niari, and Vladik Kreinovich. 2024. How to Make Machine Learning Financial Recommendations More Fair: Theoretical Explanation of Empirical Results.
[31]
ProPublica. 2021. Compas Software Ananlysis. https://github.com/propublica/compas-analysis Online
[32]
Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Jie Tan, Quoc V Le, and Alexey Kurakin. 2017. Large-scale evolution of image classifiers. In International Conference on Machine Learning. 2902–2911.
[33]
scikit learn. 2021. Decision Tree Classifier. https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html Online
[34]
scikit learn. 2021. Discriminant Analysis. https://scikit-learn.org/stable/modules/lda_qda.html Online
[35]
scikit learn. 2021. Logistic Regression. https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html Online
[36]
scikit learn. 2021. Support Vector Machine. https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html Online
[37]
Kenneth O Stanley and Risto Miikkulainen. 2002. Evolving neural networks through augmenting topologies. Evolutionary computation, 10, 2 (2002), 99–127.
[38]
Jim Tankersley. 2023. Black Americans Are Much More Likely to Face Tax Audits, Study Finds. The New York Times, 31 Jan, https://www.nytimes.com/2023/01/31/us/politics/black-americans-irs-tax-audits.html
[39]
Saeid Tizpaz-Niari, Olga Kosheleva, and Vladik Kreinovich. 2024. How to Gauge Inequality and Fairness: A Complete Description of All Decomposable Versions of Theil Index.
[40]
Saeid Tizpaz-Niari, Ashish Kumar, Gang Tan, and Ashutosh Trivedi. 2022. Fairness-aware configuration of machine learning libraries. In Proceedings of the 44th International Conference on Software Engineering. 909–920.
[41]
Saeid Tizpaz-Niari, Verya Monjezi, Morgan Wagner, Shiva Darian, Krystia Reed, and Ashutosh Trivedi. 2023. Metamorphic Testing and Debugging of Tax Preparation Software. In 2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Society (ICSE-SEIS). 138–149. https://doi.org/10.1109/ICSE-SEIS58686.2023.00019
[42]
Saeid Tizpaz-Niari, Pavol Černý, and Ashutosh Trivedi. 2020. Detecting and understanding real-world differential performance bugs in machine learning libraries. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2020). 189–199. https://doi.org/10.1145/3395363.3404540
[43]
Kush R Varshney. 2021. Trustworthy machine learning. Chappaqua, NY.
[44]
Lingxi Xie and Alan Yuille. 2017. Genetic cnn. In Proceedings of the IEEE international conference on computer vision. 1379–1388.
[45]
Jie M Zhang and Mark Harman. 2021. "Ignorance and Prejudice" in Software Fairness. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). 1436–1447.
[46]
Jie M Zhang, Mark Harman, Lei Ma, and Yang Liu. 2020. Machine learning testing: Survey, landscapes and horizons. IEEE Transactions on Software Engineering, 48, 1 (2020), 1–36.
[47]
Peixin Zhang, Jingyi Wang, Jun Sun, Guoliang Dong, Xinyu Wang, Xingen Wang, Jin Song Dong, and Ting Dai. 2020. White-box fairness testing through adversarial sampling. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering. 949–960.


Published In

PROMISE 2024: Proceedings of the 20th International Conference on Predictive Models and Data Analytics in Software Engineering
July 2024, 65 pages
ISBN: 9798400706752
DOI: 10.1145/3663533

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. Fairness
2. ML hyperparameters
3. Distribution shifts


    Funding Sources

    • National Science Foundation

Conference

PROMISE '24

Acceptance Rates

Overall Acceptance Rate: 98 of 213 submissions, 46%
