DOI: 10.1145/3581641.3584058
Research article · Open access

It Seems Smart, but It Acts Stupid: Development of Trust in AI Advice in a Repeated Legal Decision-Making Task

Published: 27 March 2023
Abstract

Humans increasingly interact with AI systems, and successful interactions rely on individuals trusting such systems (when appropriate). Considering that trust is fragile and often cannot be restored quickly, we focus on how trust develops over time in a human-AI interaction scenario. In a 2×2 between-subjects experiment, we test how model accuracy (high vs. low) and type of explanation (human-like vs. not) affect trust in AI over time. We study a complex decision-making task in which individuals estimate jail time for 20 criminal law cases with AI advice. Results show that trust is significantly higher for high-accuracy models. Moreover, behavioral trust does not decline over time, and subjective trust even increases significantly with high accuracy. Human-like explanations did not affect trust overall, but they did boost trust in high-accuracy models.


Cited By

• (2024) The Trust Recovery Journey. The Effect of Timing of Errors on the Willingness to Follow AI Advice. Proceedings of the 29th International Conference on Intelligent User Interfaces, 609–622. https://doi.org/10.1145/3640543.3645167. Online publication date: 18 March 2024.

        Published In

        IUI '23: Proceedings of the 28th International Conference on Intelligent User Interfaces
        March 2023
        972 pages
ISBN: 9798400701061
DOI: 10.1145/3581641
This work is licensed under a Creative Commons Attribution 4.0 International License.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 27 March 2023


        Author Tags

        1. Collaborative Decision-Making
        2. Human-AI Interaction
        3. Trust Development
        4. Trustworthy AI

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        IUI '23

        Acceptance Rates

        Overall Acceptance Rate 746 of 2,811 submissions, 27%

        Article Metrics

• Downloads (last 12 months): 900
• Downloads (last 6 weeks): 80
        Reflects downloads up to 14 Aug 2024

