DOI: 10.1145/3581641.3584092
research-article
Open access

Human-Centered Deferred Inference: Measuring User Interactions and Setting Deferral Criteria for Human-AI Teams

Published: 27 March 2023

    Abstract

    Although deep learning holds the promise of novel and impactful interfaces, realizing such promise in practice remains a challenge: since dataset-driven deep-learned models assume a one-time human input, there is no recourse when they do not understand the input provided by the user. Works that address this via deferred inference—soliciting additional human input when uncertain—show meaningful improvement, but ignore key aspects of how users and models interact. In this work, we focus on the role of users in deferred inference and argue that the deferral criteria should be a function of the user and model as a team, not simply the model itself. In support of this, we introduce a novel mathematical formulation, validate it via an experiment analyzing the interactions of 25 individuals with a deep learning-based visiolinguistic model, and identify user-specific dependencies that are under-explored in prior work. We conclude by demonstrating two human-centered procedures for setting deferral criteria that are simple to implement, applicable to a wide variety of tasks, and perform equal to or better than equivalent procedures that use much larger datasets.
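
The deferral loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's formulation: it assumes the simplest possible criterion, a fixed confidence threshold, and every name here (`deferred_inference`, `toy_model`, `threshold`, `max_deferrals`) is hypothetical. The toy model only stands in for a real visiolinguistic model.

```python
from typing import Callable, Iterator, Tuple

# Hypothetical interface: a model maps a user input to
# (prediction, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


def deferred_inference(model: Model, inputs: Iterator[str],
                       threshold: float,
                       max_deferrals: int = 1) -> Tuple[str, int]:
    """Return (prediction, deferrals_used).

    The prediction is accepted once its confidence reaches `threshold`;
    otherwise the user is asked for another input, at most
    `max_deferrals` times.
    """
    prediction, confidence = model(next(inputs))
    deferrals = 0
    while confidence < threshold and deferrals < max_deferrals:
        try:
            new_input = next(inputs)  # stands in for re-querying the user
        except StopIteration:
            break  # the user provided nothing further
        deferrals += 1
        prediction, confidence = model(new_input)
    return prediction, deferrals


def toy_model(query: str) -> Tuple[str, float]:
    # Toy stand-in (assumption): more specific queries score higher.
    confidence = min(1.0, len(query.split()) / 5)
    return f"box for '{query}'", confidence


if __name__ == "__main__":
    queries = iter(["the dog", "the brown dog on the left"])
    print(deferred_inference(toy_model, queries, threshold=0.8))
```

In this sketch the threshold depends only on the model's confidence. The abstract argues that the criterion should instead reflect the user and the model as a team, so a per-user or per-task value of `threshold` would be one place where that dependence could enter.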

    Supplementary Material

    ZIP File (iui2023-60-supp.zip)
    Relevant supplemental data for the requested paper: the full list of images used in our experiment and corresponding initial queries and deferral responses (the source data for Table 1)

    Cited By

    • (2024) A Taxonomy for Human-LLM Interaction Modes: An Initial Exploration. Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems, pp. 1–11. DOI: 10.1145/3613905.3650786. Online publication date: 11 May 2024

    Published In

    IUI '23: Proceedings of the 28th International Conference on Intelligent User Interfaces
    March 2023
    972 pages
    ISBN: 9798400701061
    DOI: 10.1145/3581641
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. deferred inference
    2. neural networks
    3. referring expression comprehension

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    IUI '23

    Acceptance Rates

    Overall Acceptance Rate 746 of 2,811 submissions, 27%

    Article Metrics

    • Downloads (last 12 months): 538
    • Downloads (last 6 weeks): 93
    Reflects downloads up to 14 Aug 2024
