Short paper · Open access
DOI: 10.1145/3568294.3580147

Transparent Value Alignment

Published: 13 March 2023

Abstract

As robots become increasingly prevalent in our communities, aligning the values motivating their behavior with human values is critical. However, it is often difficult or impossible for humans, both expert and non-expert, to enumerate values comprehensively, accurately, and in forms that are readily usable for robot planning. Misspecification can lead to undesired, inefficient, or even dangerous behavior. In the value alignment problem, humans and robots work together to optimize human objectives, which are often represented as reward functions and which the robot can infer by observing human actions. In existing alignment approaches, no explicit feedback about this inference process is provided to the human. In this paper, we introduce an exploratory framework to address this problem, which we call Transparent Value Alignment (TVA). TVA suggests that techniques from explainable AI (XAI) be explicitly applied to provide humans with information about the robot's beliefs throughout learning, enabling efficient and effective human feedback.
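To ground the setting, here is a minimal, hypothetical sketch (in Python) of one TVA-style interaction loop; it is not the paper's implementation. The robot keeps a Bayesian belief over a small discrete set of candidate reward weights, updates it from an observed human choice under a standard noisily-rational (Boltzmann) observation model, and then explains its updated belief back to the human so corrective feedback can target what the robot actually learned. The features, weights, and explanation format are all illustrative assumptions.

```python
import numpy as np

# Three candidate reward hypotheses: weights over two illustrative,
# hand-coded trajectory features (task progress, distance from the human).
candidates = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
belief = np.full(len(candidates), 1.0 / len(candidates))  # uniform prior

def boltzmann_likelihood(chosen, options, weights, beta=5.0):
    """P(human picks `chosen` among `options` | reward weights).
    A standard noisily-rational observation model for reward inference."""
    logits = beta * (options @ weights)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return probs[chosen]

def update_belief(belief, chosen, options):
    """Bayesian update of the belief over reward hypotheses."""
    likelihoods = np.array(
        [boltzmann_likelihood(chosen, options, w) for w in candidates])
    posterior = belief * likelihoods
    return posterior / posterior.sum()

def explain_belief(belief):
    """The TVA step: surface the robot's current belief to the human.
    A real system would use an XAI technique (exemplar trajectories,
    saliency, reward decomposition); here we simply verbalize it."""
    best = candidates[np.argmax(belief)]
    print(f"Most likely weights: progress={best[0]:.1f}, "
          f"distance={best[1]:.1f} (confidence {belief.max():.2f})")

# Simulated interaction: feature vectors of three candidate trajectories.
options = np.array([[0.9, 0.1], [0.4, 0.6], [0.1, 0.9]])
for _ in range(3):
    belief = update_belief(belief, chosen=1, options=options)  # human picks
    explain_belief(belief)  # ...and sees what the robot now believes
```

The key difference from standard reward learning is the `explain_belief` call after every update: instead of leaving the human to guess what the robot inferred, the belief itself becomes part of the interaction.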

Supplementary Material

MP4 File (HRI23-lbr1201.mp4)
This video introduces the exploratory Transparent Value Alignment (TVA) framework, which applies techniques from explainable AI (XAI) to the value alignment setting, making value learning more efficient and effective by giving humans direct, explicit feedback about what the agent has learned throughout the learning process.
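As a concrete (hypothetical) example of what such feedback could look like, the toy below uses reward decomposition, one XAI technique from the RL explanation literature: the robot's per-objective value estimates are shown separately, so the human can see why an action is preferred and contest the component that looks wrong. The action names and numbers are invented for illustration.

```python
# Toy reward decomposition: each objective's contribution to each
# action's value is kept separate instead of being summed away.
components = {
    "task progress": {"go_fast": 0.8, "go_safe": 0.3},
    "human comfort": {"go_fast": 0.1, "go_safe": 0.7},
}

def explain_choice(components):
    """Report the chosen action with a per-objective breakdown."""
    actions = {a for comp in components.values() for a in comp}
    totals = {a: sum(comp[a] for comp in components.values())
              for a in actions}
    best = max(totals, key=totals.get)
    print(f"Chosen action: {best} (total value {totals[best]:.2f})")
    for name, comp in components.items():
        print(f"  {name}: contributes {comp[best]:.2f}")

explain_choice(components)  # e.g., reveals comfort is driving the choice
```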



Published In

HRI '23: Companion of the 2023 ACM/IEEE International Conference on Human-Robot Interaction
March 2023, 612 pages
ISBN: 9781450399708
DOI: 10.1145/3568294

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. explainable ai
  2. transparency
  3. value alignment


Funding Sources

  • Army Research Laboratory

Conference

HRI '23

Acceptance Rates

Overall acceptance rate: 268 of 1,124 submissions (24%)

