Short paper · Open access
DOI: 10.1145/3568294.3580147

Transparent Value Alignment

Published: 13 March 2023

Abstract

As robots become increasingly prevalent in our communities, aligning the values motivating their behavior with human values is critical. However, it is often difficult or impossible for humans, both expert and non-expert, to enumerate values comprehensively, accurately, and in forms that are readily usable for robot planning. Misspecification can lead to undesired, inefficient, or even dangerous behavior. In the value alignment problem, humans and robots work together to optimize human objectives, which are often represented as reward functions and which the robot can infer by observing human actions. In existing alignment approaches, no explicit feedback about this inference process is provided to the human. In this paper, we introduce an exploratory framework to address this problem, which we call Transparent Value Alignment (TVA). TVA suggests that techniques from explainable AI (XAI) be explicitly applied to provide humans with information about the robot's beliefs throughout learning, enabling efficient and effective human feedback.
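To ground the setting, here is a minimal, hypothetical sketch (in Python) of one TVA-style interaction loop; it is not the paper's implementation. The robot keeps a Bayesian belief over a small discrete set of candidate reward weights, updates it from an observed human choice under a standard noisily-rational (Boltzmann) observation model, and then explains its updated belief back to the human so corrective feedback can target what the robot actually learned. The features, weights, and explanation format are all illustrative assumptions.

```python
import numpy as np

# Three candidate reward hypotheses: weights over two illustrative,
# hand-coded trajectory features (task progress, distance from the human).
candidates = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
belief = np.full(len(candidates), 1.0 / len(candidates))  # uniform prior

def boltzmann_likelihood(chosen, options, weights, beta=5.0):
    """P(human picks `chosen` among `options` | reward weights).
    A standard noisily-rational observation model for reward inference."""
    logits = beta * (options @ weights)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return probs[chosen]

def update_belief(belief, chosen, options):
    """Bayesian update of the belief over reward hypotheses."""
    likelihoods = np.array(
        [boltzmann_likelihood(chosen, options, w) for w in candidates])
    posterior = belief * likelihoods
    return posterior / posterior.sum()

def explain_belief(belief):
    """The TVA step: surface the robot's current belief to the human.
    A real system would use an XAI technique (exemplar trajectories,
    saliency, reward decomposition); here we simply verbalize it."""
    best = candidates[np.argmax(belief)]
    print(f"Most likely weights: progress={best[0]:.1f}, "
          f"distance={best[1]:.1f} (confidence {belief.max():.2f})")

# Simulated interaction: feature vectors of three candidate trajectories.
options = np.array([[0.9, 0.1], [0.4, 0.6], [0.1, 0.9]])
for _ in range(3):
    belief = update_belief(belief, chosen=1, options=options)  # human picks
    explain_belief(belief)  # ...and sees what the robot now believes
```

The key difference from standard reward learning is the `explain_belief` call after every update: instead of leaving the human to guess what the robot inferred, the belief itself becomes part of the interaction.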

Supplementary Material

MP4 File (HRI23-lbr1201.mp4)
This video introduces the exploratory Transparent Value Alignment (TVA) framework, which applies techniques from explainable AI (XAI) to the value alignment setting, making value learning more efficient and effective by giving humans direct, explicit feedback about what the agent has learned throughout the learning process.
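As a concrete (hypothetical) example of what such feedback could look like, the toy below uses reward decomposition, one XAI technique from the RL explanation literature: the robot's per-objective value estimates are shown separately, so the human can see why an action is preferred and contest the component that looks wrong. The action names and numbers are invented for illustration.

```python
# Toy reward decomposition: each objective's contribution to each
# action's value is kept separate instead of being summed away.
components = {
    "task progress": {"go_fast": 0.8, "go_safe": 0.3},
    "human comfort": {"go_fast": 0.1, "go_safe": 0.7},
}

def explain_choice(components):
    """Report the chosen action with a per-objective breakdown."""
    actions = {a for comp in components.values() for a in comp}
    totals = {a: sum(comp[a] for comp in components.values())
              for a in actions}
    best = max(totals, key=totals.get)
    print(f"Chosen action: {best} (total value {totals[best]:.2f})")
    for name, comp in components.items():
        print(f"  {name}: contributes {comp[best]:.2f}")

explain_choice(components)  # e.g., reveals comfort is driving the choice
```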



Published In

HRI '23: Companion of the 2023 ACM/IEEE International Conference on Human-Robot Interaction
March 2023, 612 pages
ISBN: 9781450399708
DOI: 10.1145/3568294

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. explainable ai
  2. transparency
  3. value alignment


Funding Sources

  • Army Research Laboratory

Conference

HRI '23

Acceptance Rates

Overall acceptance rate: 268 of 1,124 submissions (24%)

