
Multi-modal Language Models for Human-Robot Interaction

Published: 11 March 2024
DOI: 10.1145/3610978.3638371

Abstract

Recent progress in language models is enabling more flexible and natural conversational abilities for social robots. However, these language models were not designed for use in physically embodied social agents. They cannot process the other modalities humans use in conversation, such as vision, which people rely on to refer to the environment and to interpret non-verbal communication. My work promotes the design of language models for physically embodied social interaction, shows how current technologies can be leveraged to enrich language models with these abilities, and explores how such multi-modal language models can improve interactions.
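
To make the abstract's claim concrete, below is a minimal sketch of one way current technologies could ground a language model in vision: caption the robot's camera frame with an off-the-shelf vision model, then condition the dialogue model on that caption. This is an illustrative assumption, not the thesis implementation; the Hugging Face transformers pipelines and model names are example choices, and grounded_opener is a hypothetical helper.

    # A minimal sketch, assuming the Hugging Face `transformers` library.
    # Not the author's system: it only illustrates vision-grounded prompting.
    from transformers import pipeline

    # Vision -> text: an off-the-shelf image captioning model (example choice).
    captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
    # Text -> dialogue: any chat-capable LLM would do; gpt2 is a lightweight stand-in.
    chat = pipeline("text-generation", model="gpt2")

    def grounded_opener(image_path: str) -> str:
        """Hypothetical helper: open a conversation grounded in what the robot sees."""
        # Describe the scene in front of the robot.
        caption = captioner(image_path)[0]["generated_text"]
        # Condition the language model on that visual description.
        prompt = (
            f"You are a social robot. You see: {caption}.\n"
            "Start a friendly conversation that refers to what you see.\n"
            "Robot:"
        )
        return chat(prompt, max_new_tokens=40)[0]["generated_text"]

    # Example usage with a frame saved from the robot's camera:
    print(grounded_opener("camera_frame.jpg"))

The same pattern extends to other modalities the abstract mentions: any perception module whose output can be verbalized (gaze, gesture, facial expression) can enrich the language model's context in this way.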



Published In

HRI '24: Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction
March 2024
1408 pages
ISBN: 9798400703232
DOI: 10.1145/3610978
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. conversational agent
  2. grounding
  3. human-robot interaction
  4. multi-modal dialogue
  5. natural language generation
  6. natural language processing
  7. situatedness

Qualifiers

  • Abstract

Funding Sources

  • Flemish Government

Conference

HRI '24

Acceptance Rates

Overall acceptance rate: 268 of 1,124 submissions (24%)

