
Context-Aware Automated Analysis and Annotation of Social Human–Agent Interactions

Published: 30 June 2015

Abstract

The outcome of interpersonal interactions depends not only on the content we communicate verbally, but also on nonverbal social signals. Because a lack of social skills is a common problem for a significant number of people, serious games and other training environments have recently become a focus of research. In this work, we present NovA (Nonverbal behavior Analyzer), a system that automatically analyzes social signals and facilitates their interpretation in bidirectional interactions with a conversational agent. It records interaction data, detects relevant social cues, and computes descriptive statistics for the recorded data with respect to the agent's behavior and the context of the situation. This enhances researchers' ability to automatically label corpora of human–agent interactions and to give users feedback on the strengths and weaknesses of their social behavior.
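
To make this concrete, here is a minimal illustrative sketch (in Python) of the kind of context-aware descriptive statistics the abstract refers to: given time-stamped social-cue detections and context segments from a recorded human–agent interaction, it aggregates cue counts and cue durations per context. The data layout and all names (CueEvent, ContextSegment, the cue and context labels) are hypothetical assumptions for illustration; this is a sketch of the general technique, not the NovA implementation.

```python
# Illustrative sketch only: not the NovA implementation. Aggregates
# time-stamped social-cue detections over context segments, yielding
# per-context counts and total durations (the kind of descriptive
# statistics described in the abstract). All names are hypothetical.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class CueEvent:
    cue: str       # e.g., "smile", "gesture", "gaze_at_agent"
    start: float   # seconds from session start
    end: float

@dataclass
class ContextSegment:
    label: str     # e.g., "agent_asks_question", "user_speaks"
    start: float
    end: float

def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Duration (in seconds) for which two intervals overlap."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def cue_stats_by_context(cues, contexts):
    """For each context label, count cue occurrences and total cue time."""
    stats = defaultdict(lambda: defaultdict(lambda: {"count": 0, "duration": 0.0}))
    for ctx in contexts:
        for ev in cues:
            d = overlap(ev.start, ev.end, ctx.start, ctx.end)
            if d > 0:  # cue falls (at least partially) inside this context
                s = stats[ctx.label][ev.cue]
                s["count"] += 1
                s["duration"] += d
    return stats

if __name__ == "__main__":
    cues = [CueEvent("smile", 2.0, 4.5), CueEvent("gaze_at_agent", 3.0, 9.0)]
    contexts = [ContextSegment("agent_asks_question", 0.0, 5.0),
                ContextSegment("user_speaks", 5.0, 10.0)]
    for label, per_cue in cue_stats_by_context(cues, contexts).items():
        print(label, dict(per_cue))
```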

Supplementary Material

a11-baur-apndx.pdf (baur.zip)
Supplemental movie, appendix, image, and software files for "Context-Aware Automated Analysis and Annotation of Social Human–Agent Interactions".


    Published In

    ACM Transactions on Interactive Intelligent Systems, Volume 5, Issue 2
    Special Issue on Behavior Understanding for Arts and Entertainment (Part 1 of 2)
    July 2015
    144 pages
    ISSN:2160-6455
    EISSN:2160-6463
    DOI:10.1145/2799389

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 30 June 2015
    Accepted: 01 April 2015
    Revised: 01 March 2015
    Received: 01 March 2014
    Published in TIIS Volume 5, Issue 2

    Author Tags

    1. Social cue recognition
    2. automated behavior analysis
    3. interaction design
    4. serious games
    5. virtual job interviews

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • European Union's Horizon 2020 research and innovation programme
    • European Commission within FP7-ICT-2011-7

    Cited By

    • (2024) Multilingual Dyadic Interaction Corpus NoXi+J: Toward Understanding Asian-European Non-verbal Cultural Characteristics and their Influences on Engagement. Proceedings of the 26th International Conference on Multimodal Interaction, 224–233. DOI: 10.1145/3678957.3685757. Online publication date: 4 Nov 2024.
    • (2024) The interaction design of 3D virtual humans: A survey. Computer Science Review 53, 100653. DOI: 10.1016/j.cosrev.2024.100653. Online publication date: Aug 2024.
    • (2023) The Deep Method: Towards Computational Modeling of the Social Emotion Shame Driven by Theory, Introspection, and Social Signals. IEEE Transactions on Affective Computing 15, 2, 417–432. DOI: 10.1109/TAFFC.2023.3298062. Online publication date: 24 Jul 2023.
    • (2022) Automatic engagement estimation in smart education/learning settings: a systematic review of engagement definitions, datasets, and methods. Smart Learning Environments 9, 1. DOI: 10.1186/s40561-022-00212-y. Online publication date: 12 Nov 2022.
    • (2022) Generating Personalized Behavioral Feedback for a Virtual Job Interview Training System Through Adversarial Learning. Artificial Intelligence in Education, 679–684. DOI: 10.1007/978-3-031-11644-5_67. Online publication date: 27 Jul 2022.
    • (2021) Automatic Visual Attention Detection for Mobile Eye Tracking Using Pre-Trained Computer Vision Models and Human Gaze. Sensors 21, 12, 4143. DOI: 10.3390/s21124143. Online publication date: 16 Jun 2021.
    • (2021) An Open Dataset for Impression Recognition from Multimodal Bodily Responses. 2021 9th International Conference on Affective Computing and Intelligent Interaction (ACII), 1–8. DOI: 10.1109/ACII52823.2021.9597421. Online publication date: 28 Sep 2021.
    • (2020) Are You Still With Me? Continuous Engagement Assessment From a Robot's Point of View. Frontiers in Robotics and AI 7. DOI: 10.3389/frobt.2020.00116. Online publication date: 16 Sep 2020.
    • (2019) Serious Games for Training Social Skills in Job Interviews. IEEE Transactions on Games 11, 4, 340–351. DOI: 10.1109/TG.2018.2808525. Online publication date: Dec 2019.
    • (2019) A Taxonomy of Social Cues for Conversational Agents. International Journal of Human-Computer Studies 132, C, 138–161. DOI: 10.1016/j.ijhcs.2019.07.009. Online publication date: 1 Dec 2019.
