Man and the Machine: Effects of AI-assisted Human Labeling on Interactive Annotation of Real-time Video Streams

Published: 23 April 2024
    Abstract

    AI-assisted interactive annotation is a powerful way to facilitate data annotation, a prerequisite for constructing robust AI models. While AI-assisted interactive annotation has been extensively studied in static settings, less is known about its use in dynamic scenarios where annotators operate under time and cognitive constraints, e.g., while detecting suspicious or dangerous activities in real-time surveillance feeds. Understanding how AI can assist annotators in these tasks and facilitate consistent annotation is paramount to ensuring high performance for AI models trained on these data. We address this gap in interactive machine learning (IML) research, contributing an extensive investigation of the benefits, limitations, and challenges of AI-assisted annotation in dynamic use cases. We examine both the effects of AI assistance on annotators and the effects of the resulting annotations on the performance of AI models trained on them for real-time video annotation. We conduct extensive experiments that compare annotation performance across two annotator levels (expert and non-expert) and two interactive labeling techniques (with and without AI assistance). In a controlled study with N = 34 annotators and a follow-up study in which 51,963 images and their annotation labels were input to the AI model, we demonstrate that the benefits of AI assistance are greatest for non-expert users and for cases where targets are only partially or briefly visible. Expert users tend to match or outperform the AI model. Labels that combine AI and expert annotations yield the best overall performance, as the AI reduces overflow and latency in the expert annotations. We derive guidelines for the use of AI-assisted human annotation in real-time, dynamic use cases.
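
    The abstract's central finding, that labels combining AI and expert annotations train the best models, can be illustrated with a simple fusion rule. The following is a minimal, hypothetical sketch, not the authors' implementation: expert bounding boxes are kept as-is, and an AI detection is added only when no expert box overlaps it, which is one way AI assistance can recover targets an expert missed or labeled late. The (x1, y1, x2, y2) box format, the 0.5 IoU threshold, and the fusion strategy itself are illustrative assumptions.

    ```python
    # Illustrative sketch (hypothetical; not the paper's code): fusing AI and
    # expert bounding-box labels for a single video frame.

    def iou(a, b):
        """Intersection over union of two (x1, y1, x2, y2) boxes."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    def fuse_labels(expert_boxes, ai_boxes, iou_thresh=0.5):
        """Trust expert boxes where present; add unmatched AI detections.

        Conceptually mirrors the abstract's finding: the expert's geometry
        is kept, while AI detections fill in targets the expert missed,
        e.g., ones that were only briefly or partially visible.
        """
        fused = list(expert_boxes)
        for ai_box in ai_boxes:
            if all(iou(ai_box, e) < iou_thresh for e in expert_boxes):
                fused.append(ai_box)
        return fused

    # Example: the expert labeled one target; the AI detected the same target
    # plus a second, briefly visible one. The near-duplicate AI box is
    # suppressed and the new detection is kept.
    expert = [(100, 120, 220, 260)]
    ai = [(105, 118, 225, 255), (400, 300, 460, 350)]
    print(fuse_labels(expert, ai))
    # -> [(100, 120, 220, 260), (400, 300, 460, 350)]
    ```

    How the paper actually combines the two label sources (and handles the reported overflow and latency in expert annotations) may differ; this sketch only conveys the intuition behind preferring expert labels while letting AI detections fill coverage gaps.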



    Published In

    ACM Transactions on Interactive Intelligent Systems, Volume 14, Issue 2
    June 2024
    201 pages
    ISSN: 2160-6455
    EISSN: 2160-6463
    DOI: 10.1145/3613555

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 23 April 2024
    Online AM (Accepted Manuscript): 29 February 2024
    Accepted: 30 January 2024
    Revised: 27 January 2024
    Received: 07 April 2023
    Published in TIIS Volume 14, Issue 2


    Author Tags

    1. Computer vision
    2. object detection
    3. machine learning
    4. deep learning
    5. annotation
    6. videos
    7. man-machine
    8. human-in-the-loop
    9. intelligent user interface
    10. AI-assisted interface

    Qualifiers

    • Research-article

    Funding Sources

    • Foundation for Science and Technology (FCT): INTERWHALE - Advancing Interactive Technology for Responsible Whale-Watching
    • Foundation for Science and Technology (FCT): MARE - The Marine and Environmental Sciences Centre
    • Foundation for Science and Technology (FCT): ARNET - Aquatic Research Network
    • Foundation for Science and Technology (FCT): PhD scholarship
    • EU Horizon Europe project CLIMAREST: Coastal Climate Resilience and Marine Restoration Tools for the Arctic Atlantic basin
    • Academy of Finland
    • European Social Fund via “ICT programme” measure, Estonian Center of Excellence in ICT Research
    • Nokia Foundation

