
Bug Analysis in Jupyter Notebook Projects: An Empirical Study

Published: 18 April 2024

Abstract

Computational notebooks, such as Jupyter, have been widely adopted by data scientists to write code for analyzing and visualizing data. Despite their growing adoption and popularity, few studies have examined the challenges of Jupyter development from the practitioners’ point of view. This article presents a systematic, large-scale empirical study of the bugs and challenges that Jupyter practitioners face. We mined 14,740 commits from 105 open source GitHub projects containing Jupyter Notebook code. We then analyzed 30,416 StackOverflow posts, which gave us insight into the bugs practitioners encounter when developing Jupyter Notebook projects. Next, we conducted 19 interviews with data scientists to uncover more details about Jupyter bugs and to understand Jupyter developers’ challenges. Finally, to validate the study results and the proposed taxonomy, we surveyed 91 data scientists. We highlight bug categories, their root causes, and the challenges that Jupyter practitioners face.


Cited By

  • (2024) Multiverse Notebook: Shifting Data Scientists to Time Travelers. Proceedings of the ACM on Programming Languages 8, OOPSLA1, 754–783. DOI: 10.1145/3649838. Online publication date: 29-Apr-2024.
  • (2024) Threats to Instrument Validity Within “in Silico” Research: Software Engineering to the Rescue. In Leveraging Applications of Formal Methods, Verification and Validation. Software Engineering Methodologies, 82–96. DOI: 10.1007/978-3-031-75387-9_6. Online publication date: 26-Oct-2024.


Published In

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 4
May 2024
940 pages
EISSN: 1557-7392
DOI: 10.1145/3613665
Editor: Mauro Pezzè

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 April 2024
Online AM: 22 January 2024
Accepted: 03 January 2024
Revised: 20 December 2023
Received: 11 October 2022
Published in TOSEM Volume 33, Issue 4


Author Tags

  1. Jupyter Notebooks
  2. bugs
  3. interviews
  4. mining software repositories (MSR)
  5. StackOverflow
  6. empirical study

Qualifiers

  • Research-article

Funding Sources

  • INES, CNPq
  • CAPES
  • FACEPE
  • PRONEX
  • FAPESB INCITE


Article Metrics

  • Downloads (last 12 months): 345
  • Downloads (last 6 weeks): 65
Reflects downloads up to 01 Nov 2024

