DOI: 10.1145/3650212.3680395

Large Language Models for Equivalent Mutant Detection: How Far Are We?

Published: 11 September 2024

Abstract

Mutation testing is vital for ensuring software quality. However, the presence of equivalent mutants is known to introduce redundant cost and bias issues, hindering the effectiveness of mutation testing in practical use. Although numerous equivalent mutant detection (EMD) techniques have been proposed, they exhibit limitations due to the scarcity of training data and challenges in generalizing to unseen mutants. Recently, large language models (LLMs) have been extensively adopted in various code-related tasks and have shown superior performance by more accurately capturing program semantics. Yet the performance of LLMs in equivalent mutant detection remains largely unclear. In this paper, we conduct an empirical study on 3,302 method-level Java mutant pairs to comprehensively investigate the effectiveness and efficiency of LLMs for equivalent mutant detection. Specifically, we assess the performance of LLMs compared to existing EMD techniques, examine various strategies for applying LLMs, evaluate the orthogonality between EMD techniques, and measure the time overhead of training and inference. Our findings demonstrate that LLM-based techniques significantly outperform existing techniques (i.e., an average improvement of 35.69% in F1-score), with the fine-tuned code embedding strategy being the most effective. Moreover, LLM-based techniques offer an excellent balance between cost (relatively low training and inference time) and effectiveness. Based on our findings, we further discuss the impact of model size and embedding quality, and provide several promising directions for future research. This work is the first to examine LLMs for equivalent mutant detection, affirming their effectiveness and efficiency.
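
For readers less familiar with the problem, the snippet below is a minimal, hypothetical Java example (not drawn from the paper's dataset) of an equivalent mutant: the mutated code differs syntactically from the original but behaves identically on every valid input, so no test case can kill it.

    // Original method: sums the first n elements of arr (assumes 0 <= n <= arr.length).
    int sum(int[] arr, int n) {
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += arr[i];
        }
        return total;
    }

    // Mutant from relational-operator replacement: "i < n" becomes "i != n".
    // Because i starts at 0 and increases by exactly 1, the loop still exits
    // precisely when i reaches n, so this mutant is equivalent to the original
    // and cannot be killed by any test that respects the precondition.
    int sumMutant(int[] arr, int n) {
        int total = 0;
        for (int i = 0; i != n; i++) {
            total += arr[i];
        }
        return total;
    }

Deciding such equivalence is undecidable in general, which is why EMD techniques like those compared in this study rely on heuristics or learned program representations rather than exact analysis.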

Published In

ISSTA 2024: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis
September 2024, 1928 pages
ISBN: 9798400706127
DOI: 10.1145/3650212
Publisher

Association for Computing Machinery, New York, NY, United States

Badges

  • Distinguished Paper

Author Tags

  1. Empirical Study
  2. Equivalent Mutant Detection
  3. Large Language Model
  4. Mutation Testing

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • CCF Young Elite Scientists Sponsorship Program (by CAST)
  3. JSPS KAKENHI
  • Bilateral Program
  5. Inamori Research Institute for Science (InaRIS Fellowship supporting Yasutaka Kamei)

Conference

ISSTA '24

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%
