DOI: 10.1145/3238147.3238190
research-article

Neural-machine-translation-based commit message generation: how far are we?

Published: 03 September 2018

    Abstract

    Commit messages can be regarded as the documentation of software changes. They describe the content and purpose of each change and are therefore useful for program comprehension and software maintenance. However, for lack of time and direct motivation, developers sometimes neglect them. To address this problem, Jiang et al. proposed an approach (which we refer to as NMT) that leverages a neural machine translation algorithm to automatically generate short commit messages from code. The reported performance of their approach is promising, but they did not explore why it performs well. In this paper, we therefore first perform an in-depth analysis of their experimental results. We find that (1) most of the test diffs from which NMT can generate high-quality messages are similar to one or more training diffs at the token level; (2) about 16% of the commit messages in Jiang et al.’s dataset are noisy, either because they were automatically generated or because they describe repetitive trivial changes; and (3) the performance of NMT declines by a large amount after such noisy commit messages are removed. In addition, NMT is complicated and time-consuming. Inspired by our first finding, we propose a simpler and faster approach, named NNGen (Nearest Neighbor Generator), which generates concise commit messages using the nearest neighbor algorithm. Our experimental results show that NNGen is over 2,600 times faster than NMT and outperforms NMT in terms of BLEU (an accuracy measure that is widely used to evaluate machine translation systems) by 21%. Finally, we discuss some observations on the road ahead for automated commit message generation to inspire other researchers.
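
    The nearest-neighbor idea behind NNGen can be illustrated with a minimal sketch. The abstract states only that the approach uses the nearest neighbor algorithm and that similarity was observed at the token level, so everything else here is an assumption made for illustration: the bag-of-words representation, the cosine similarity measure, the tokenizer, the function names, and the toy training pairs. It is not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a diff into lowercase identifier, number, and symbol tokens."""
    return re.findall(r"[a-z_][a-z0-9_]*|\d+|[^\sa-z0-9_]", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbor_message(new_diff, training_pairs):
    """Reuse the message of the training diff most similar to new_diff.

    training_pairs is a list of (diff, message) tuples. Returns the chosen
    message and its similarity score, or (None, -1.0) if the list is empty.
    """
    query = Counter(tokenize(new_diff))
    best_message, best_score = None, -1.0
    for diff, message in training_pairs:
        score = cosine(query, Counter(tokenize(diff)))
        if score > best_score:
            best_message, best_score = message, score
    return best_message, best_score


# Hypothetical training data, invented for this example.
TRAINING = [
    ("- version = 1.0.0\n+ version = 1.0.1", "Bump version to 1.0.1"),
    ("- import java.util.List;\n+ import java.util.ArrayList;", "Fix import"),
]

message, score = nearest_neighbor_message(
    "- version = 1.0.1\n+ version = 1.0.2", TRAINING
)
print(message, round(score, 3))
```

    No model is trained in this sketch: generation is a single pass over the stored diffs, which is consistent with the large speed advantage reported in the abstract. The retrieved message is copied verbatim from the most similar training diff, so its quality depends on how close that neighbor is.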

    References

    [1]
    2017. Jiang et al.’s website. https://sjiang1.github.io/commitgen/.
    2018. Git. https://git-scm.com/.
    2018. Our online appendix. https://goo.gl/63B976.
    2018. ExoPlayer. https://github.com/google/ExoPlayer.
    2018. Google Closure Compiler. https://github.com/google/closure-compiler.
    2018. Liferay Portal. https://github.com/liferay/liferay-portal.
    2018. Stack Overflow. https://stackoverflow.com/.
    2018. TsExtractor in ExoPlayer. https://goo.gl/Dsbdjf.
    [2]
    Nahla J Abid, Natalia Dragan, Michael L Collard, and Jonathan I Maletic. 2015. Using stereotypes in the automatic generation of natural language summaries for c++ methods. In Software Maintenance and Evolution (ICSME), 2015 IEEE International Conference on. IEEE, 561–565.
    [3]
    Miltiadis Allamanis, Earl T Barr, Christian Bird, and Charles Sutton. 2015. Suggesting accurate method and class names. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering. ACM, 38–49.
    [4]
    Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
    [5]
    Raymond PL Buse and Westley R Weimer. 2008. Automatic documentation inference for exceptions. In Proceedings of the 2008 international symposium on Software testing and analysis. ACM, 273–282.
    [6]
    Raymond PL Buse and Westley R Weimer. 2010. Automatically documenting program changes. In Proceedings of the IEEE/ACM international conference on Automated software engineering. ACM, 33–42.
    [7]
    Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014).
    [8]
    Jacob Cohen. 1968. Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological bulletin 70, 4 (1968), 213.
    [9]
    Luis Fernando Cortés-Coy, Mario Linares-Vásquez, Jairo Aponte, and Denys Poshyvanyk. 2014. On Automatically Generating Commit Messages via Summarization of Source Code Changes. In IEEE International Working Conference on Source Code Analysis and Manipulation. 275–284.
    [10]
    Sergio Cozzetti B de Souza, Nicolas Anquetil, and Kathia M de Oliveira. 2005. A study of the documentation essential to software maintenance. In Proceedings of the 23rd annual international conference on Design of communication: documenting & designing for pervasive information. ACM, 68–75.
    [11]
    Robert Dyer, Hoan Anh Nguyen, Hridesh Rajan, and Tien N Nguyen. 2013. Boa: A language and infrastructure for analyzing ultra-large-scale software repositories. In Proceedings of the 2013 International Conference on Software Engineering. IEEE Press, 422–431.
    [12]
    Wei Fu and Tim Menzies. 2017. Easy over hard: a case study on deep learning. In Joint Meeting on Foundations of Software Engineering. 49–60.
    [13]
    Ying Fu, Meng Yan, Xiaohong Zhang, Ling Xu, Dan Yang, and Jeffrey D Kymer. 2015. Automated classification of software change messages by semi-supervised Latent Dirichlet Allocation. Information and Software Technology 57 (2015), 369–377.
    [14]
    Baljinder Ghotra, Shane McIntosh, and Ahmed E Hassan. 2015. Revisiting the impact of classification techniques on the performance of defect prediction models. In Proceedings of the 37th International Conference on Software Engineering-Volume 1. IEEE Press, 789–800.
    [15]
    Sonia Haiduc, Jairo Aponte, Laura Moreno, and Andrian Marcus. 2010. On the use of automated text summarization techniques for summarizing source code. In Reverse Engineering (WCRE), 2010 17th Working Conference on. IEEE, 35–44.
    [16]
    Ahmed E Hassan. 2008. Automated classification of change messages in open source projects. In Proceedings of the 2008 ACM symposium on Applied computing. ACM, 837–841.
    [17]
    Ahmed E Hassan and Richard C Holt. 2004. Using development history sticky notes to understand software architecture. In Program Comprehension, 2004. Proceedings. 12th IEEE International Workshop on. IEEE, 183–192.
    [18]
    Abram Hindle, Earl T Barr, Zhendong Su, Mark Gabel, and Premkumar Devanbu. 2012. On the naturalness of software. In Software Engineering (ICSE), 2012 34th International Conference on. IEEE, 837–847.
    [19]
    Jerry L Hintze and Ray D Nelson. 1998. Violin plots: a box plot-density trace synergism. The American Statistician 52, 2 (1998), 181–184.
    [20]
    Xing Hu, Ge Li, Xin Xia, David Lo, and Zhi Jin. 2018. Deep code comment generation. In Proceedings of the 26th Conference on Program Comprehension, ICPC 2018, Gothenburg, Sweden, May 27-28, 2018. 200–210. 1145/3196321.3196334
    [21]
    Yuan Huang, Qiaoyang Zheng, Xiangping Chen, Yingfei Xiong, Zhiyong Liu, and Xiaonan Luo. 2017. Mining Version Control System for Automatically Generating Commit Comment. In Empirical Software Engineering and Measurement (ESEM), 2017 ACM/IEEE International Symposium on. IEEE, 414–423.
    [22]
    Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke Zettlemoyer. 2016. Summarizing source code using a neural attention model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vol. 1. 2073–2083.
    [23]
    Siyuan Jiang, Ameer Armaly, and Collin McMillan. 2017. Automatically generating commit messages from diffs using neural machine translation. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering. IEEE Press, 135–146.
    [24]
    Siyuan Jiang and Collin McMillan. 2017. Towards automatic generation of short summaries of commits. In Proceedings of the 25th International Conference on Program Comprehension. IEEE Press, 320–323.
    [25]
    Mira Kajko-Mattsson. 2005. A survey of documentation practice within corrective maintenance. Empirical Software Engineering 10, 1 (2005), 31–55.
    [26]
    Manabu Kamimura and Gail C Murphy. 2013. Towards generating human-oriented summaries of unit test cases. In Program Comprehension (ICPC), 2013 IEEE 21st International Conference on. IEEE, 215–218.
    [27]
    Pavneet Singh Kochhar, Xin Xia, David Lo, and Shanping Li. 2016. Practitioners’ expectations on automated fault localization. In Proceedings of the 25th International Symposium on Software Testing and Analysis. ACM, 165–176.
    [28]
    Jan-Peter Krämer, Joel Brandt, and Jan Borchers. 2016. Using runtime traces to improve documentation and unit test authoring for dynamic languages. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, 3232–3237.
    [29]
    Tien-Duy B Le, Jooyong Yi, David Lo, Ferdian Thung, and Abhik Roychoudhury. 2014. Dynamic inference of change contracts. In Software Maintenance and Evolution (ICSME), 2014 IEEE International Conference on. IEEE, 451–455.
    [30]
    Boyang Li, Christopher Vendome, Mario Linares-Vásquez, Denys Poshyvanyk, and Nicholas A Kraft. 2016. Automatically documenting unit test cases. In Software Testing, Verification and Validation (ICST), 2016 IEEE International Conference on. IEEE, 341–352.
    [31]
    Mario Linares-Vásquez, Luis Fernando Cortés-Coy, Jairo Aponte, and Denys Poshyvanyk. 2015. Changescribe: A tool for automatically generating commit messages. In Software Engineering (ICSE), 2015 IEEE/ACM 37th IEEE International Conference on, Vol. 2. IEEE, 709–712.
    [32]
    Mario Linares-Vásquez, Boyang Li, Christopher Vendome, and Denys Poshyvanyk. 2016. Documenting database usages and schema constraints in database-centric applications. In Proceedings of the 25th International Symposium on Software Testing and Analysis. ACM, 270–281.
    [33]
    Christopher Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of 52nd annual meeting of the association for computational linguistics: system demonstrations. 55–60.
    [34]
    Christopher D Manning, Prabhakar Raghavan, Hinrich Schütze, et al. 2008. Introduction to information retrieval. Vol. 1. Cambridge University Press, Cambridge.
    [35]
    Paul W McBurney and Collin McMillan. 2014. Automatic documentation generation via source code summarization of method context. In Proceedings of the 22nd International Conference on Program Comprehension. ACM, 279–290.
    [36]
    Paul W McBurney and Collin McMillan. 2016. Automatic source code summarization of context for java methods. IEEE Transactions on Software Engineering 42, 2 (2016), 103–119.
    [37]
    Audris Mockus and Lawrence G Votta. 2000. Identifying Reasons for Software Changes using Historic Databases. In icsm. 120–130.
    [38]
    Laura Moreno, Jairo Aponte, Giriprasad Sridhara, Andrian Marcus, Lori Pollock, and K Vijay-Shanker. 2013. Automatic generation of natural language summaries for java classes. In Program Comprehension (ICPC), 2013 IEEE 21st International Conference on. IEEE, 23–32.
    [39]
    Laura Moreno, Gabriele Bavota, Massimiliano Di Penta, Rocco Oliveto, Andrian Marcus, and Gerardo Canfora. 2014. Automatic generation of release notes. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, 484–495.
    [40]
    Laura Moreno, Gabriele Bavota, Massimiliano Di Penta, Rocco Oliveto, Andrian Marcus, and Gerardo Canfora. 2017. ARENA: an approach for the automated generation of release notes. IEEE Transactions on Software Engineering 43, 2 (2017), 106–127.
    [41]
    Laura Moreno, Andrian Marcus, Lori Pollock, and K Vijay-Shanker. 2013. Jsummarizer: An automatic generator of natural language summaries for java classes. In Program Comprehension (ICPC), 2013 IEEE 21st International Conference on. IEEE, 230–232.
    [42]
    David G Novick and Karen Ward. 2006. What users say they want in documentation. In Proceedings of the 24th annual ACM international conference on Design of communication. ACM, 84–91.
    [43]
    Yusuke Oda, Hiroyuki Fudaba, Graham Neubig, Hideaki Hata, Sakriani Sakti, Tomoki Toda, and Satoshi Nakamura. 2015. Learning to generate pseudo-code from source code using statistical machine translation (t). In Automated Software Engineering (ASE), 2015 30th IEEE/ACM International Conference on. IEEE, 574–584.
    [44]
    Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, 311–318.
    ASE ’18, September 3–7, 2018, Montpellier, France. Liu, Xia, Hassan, Lo, Xing, and Wang.
    [45]
    Hung Phan, Hoan Anh Nguyen, Tien N Nguyen, and Hridesh Rajan. 2017. Statistical learning for inference between implementations and documentation. In Software Engineering: New Ideas and Emerging Technologies Results Track (ICSENIER), 2017 IEEE/ACM 39th International Conference on. IEEE, 27–30.
    [46]
    Sarah Rastkar and Gail C Murphy. 2013. Why did this code change?. In Proceedings of the 2013 International Conference on Software Engineering. IEEE Press, 1193–1196.
    [47]
    Sarah Rastkar, Gail C Murphy, and Alexander WJ Bradley. 2011. Generating natural language summaries for crosscutting source code concerns. In Software Maintenance (ICSM), 2011 27th IEEE International Conference on. IEEE, 103–112.
    [48]
    Jinfeng Shen, Xiaobing Sun, Bin Li, Hui Yang, and Jiajun Hu. 2016. On Automatic Summarization of What and Why Information in Source Code Changes. In Computer Software and Applications Conference (COMPSAC), 2016 IEEE 40th Annual, Vol. 1. IEEE, 103–112.
    [49]
    Giriprasad Sridhara, Emily Hill, Divya Muppaneni, Lori Pollock, and K Vijay-Shanker. 2010. Towards automatically generating summary comments for java methods. In Proceedings of the IEEE/ACM international conference on Automated software engineering. ACM, 43–52.
    [50]
    Giriprasad Sridhara, Lori Pollock, and K Vijay-Shanker. 2011. Automatically detecting and describing high level actions within methods. In Proceedings of the 33rd International Conference on Software Engineering. ACM, 101–110.
    [51]
    Giriprasad Sridhara, Lori Pollock, and K Vijay-Shanker. 2011. Generating parameter comments and integrating with method summaries. In Program Comprehension (ICPC), 2011 IEEE 19th International Conference on. IEEE, 71–80.
    [52]
    Xiaoran Wang, Lori Pollock, and K Vijay-Shanker. 2017. Automatically generating natural language descriptions for object-related statement sequences. In Software Analysis, Evolution and Reengineering (SANER), 2017 IEEE 24th International Conference on. IEEE, 205–216.
    [53]
    Frank Wilcoxon. 1945. Individual comparisons by ranking methods. Biometrics bulletin 1, 6 (1945), 80–83.
    [54]
    Edmund Wong, Taiyue Liu, and Lin Tan. 2015. Clocom: Mining existing source code for automatic comment generation. In Software Analysis, Evolution and Reengineering (SANER), 2015 IEEE 22nd International Conference on. IEEE, 380–389.
    [55]
    Edmund Wong, Jinqiu Yang, and Lin Tan. 2013. Autocomment: Mining question and answer sites for automatic comment generation. In Automated Software Engineering (ASE), 2013 IEEE/ACM 28th International Conference on. IEEE, 562–567.
    [56]
    Xin Xia, Lingfeng Bao, David Lo, Pavneet Singh Kochhar, Ahmed E Hassan, and Zhenchang Xing. 2017. What do developers search for on the web? Empirical Software Engineering 22, 6 (2017), 3149–3185.
    [57]
    Meng Yan, Ying Fu, Xiaohong Zhang, Dan Yang, Ling Xu, and Jeffrey D Kymer. 2016. Automatically classifying software changes via discriminative topic model: Supporting multi-category and cross-project. Journal of Systems and Software 113 (2016), 296–308.

      Published In

      ASE '18: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering
      September 2018
      955 pages
      ISBN: 9781450359375
      DOI: 10.1145/3238147

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Badges

      • Distinguished Paper

      Author Tags

      1. Commit message generation
      2. Nearest neighbor algorithm
      3. Neural machine translation

      Qualifiers

      • Research-article

      Conference

      ASE '18

      Acceptance Rates

      Overall Acceptance Rate 82 of 337 submissions, 24%

      Bibliometrics

      Article Metrics

      • Downloads (last 12 months): 175
      • Downloads (last 6 weeks): 16
      Reflects downloads up to 14 Aug 2024

      Cited By

      • (2024) Commit Message Generation via ChatGPT: How Far Are We? Proceedings of the 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering. 10.1145/3650105.3652300 (124-129). Online publication date: 14-Apr-2024
      • (2024) Unveiling ChatGPT's Usage in Open Source Projects: A Mining-based Study. Proceedings of the 21st International Conference on Mining Software Repositories. 10.1145/3643991.3644918 (571-583). Online publication date: 15-Apr-2024
      • (2024) ESGen: Commit Message Generation Based on Edit Sequence of Code Change. Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension. 10.1145/3643916.3644414 (112-124). Online publication date: 15-Apr-2024
      • (2024) Towards Summarizing Code Snippets Using Pre-Trained Transformers. Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension. 10.1145/3643916.3644400 (1-12). Online publication date: 15-Apr-2024
      • (2024) Only diff Is Not Enough: Generating Commit Messages Leveraging Reasoning and Action of Large Language Model. Proceedings of the ACM on Software Engineering 1:FSE. 10.1145/3643760 (745-766). Online publication date: 12-Jul-2024
      • (2024) KADEL: Knowledge-Aware Denoising Learning for Commit Message Generation. ACM Transactions on Software Engineering and Methodology 33:5. 10.1145/3643675 (1-32). Online publication date: 4-Jun-2024
      • (2024) Learning to Represent Patches. Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings. 10.1145/3639478.3643521 (396-397). Online publication date: 14-Apr-2024
      • (2024) Deep Is Better? An Empirical Comparison of Information Retrieval and Deep Learning Approaches to Code Summarization. ACM Transactions on Software Engineering and Methodology 33:3. 10.1145/3631975 (1-37). Online publication date: 15-Mar-2024
      • (2024) How Important Are Good Method Names in Neural Code Generation? A Model Robustness Perspective. ACM Transactions on Software Engineering and Methodology 33:3. 10.1145/3630010 (1-35). Online publication date: 14-Mar-2024
      • (2024) Multi-Intent Inline Code Comment Generation via Large Language Model. International Journal of Software Engineering and Knowledge Engineering 34:06. 10.1142/S0218194024500050 (845-868). Online publication date: 23-Mar-2024
