Neural-machine-translation-based commit message generation: how far are we?

Authors:

Xin Xia,

Xinyu Wang

ASE '18: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering

Pages 373 - 384

https://doi.org/10.1145/3238147.3238190

Published: 03 September 2018

Abstract

Commit messages can be regarded as the documentation of software changes. These messages describe the content and purposes of changes and are therefore useful for program comprehension and software maintenance. However, due to a lack of time and direct motivation, developers sometimes neglect commit messages. To address this problem, Jiang et al. proposed an approach (we refer to it as NMT) that leverages a neural machine translation algorithm to automatically generate short commit messages from code. The reported performance of their approach is promising; however, they did not explore why their approach performs well. Thus, in this paper, we first perform an in-depth analysis of their experimental results. We find that (1) most of the test diffs from which NMT can generate high-quality messages are similar to one or more training diffs at the token level; (2) about 16% of the commit messages in Jiang et al.’s dataset are noisy, either because they were automatically generated or because they describe repetitive trivial changes; and (3) the performance of NMT declines by a large amount after removing such noisy commit messages. In addition, NMT is complicated and time-consuming. Inspired by our first finding, we propose a simpler and faster approach, named NNGen (Nearest Neighbor Generator), to generate concise commit messages using the nearest neighbor algorithm. Our experimental results show that NNGen is over 2,600 times faster than NMT and outperforms NMT in terms of BLEU (an accuracy measure that is widely used to evaluate machine translation systems) by 21%. Finally, we also discuss some observations for the road ahead for automated commit message generation to inspire other researchers.
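The nearest-neighbor idea can be illustrated with a minimal sketch: given a new diff, find the most similar training diff at the token level and reuse its commit message. The abstract does not specify the tokenization or the similarity measure, so the token-count vectors and cosine similarity used here are assumptions, and the names `tokenize`, `cosine`, and `nearest_message` are hypothetical rather than taken from NNGen.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(diff: str) -> Counter:
    """Count word-like tokens and single punctuation symbols in a diff."""
    return Counter(re.findall(r"\w+|[^\w\s]", diff))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors (assumed measure)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_message(new_diff: str, training: List[Tuple[str, str]]) -> str:
    """Return the message paired with the training diff most similar to new_diff."""
    if not training:
        raise ValueError("training set must not be empty")
    query = tokenize(new_diff)
    _, message = max(training, key=lambda pair: cosine(query, tokenize(pair[0])))
    return message


# Toy training set of (diff, commit message) pairs, invented for illustration.
training = [
    ("- int timeout = 30;\n+ int timeout = 60;", "Increase default timeout"),
    ("+ import java.util.List;", "Add missing import"),
]

print(nearest_message("- int timeout = 60;\n+ int timeout = 90;", training))
# prints "Increase default timeout"
```

Because such a generator needs no model training and only computes similarities at query time, it is plausible that this kind of approach runs much faster than a neural translation model, which is consistent with the speed-up reported above.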

References

[1] 2017. Jiang et al.’s website. https://sjiang1.github.io/commitgen/.

[2] 2018. Git. https://git-scm.com/.

[3] 2018. Our online appendix. https://goo.gl/63B976.

[4] 2018. ExoPlayer. https://github.com/google/ExoPlayer.

[5] 2018. Google Closure Compiler. https://github.com/google/closure-compiler.

[6] 2018. Liferay Portal. https://github.com/liferay/liferay-portal.

[7] 2018. Stack Overflow. https://stackoverflow.com/.

[8] 2018. TsExtractor in ExoPlayer. https://goo.gl/Dsbdjf.
