skip to main content
10.1145/3510003.3510069acmconferencesArticle/Chapter ViewAbstractPublication PagesicseConference Proceedingsconference-collections
research-article

FIRA: <u>fi</u>ne-grained g<u>ra</u>ph-based code change representation for automated commit message generation

Published: 05 July 2022 Publication History
  • Get Citation Alerts
  • Abstract

    Commit messages summarize code changes of each commit in natural language, which help developers understand code changes without digging into detailed implementations and play an essential role in comprehending software evolution. To alleviate human efforts in writing commit messages, researchers have proposed various automated techniques to generate commit messages, including template-based, information retrieval-based, and learning-based techniques. Although promising, previous techniques have limited effectiveness due to their coarse-grained code change representations.
    This work proposes a novel commit message generation technique, FIRA, which first represents code changes via fine-grained graphs and then learns to generate commit messages automatically. Different from previous techniques, FIRA represents the code changes with fine-grained graphs, which explicitly describe the code edit operations between the old version and the new version, and code tokens at different granularities (i.e., sub-tokens and integral tokens). Based on the graph-based representation, FIRA generates commit messages by a generation model, which includes a graph-neural-network-based encoder and a transformer-based decoder. To make both sub-tokens and integral tokens as available ingredients for commit message generation, the decoder is further incorporated with a novel dual copy mechanism. We further perform an extensive study to evaluate the effectiveness of FIRA. Our quantitative results show that FIRA outperforms state-of-the-art techniques in terms of BLEU, ROUGE-L, and METEOR; and our ablation analysis further shows that major components in our technique both positively contribute to the effectiveness of FIRA. In addition, we further perform a human study to evaluate the quality of generated commit messages from the perspective of developers, and the results consistently show the effectiveness of FIRA over the compared techniques.

    References

    [1]
    Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. 2020. A transformer-based approach for source code summarization. arXiv preprint arXiv:2005.00653 (2020).
    [2]
    Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. 2016. Layer normalization. arXiv preprint arXiv:1607.06450 (2016).
    [3]
    Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization. 65--72.
    [4]
    Raymond PL Buse and Westley R Weimer. 2010. Automatically documenting program changes. In Proceedings of the IEEE/ACM international conference on Automated software engineering. 33--42.
    [5]
    Eduardo C Campos and Marcelo de A Maia. 2019. Discovering common bug-fix patterns: A large-scale observational study. Journal of Software: Evolution and Process 31, 7 (2019), e2173.
    [6]
    Luis Fernando Cortés-Coy, Mario Linares-Vásquez, Jairo Aponte, and Denys Poshyvanyk. 2014. On automatically generating commit messages via summarization of source code changes. In 2014 IEEE 14th International Working Conference on Source Code Analysis and Manipulation. IEEE, 275--284.
    [7]
    Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. arXiv preprint arXiv:1606.09375 (2016).
    [8]
    Natalia Dragan, Michael L Collard, Maen Hammad, and Jonathan I Maletic. 2011. Using stereotypes to help characterize commits. In 2011 27th IEEE International Conference on Software Maintenance (ICSM). IEEE, 520--523.
    [9]
    Natalia Dragan, Michael L Collard, and Jonathan I Maletic. 2006. Reverse engineering method stereotypes. In 2006 22nd IEEE International Conference on Software Maintenance. IEEE, 24--34.
    [10]
    Robert Dyer, Hoan Anh Nguyen, Hridesh Rajan, and Tien N Nguyen. 2013. Boa: A language and infrastructure for analyzing ultra-large-scale software repositories. In 2013 35th International Conference on Software Engineering (ICSE). IEEE, 422--431.
    [11]
    Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, and Martin Monperrus. 2014. Fine-grained and accurate source code differencing. In Proceedings of the 29th ACM/IEEE international conference on Automated software engineering. 313--324.
    [12]
    Marco Gori, Gabriele Monfardini, and Franco Scarselli. 2005. A new model for learning in graph domains. In Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005., Vol. 2. IEEE, 729--734.
    [13]
    William L Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems. 1025--1035.
    [14]
    Quinn Hanam, Fernando S de M Brito, and Ali Mesbah. 2016. Discovering bug patterns in JavaScript. In Proceedings of the 2016 24th ACM SIGSOFT international symposium on foundations of software engineering. 144--156.
    [15]
    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770--778.
    [16]
    Thong Hoang, Hong Jin Kang, David Lo, and Julia Lawall. 2020. CC2Vec: Distributed representations of code changes. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering. 518--529.
    [17]
    Yuan Huang, Nan Jia, Hao-Jie Zhou, Xiang-Ping Chen, Zi-Bin Zheng, and Ming-Dong Tang. 2020. Learning Human-Written Commit Messages to Document Code Changes. Journal of Computer Science and Technology 35, 6 (2020), 1258--1277.
    [18]
    Md Rakibul Islam and Minhaz F Zibran. 2020. How bugs are fixed: exposing bug-fix patterns with edits and nesting levels. In Proceedings of the 35th Annual ACM Symposium on Applied Computing. 1523--1531.
    [19]
    Siyuan Jiang, Ameer Armaly, and Collin McMillan. 2017. Automatically generating commit messages from diffs using neural machine translation. In 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 135--146.
    [20]
    Siyuan Jiang and Collin McMillan. 2017. Towards automatic generation of short summaries of commits. In 2017 IEEE/ACM 25th International Conference on Program Comprehension (ICPC). IEEE, 320--323.
    [21]
    Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
    [22]
    Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
    [23]
    Russell V Lenth. 2001. Some practical guidelines for effective sample size determination. The American Statistician 55, 3 (2001), 187--193.
    [24]
    Shanshan Li, Xu Niu, Zhouyang Jia, Ji Wang, Haochen He, and Teng Wang. 2018. Logtracker: Learning log revision behaviors proactively from software evolution history. In Proceedings of the 26th Conference on Program Comprehension. 178--188.
    [25]
    Chin-Yew Lin and Franz Josef Och. 2004. Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04). 605--612.
    [26]
    Kui Liu, Dongsun Kim, Anil Koyuncu, Li Li, Tegawendé F Bissyandé, and Yves Le Traon. 2018. A closer look at real-world patches. In 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 275--286.
    [27]
    Qin Liu, Zihe Liu, Hongming Zhu, Hongfei Fan, Bowen Du, and Yu Qian. 2019. Generating commit messages from diffs using pointer-generator network. In 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR). IEEE, 299--309.
    [28]
    Shangqing Liu, Cuiyun Gao, Sen Chen, Nie Lun Yiu, and Yang Liu. 2020. ATOM: Commit message generation based on abstract syntax tree and hybrid ranking. IEEE Transactions on Software Engineering (2020).
    [29]
    Zhongxin Liu, Xin Xia, Ahmed E Hassan, David Lo, Zhenchang Xing, and Xinyu Wang. 2018. Neural-machine-translation-based commit message generation: how far are we?. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. 373--384.
    [30]
    Yiling Lou, Qihao Zhu, Jinhao Dong, Xia Li, Zeyu Sun, Dan Hao, Lu Zhang, and Lingming Zhang. 2021. Boosting coverage-based fault localization via graph-based representation learning. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 664--676.
    [31]
    Zhen Ni, Bin Li, Xiaobing Sun, Tianhao Chen, Ben Tang, and Xinchen Shi. 2020. Analyzing bug fix for automatic bug cause classification. Journal of Systems and Software 163 (2020), 110538.
    [32]
    Lun Yiu Nie, Cuiyun Gao, Zhicong Zhong, Wai Lam, Yang Liu, and Zenglin Xu. 2021. CoreGen: Contextualized Code Representation Learning for Commit Message Generation. Neurocomputing 459 (2021), 97--107.
    [33]
    Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics. 311--318.
    [34]
    Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2008. The graph neural network model. IEEE transactions on neural networks 20, 1 (2008), 61--80.
    [35]
    Jinfeng Shen, Xiaobing Sun, Bin Li, Hui Yang, and Jiajun Hu. 2016. On automatic summarization of what and why information in source code changes. In 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), Vol. 1. IEEE, 103--112.
    [36]
    Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15, 1 (2014), 1929--1958.
    [37]
    Zeyu Sun, Qihao Zhu, Yingfei Xiong, Yican Sun, Lili Mou, and Lu Zhang. 2020. Treegen: A tree-based transformer architecture for code generation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 8984--8991.
    [38]
    Wei Tao, Yanlin Wang, Ensheng Shi, Lun Du, Hongyu Zhang, Dongmei Zhang, and Wenqiang Zhang. 2021. On the Evaluation of Commit Message Generation Models: An Experimental Study. arXiv preprint arXiv:2107.05373 (2021).
    [39]
    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. arXiv preprint arXiv:1706.03762 (2017).
    [40]
    Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017).
    [41]
    Haoye Wang, Xin Xia, David Lo, Qiang He, Xinyu Wang, and John Grundy. 2021. Context-aware Retrieval-based Deep Commit Message Generation. ACM Transactions on Software Engineering and Methodology (TOSEM) 30, 4 (2021), 1--30.
    [42]
    Frank Wilcoxon. 1992. Individual comparisons by ranking methods. In Breakthroughs in statistics. Springer, 196--202.
    [43]
    Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and S Yu Philip. 2020. A comprehensive survey on graph neural networks. IEEE transactions on neural networks and learning systems (2020).
    [44]
    Shengbin Xu, Yuan Yao, Feng Xu, Tianxiao Gu, Hanghang Tong, and Jian Lu. 2019. Commit message generation for source code changes. In IJCAI.
    [45]
    Pengcheng Yin, Graham Neubig, Miltiadis Allamanis, Marc Brockschmidt, and Alexander L Gaunt. 2018. Learning to represent edits. arXiv preprint arXiv:1810.13337 (2018).
    [46]
    Qihao Zhu, Zeyu Sun, Yuan-an Xiao, Wenjie Zhang, Kang Yuan, Yingfei Xiong, and Lu Zhang. 2021. A syntax-guided edit decoder for neural program repair. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 341--353.

    Cited By

    View all
    • (2024)Automated Commit Intelligence by Pre-trainingACM Transactions on Software Engineering and Methodology10.1145/3674731Online publication date: 1-Jul-2024
    • (2024)CCAF: Learning Code Change via AdapterFusionProceedings of the 15th Asia-Pacific Symposium on Internetware10.1145/3671016.3671399(219-228)Online publication date: 24-Jul-2024
    • (2024)Commit Message Generation via ChatGPT: How Far Are We?Proceedings of the 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering10.1145/3650105.3652300(124-129)Online publication date: 14-Apr-2024
    • Show More Cited By

    Index Terms

    1. FIRA: <u>fi</u>ne-grained g<u>ra</u>ph-based code change representation for automated commit message generation

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      ICSE '22: Proceedings of the 44th International Conference on Software Engineering
      May 2022
      2508 pages
      ISBN:9781450392211
      DOI:10.1145/3510003
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Sponsors

      In-Cooperation

      • IEEE CS

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 05 July 2022

      Permissions

      Request permissions for this article.

      Check for updates

      Badges

      Author Tags

      1. code change representation
      2. commit message generation
      3. graph neural network

      Qualifiers

      • Research-article

      Funding Sources

      • National Natural Science Foundation of China

      Conference

      ICSE '22
      Sponsor:

      Acceptance Rates

      Overall Acceptance Rate 276 of 1,856 submissions, 15%

      Upcoming Conference

      ICSE 2025

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)196
      • Downloads (Last 6 weeks)15
      Reflects downloads up to 14 Aug 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)Automated Commit Intelligence by Pre-trainingACM Transactions on Software Engineering and Methodology10.1145/3674731Online publication date: 1-Jul-2024
      • (2024)CCAF: Learning Code Change via AdapterFusionProceedings of the 15th Asia-Pacific Symposium on Internetware10.1145/3671016.3671399(219-228)Online publication date: 24-Jul-2024
      • (2024)Commit Message Generation via ChatGPT: How Far Are We?Proceedings of the 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering10.1145/3650105.3652300(124-129)Online publication date: 14-Apr-2024
      • (2024)Unveiling ChatGPT's Usage in Open Source Projects: A Mining-based StudyProceedings of the 21st International Conference on Mining Software Repositories10.1145/3643991.3644918(571-583)Online publication date: 15-Apr-2024
      • (2024)ESGen: Commit Message Generation Based on Edit Sequence of Code ChangeProceedings of the 32nd IEEE/ACM International Conference on Program Comprehension10.1145/3643916.3644414(112-124)Online publication date: 15-Apr-2024
      • (2024)Only diff Is Not Enough: Generating Commit Messages Leveraging Reasoning and Action of Large Language ModelProceedings of the ACM on Software Engineering10.1145/36437601:FSE(745-766)Online publication date: 12-Jul-2024
      • (2024)CodeArt: Better Code Models by Attention Regularization When Symbols Are LackingProceedings of the ACM on Software Engineering10.1145/36437521:FSE(562-585)Online publication date: 12-Jul-2024
      • (2024)KADEL: Knowledge-Aware Denoising Learning for Commit Message GenerationACM Transactions on Software Engineering and Methodology10.1145/364367533:5(1-32)Online publication date: 4-Jun-2024
      • (2024)Learning to Represent PatchesProceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings10.1145/3639478.3643521(396-397)Online publication date: 14-Apr-2024
      • (2024)Fine-SE: Integrating Semantic Features and Expert Features for Software Effort EstimationProceedings of the IEEE/ACM 46th International Conference on Software Engineering10.1145/3597503.3623349(1-12)Online publication date: 20-May-2024
      • Show More Cited By

      View Options

      Get Access

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media