skip to main content
10.1145/3671016.3671389acmconferencesArticle/Chapter ViewAbstractPublication PagesinternetwareConference Proceedingsconference-collections
research-article

On the Heterophily of Program Graphs: A Case Study of Graph-based Type Inference

Published: 24 July 2024 Publication History

Abstract

Treating programs as graphs and employing graph learning techniques to analyze them have been widely adopted in many software engineering tasks. A recent progress in this vein is to apply graph neural networks (GNNs) to model program graphs, which is built upon the homophily assumption, i.e., similar nodes tend to connect each other. However, this assumption is not always valid in program graphs, as various edges such as AST edges and token occurrence edges may connect dissimilar nodes with quite different properties. Such phenomenon is termed as the heterophily of program graphs. In this paper, we propose a new heterophily-aware graph convolutional network (HAGCN) to better handle the heterophilic program graphs. Specifically, we first introduce the subtraction operation into the message passing mechanism of GNNs, which allows HAGCN to push apart dissimilar nodes in the representation space. Then, HAGCN separately encodes each type of edges, and uses a global relation-aware attention mechanism to fuse messages from different edge types. Moreover, we also theoretically analyze the expressive power of HAGCN from the perspective of convolution filters and contrast the differences between HAGCN and other GNNs. Finally, we take type inference as an example to evaluate the effectiveness of the proposed approach. Experimental results demonstrate that HAGCN significantly outperforms the existing non-heterophilic competitors, as well as the existing state-of-the-art graph-based type inference approaches.

References

[1]
Miltiadis Allamanis. 2019. The adverse effects of code duplication in machine learning models of code. In Proceedings of the 2019 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software. 143–153.
[2]
Miltiadis Allamanis, Earl T Barr, Premkumar Devanbu, and Charles Sutton. 2018. A survey of machine learning for big code and naturalness. ACM Computing Surveys (CSUR) 51, 4 (2018), 1–37.
[3]
Miltiadis Allamanis, Earl T Barr, Soline Ducousso, and Zheng Gao. 2020. Typilus: Neural type hints. In Proceedings of the 41st acm sigplan conference on programming language design and implementation. 91–105.
[4]
Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. 2017. Learning to represent programs with graphs. arXiv preprint arXiv:1711.00740 (2017).
[5]
Uri Alon, Shaked Brody, Omer Levy, and Eran Yahav. 2018. code2seq: Generating sequences from structured representations of code. arXiv preprint arXiv:1808.01400 (2018).
[6]
Hadeel Alsolai and Marc Roper. 2020. A systematic literature review of machine learning techniques for software maintainability prediction. Information and Software Technology 119 (2020), 106214.
[7]
Chris Andreae, James Noble, Shane Markstrum, and Todd Millstein. 2006. A framework for implementing pluggable type systems. In Proceedings of the 21st annual ACM SIGPLAN conference on Object-oriented programming systems, languages, and applications. 57–74.
[8]
Martin Aumüller, Erik Bernhardsson, and Alexander Faithfull. 2020. ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. Information Systems 87 (2020), 101374.
[9]
Aakash Bansal, Zachary Eberhart, Zachary Karas, Yu Huang, and Collin McMillan. 2023. Function call graph context encoding for neural source code summarization. IEEE Transactions on Software Engineering (2023).
[10]
Antonia Bertolino. 2007. Software testing research: Achievements, challenges, dreams. In Future of Software Engineering (FOSE’07). IEEE, 85–103.
[11]
Deyu Bo, Xiao Wang, Chuan Shi, and Huawei Shen. 2021. Beyond low-frequency information in graph convolutional networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 3950–3957.
[12]
Marc Brockschmidt. 2020. Gnn-film: Graph neural networks with feature-wise linear modulation. In International Conference on Machine Learning. PMLR, 1144–1152.
[13]
Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. 2013. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203 (2013).
[14]
Deli Chen, Yankai Lin, Wei Li, Peng Li, Jie Zhou, and Xu Sun. 2020. Measuring and relieving the over-smoothing problem for graph neural networks from the topological view. In Proceedings of the AAAI conference on artificial intelligence, Vol. 34. 3438–3445.
[15]
De Cheng, Yihong Gong, Sanping Zhou, Jinjun Wang, and Nanning Zheng. 2016. Person re-identification by multi-channel parts-based cnn with improved triplet loss function. In Proceedings of the iEEE conference on computer vision and pattern recognition. 1335–1344.
[16]
Fan RK Chung. 1997. Spectral graph theory. Vol. 92. American Mathematical Soc.
[17]
Siwei Cui, Gang Zhao, Zeyu Dai, Luochao Wang, Ruihong Huang, and Jeff Huang. 2021. Pyinfer: Deep learning semantic type inference for python variables. arXiv preprint arXiv:2106.14316 (2021).
[18]
Milan Cvitkovic, Badal Singh, and Animashree Anandkumar. 2019. Open vocabulary learning on source code with a graph-structured cache. In International Conference on Machine Learning. PMLR, 1475–1485.
[19]
Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. Advances in neural information processing systems 29 (2016).
[20]
Pär Emanuelsson and Ulf Nilsson. 2008. A comparative study of industrial static analysis tools. Electronic notes in theoretical computer science 217 (2008), 5–21.
[21]
Yimeng Guo, Zhifei Chen, Lin Chen, Wenjie Xu, Yanhui Li, Yuming Zhou, and Baowen Xu. 2024. Generating Python Type Annotations from Type Inference: How Far Are We?ACM Transactions on Software Engineering and Methodology (2024).
[22]
Xu He, Shu Wang, Pengbin Feng, Xinda Wang, Shiyu Sun, Qi Li, and Kun Sun. 2023. BinGo: Identifying Security Patches in Binary Code with Graph Representation Learning. arXiv preprint arXiv:2312.07921 (2023).
[23]
Vincent J Hellendoorn and Premkumar Devanbu. 2017. Are deep neural networks the best choice for modeling source code?. In Proceedings of the 2017 11th Joint meeting on foundations of software engineering. 763–773.
[24]
Vincent J Hellendoorn, Charles Sutton, Rishabh Singh, Petros Maniatis, and David Bieber. 2019. Global relational models of source code. In International conference on learning representations.
[25]
Vladimir Ivanov, Vitaly Romanov, Giancarlo Succi, 2021. Predicting Type Annotations for Python using Embeddings from Graph Neural Networks. In ICEIS (1). 548–556.
[26]
Kevin Jesse, Premkumar T Devanbu, and Toufique Ahmed. 2021. Learning type annotation: is big data enough?. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1483–1486.
[27]
Thomas N Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In International conference on learning representatons.
[28]
Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2016. Gated graph sequence neural networks. In International conference on learning representatons.
[29]
Sitao Luan, Chenqing Hua, Qincheng Lu, Jiaqi Zhu, Mingde Zhao, Shuyuan Zhang, Xiao-Wen Chang, and Doina Precup. 2022. Revisiting heterophily for graph neural networks. Advances in neural information processing systems 35 (2022), 1362–1375.
[30]
Amir M Mir, Evaldas Latoškinas, Sebastian Proksch, and Georgios Gousios. 2022. Type4py: Practical deep similarity learning-based type inference for python. In Proceedings of the 44th International Conference on Software Engineering. 2241–2252.
[31]
Flemming Nielson, Hanne R Nielson, and Chris Hankin. 2015. Principles of program analysis. springer.
[32]
Irene Vlassi Pandi, Earl T Barr, Andrew D Gordon, and Charles Sutton. 2020. Opttyper: Probabilistic type inference by optimising logical and natural constraints. arXiv preprint arXiv:2004.00348 (2020).
[33]
Yun Peng, Chaozheng Wang, Wenxuan Wang, Cuiyun Gao, and Michael R Lyu. 2023. Generative Type Inference for Python. In 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 988–999.
[34]
Michael Pradel, Georgios Gousios, Jason Liu, and Satish Chandra. 2020. Typewriter: Neural type prediction with search-based validation. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 209–220.
[35]
Ingkarat Rak-Amnouykit, Daniel McCrevan, Ana Milanova, Martin Hirzel, and Julian Dolby. 2020. Python 3 types in the wild: a tale of two type systems. In Proceedings of the 16th ACM SIGPLAN International Symposium on Dynamic Languages. 57–70.
[36]
Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In The semantic web: 15th international conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, proceedings 15. Springer, 593–607.
[37]
David I Shuman, Sunil K Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst. 2013. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE signal processing magazine 30, 3 (2013), 83–98.
[38]
Zixing Song and Irwin King. 2022. Hierarchical heterogeneous graph attention network for syntax-aware summarization. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 11340–11348.
[39]
Ke Sun, Yifan Zhao, Dan Hao, and Lu Zhang. 2022. Static Type Recommendation for Python. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering. 1–13.
[40]
Ze Tang, Xiaoyu Shen, Chuanyi Li, Jidong Ge, Liguo Huang, Zhelin Zhu, and Bin Luo. 2022. Ast-trans: Code summarization with efficient tree-structured attention. In Proceedings of the 44th International Conference on Software Engineering. 150–162.
[41]
Daniel Tarlow, Subhodeep Moitra, Andrew Rice, Zimin Chen, Pierre-Antoine Manzagol, Charles Sutton, and Edward Aftandilian. 2020. Learning to fix build errors with graph2diff neural networks. In Proceedings of the IEEE/ACM 42nd international conference on software engineering workshops. 19–20.
[42]
Ali TehraniJamsaz, Quazi Ishtiaque Mahmud, Le Chen, Nesreen K Ahmed, and Ali Jannesari. 2024. Perfograph: A numerical aware program graph representation for performance optimization and program analysis. Advances in Neural Information Processing Systems 36 (2024).
[43]
S Vichy N Vishwanathan, Nicol N Schraudolph, Risi Kondor, and Karsten M Borgwardt. 2010. Graph kernels. The Journal of Machine Learning Research 11 (2010), 1201–1242.
[44]
Yanlin Wang and Hui Li. 2021. Code completion by modeling flattened abstract syntax trees as graphs. In Proceedings of the AAAI conference on artificial intelligence, Vol. 35. 14015–14023.
[45]
Yu Wang, Ke Wang, Fengjuan Gao, and Linzhang Wang. 2020. Learning semantic program embeddings with graph interval neural network. Proceedings of the ACM on Programming Languages 4, OOPSLA (2020), 1–27.
[46]
Cody Watson, Michele Tufano, Kevin Moran, Gabriele Bavota, and Denys Poshyvanyk. 2020. On learning meaningful assert statements for unit test cases. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering. 1398–1409.
[47]
Jiayi Wei, Greg Durrett, and Isil Dillig. 2023. Typet5: Seq2seq type inference using static analysis. arXiv preprint arXiv:2303.09564 (2023).
[48]
Jiayi Wei, Maruth Goyal, Greg Durrett, and Isil Dillig. 2020. Lambdanet: Probabilistic type inference using graph neural networks. arXiv preprint arXiv:2005.02161 (2020).
[49]
Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Weinberger. 2019. Simplifying graph convolutional networks. In International conference on machine learning. PMLR, 6861–6871.
[50]
Yanyan Yan, Yang Feng, Hongcheng Fan, and Baowen Xu. 2023. DLInfer: Deep Learning with Static Slicing for Python Type Inference. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 2009–2021.
[51]
Chaoqi Yang, Ruijie Wang, Shuochao Yao, Shengzhong Liu, and Tarek Abdelzaher. 2020. Revisiting over-smoothing in deep GCNs. arXiv preprint arXiv:2003.13663 (2020).
[52]
Jianwei Zeng, Yutong He, Tao Zhang, Zhou Xu, and Qiang Han. 2023. CLG-Trans: Contrastive learning for code summarization via graph attention-based transformer. Science of Computer Programming 226 (2023), 102925.
[53]
Chunyong Zhang, Bin Liu, Yang Xin, and Liangwei Yao. 2023. CPVD: Cross Project Vulnerability Detection Based On Graph Attention Network And Domain Adaptation. IEEE Transactions on Software Engineering (2023).
[54]
Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Xudong Liu, Chunming Hu, and Yang Liu. 2023. Detecting condition-related bugs with control flow graph neural network. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. 1370–1382.
[55]
Kechi Zhang, Wenhan Wang, Huangzhao Zhang, Ge Li, and Zhi Jin. 2022. Learning to represent programs with heterogeneous graphs. In 2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC).
[56]
Kunsong Zhao, Zihao Li, Jianfeng Li, He Ye, Xiapu Luo, and Ting Chen. 2023. DeepInfer: Deep Type Inference from Smart Contract Bytecode. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 745–757.

Index Terms

  1. On the Heterophily of Program Graphs: A Case Study of Graph-based Type Inference
        Index terms have been assigned to the content through auto-classification.

        Recommendations

        Comments

        Please enable JavaScript to view thecomments powered by Disqus.

        Information & Contributors

        Information

        Published In

        cover image ACM Conferences
        Internetware '24: Proceedings of the 15th Asia-Pacific Symposium on Internetware
        July 2024
        518 pages
        ISBN:9798400707056
        DOI:10.1145/3671016
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Sponsors

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 24 July 2024

        Permissions

        Request permissions for this article.

        Check for updates

        Author Tags

        1. Graph Neural Networks
        2. Heterophilic Graph Learning
        3. Program Analysis
        4. Type Inference

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        Internetware 2024
        Sponsor:

        Acceptance Rates

        Overall Acceptance Rate 55 of 111 submissions, 50%

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • 0
          Total Citations
        • 48
          Total Downloads
        • Downloads (Last 12 months)48
        • Downloads (Last 6 weeks)8
        Reflects downloads up to 24 Oct 2024

        Other Metrics

        Citations

        View Options

        Get Access

        Login options

        View options

        PDF

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader

        HTML Format

        View this article in HTML Format.

        HTML Format

        Media

        Figures

        Other

        Tables

        Share

        Share

        Share this Publication link

        Share on social media