research-article

On the Heterophily of Program Graphs: A Case Study of Graph-based Type Inference

Authors:

Xiaoxing MaAuthors Info & Claims

Internetware '24: Proceedings of the 15th Asia-Pacific Symposium on Internetware

Pages 1 - 10

https://doi.org/10.1145/3671016.3671389

Published: 24 July 2024 Publication History

Abstract

Treating programs as graphs and employing graph learning techniques to analyze them have been widely adopted in many software engineering tasks. A recent progress in this vein is to apply graph neural networks (GNNs) to model program graphs, which is built upon the homophily assumption, i.e., similar nodes tend to connect each other. However, this assumption is not always valid in program graphs, as various edges such as AST edges and token occurrence edges may connect dissimilar nodes with quite different properties. Such phenomenon is termed as the heterophily of program graphs. In this paper, we propose a new heterophily-aware graph convolutional network (HAGCN) to better handle the heterophilic program graphs. Specifically, we first introduce the subtraction operation into the message passing mechanism of GNNs, which allows HAGCN to push apart dissimilar nodes in the representation space. Then, HAGCN separately encodes each type of edges, and uses a global relation-aware attention mechanism to fuse messages from different edge types. Moreover, we also theoretically analyze the expressive power of HAGCN from the perspective of convolution filters and contrast the differences between HAGCN and other GNNs. Finally, we take type inference as an example to evaluate the effectiveness of the proposed approach. Experimental results demonstrate that HAGCN significantly outperforms the existing non-heterophilic competitors, as well as the existing state-of-the-art graph-based type inference approaches.

References

[1]

Miltiadis Allamanis. 2019. The adverse effects of code duplication in machine learning models of code. In Proceedings of the 2019 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software. 143–153.

Digital Library

[2]

Miltiadis Allamanis, Earl T Barr, Premkumar Devanbu, and Charles Sutton. 2018. A survey of machine learning for big code and naturalness. ACM Computing Surveys (CSUR) 51, 4 (2018), 1–37.

Digital Library

[3]

Miltiadis Allamanis, Earl T Barr, Soline Ducousso, and Zheng Gao. 2020. Typilus: Neural type hints. In Proceedings of the 41st acm sigplan conference on programming language design and implementation. 91–105.

Digital Library

[4]

Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. 2017. Learning to represent programs with graphs. arXiv preprint arXiv:1711.00740 (2017).

[5]

Uri Alon, Shaked Brody, Omer Levy, and Eran Yahav. 2018. code2seq: Generating sequences from structured representations of code. arXiv preprint arXiv:1808.01400 (2018).

[6]

Hadeel Alsolai and Marc Roper. 2020. A systematic literature review of machine learning techniques for software maintainability prediction. Information and Software Technology 119 (2020), 106214.

Digital Library

[7]

Chris Andreae, James Noble, Shane Markstrum, and Todd Millstein. 2006. A framework for implementing pluggable type systems. In Proceedings of the 21st annual ACM SIGPLAN conference on Object-oriented programming systems, languages, and applications. 57–74.

Digital Library

[8]

Martin Aumüller, Erik Bernhardsson, and Alexander Faithfull. 2020. ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. Information Systems 87 (2020), 101374.

Digital Library

[9]

Aakash Bansal, Zachary Eberhart, Zachary Karas, Yu Huang, and Collin McMillan. 2023. Function call graph context encoding for neural source code summarization. IEEE Transactions on Software Engineering (2023).

Digital Library

[10]

Antonia Bertolino. 2007. Software testing research: Achievements, challenges, dreams. In Future of Software Engineering (FOSE’07). IEEE, 85–103.

[11]

Deyu Bo, Xiao Wang, Chuan Shi, and Huawei Shen. 2021. Beyond low-frequency information in graph convolutional networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 3950–3957.

[12]

Marc Brockschmidt. 2020. Gnn-film: Graph neural networks with feature-wise linear modulation. In International Conference on Machine Learning. PMLR, 1144–1152.

[13]

Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. 2013. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203 (2013).

[14]

Deli Chen, Yankai Lin, Wei Li, Peng Li, Jie Zhou, and Xu Sun. 2020. Measuring and relieving the over-smoothing problem for graph neural networks from the topological view. In Proceedings of the AAAI conference on artificial intelligence, Vol. 34. 3438–3445.

[15]

De Cheng, Yihong Gong, Sanping Zhou, Jinjun Wang, and Nanning Zheng. 2016. Person re-identification by multi-channel parts-based cnn with improved triplet loss function. In Proceedings of the iEEE conference on computer vision and pattern recognition. 1335–1344.

[16]

Fan RK Chung. 1997. Spectral graph theory. Vol. 92. American Mathematical Soc.

[17]

Siwei Cui, Gang Zhao, Zeyu Dai, Luochao Wang, Ruihong Huang, and Jeff Huang. 2021. Pyinfer: Deep learning semantic type inference for python variables. arXiv preprint arXiv:2106.14316 (2021).

[18]

Milan Cvitkovic, Badal Singh, and Animashree Anandkumar. 2019. Open vocabulary learning on source code with a graph-structured cache. In International Conference on Machine Learning. PMLR, 1475–1485.

[19]

Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. Advances in neural information processing systems 29 (2016).

[20]

Pär Emanuelsson and Ulf Nilsson. 2008. A comparative study of industrial static analysis tools. Electronic notes in theoretical computer science 217 (2008), 5–21.

[21]

Yimeng Guo, Zhifei Chen, Lin Chen, Wenjie Xu, Yanhui Li, Yuming Zhou, and Baowen Xu. 2024. Generating Python Type Annotations from Type Inference: How Far Are We?ACM Transactions on Software Engineering and Methodology (2024).

Digital Library

[22]

Xu He, Shu Wang, Pengbin Feng, Xinda Wang, Shiyu Sun, Qi Li, and Kun Sun. 2023. BinGo: Identifying Security Patches in Binary Code with Graph Representation Learning. arXiv preprint arXiv:2312.07921 (2023).

[23]

Vincent J Hellendoorn and Premkumar Devanbu. 2017. Are deep neural networks the best choice for modeling source code?. In Proceedings of the 2017 11th Joint meeting on foundations of software engineering. 763–773.

Digital Library

[24]

Vincent J Hellendoorn, Charles Sutton, Rishabh Singh, Petros Maniatis, and David Bieber. 2019. Global relational models of source code. In International conference on learning representations.

[25]

Vladimir Ivanov, Vitaly Romanov, Giancarlo Succi, 2021. Predicting Type Annotations for Python using Embeddings from Graph Neural Networks. In ICEIS (1). 548–556.

[26]

Kevin Jesse, Premkumar T Devanbu, and Toufique Ahmed. 2021. Learning type annotation: is big data enough?. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1483–1486.

Digital Library

[27]

Thomas N Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In International conference on learning representatons.

[28]

Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2016. Gated graph sequence neural networks. In International conference on learning representatons.

[29]

Sitao Luan, Chenqing Hua, Qincheng Lu, Jiaqi Zhu, Mingde Zhao, Shuyuan Zhang, Xiao-Wen Chang, and Doina Precup. 2022. Revisiting heterophily for graph neural networks. Advances in neural information processing systems 35 (2022), 1362–1375.

[30]

Amir M Mir, Evaldas Latoškinas, Sebastian Proksch, and Georgios Gousios. 2022. Type4py: Practical deep similarity learning-based type inference for python. In Proceedings of the 44th International Conference on Software Engineering. 2241–2252.

Digital Library

[31]

Flemming Nielson, Hanne R Nielson, and Chris Hankin. 2015. Principles of program analysis. springer.

[32]

Irene Vlassi Pandi, Earl T Barr, Andrew D Gordon, and Charles Sutton. 2020. Opttyper: Probabilistic type inference by optimising logical and natural constraints. arXiv preprint arXiv:2004.00348 (2020).

[33]

Yun Peng, Chaozheng Wang, Wenxuan Wang, Cuiyun Gao, and Michael R Lyu. 2023. Generative Type Inference for Python. In 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 988–999.

[34]

Michael Pradel, Georgios Gousios, Jason Liu, and Satish Chandra. 2020. Typewriter: Neural type prediction with search-based validation. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 209–220.

Digital Library

[35]

Ingkarat Rak-Amnouykit, Daniel McCrevan, Ana Milanova, Martin Hirzel, and Julian Dolby. 2020. Python 3 types in the wild: a tale of two type systems. In Proceedings of the 16th ACM SIGPLAN International Symposium on Dynamic Languages. 57–70.

Digital Library

[36]

Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In The semantic web: 15th international conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, proceedings 15. Springer, 593–607.

[37]

David I Shuman, Sunil K Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst. 2013. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE signal processing magazine 30, 3 (2013), 83–98.

[38]

Zixing Song and Irwin King. 2022. Hierarchical heterogeneous graph attention network for syntax-aware summarization. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 11340–11348.

[39]

Ke Sun, Yifan Zhao, Dan Hao, and Lu Zhang. 2022. Static Type Recommendation for Python. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering. 1–13.

Digital Library

[40]

Ze Tang, Xiaoyu Shen, Chuanyi Li, Jidong Ge, Liguo Huang, Zhelin Zhu, and Bin Luo. 2022. Ast-trans: Code summarization with efficient tree-structured attention. In Proceedings of the 44th International Conference on Software Engineering. 150–162.

Digital Library

[41]

Daniel Tarlow, Subhodeep Moitra, Andrew Rice, Zimin Chen, Pierre-Antoine Manzagol, Charles Sutton, and Edward Aftandilian. 2020. Learning to fix build errors with graph2diff neural networks. In Proceedings of the IEEE/ACM 42nd international conference on software engineering workshops. 19–20.

Digital Library

[42]

Ali TehraniJamsaz, Quazi Ishtiaque Mahmud, Le Chen, Nesreen K Ahmed, and Ali Jannesari. 2024. Perfograph: A numerical aware program graph representation for performance optimization and program analysis. Advances in Neural Information Processing Systems 36 (2024).

[43]

S Vichy N Vishwanathan, Nicol N Schraudolph, Risi Kondor, and Karsten M Borgwardt. 2010. Graph kernels. The Journal of Machine Learning Research 11 (2010), 1201–1242.

Digital Library

[44]

Yanlin Wang and Hui Li. 2021. Code completion by modeling flattened abstract syntax trees as graphs. In Proceedings of the AAAI conference on artificial intelligence, Vol. 35. 14015–14023.

[45]

Yu Wang, Ke Wang, Fengjuan Gao, and Linzhang Wang. 2020. Learning semantic program embeddings with graph interval neural network. Proceedings of the ACM on Programming Languages 4, OOPSLA (2020), 1–27.

Digital Library

[46]

Cody Watson, Michele Tufano, Kevin Moran, Gabriele Bavota, and Denys Poshyvanyk. 2020. On learning meaningful assert statements for unit test cases. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering. 1398–1409.

Digital Library

[47]

Jiayi Wei, Greg Durrett, and Isil Dillig. 2023. Typet5: Seq2seq type inference using static analysis. arXiv preprint arXiv:2303.09564 (2023).

[48]

Jiayi Wei, Maruth Goyal, Greg Durrett, and Isil Dillig. 2020. Lambdanet: Probabilistic type inference using graph neural networks. arXiv preprint arXiv:2005.02161 (2020).

[49]

Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Weinberger. 2019. Simplifying graph convolutional networks. In International conference on machine learning. PMLR, 6861–6871.

[50]

Yanyan Yan, Yang Feng, Hongcheng Fan, and Baowen Xu. 2023. DLInfer: Deep Learning with Static Slicing for Python Type Inference. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 2009–2021.

[51]

Chaoqi Yang, Ruijie Wang, Shuochao Yao, Shengzhong Liu, and Tarek Abdelzaher. 2020. Revisiting over-smoothing in deep GCNs. arXiv preprint arXiv:2003.13663 (2020).

[52]

Jianwei Zeng, Yutong He, Tao Zhang, Zhou Xu, and Qiang Han. 2023. CLG-Trans: Contrastive learning for code summarization via graph attention-based transformer. Science of Computer Programming 226 (2023), 102925.

Digital Library

[53]

Chunyong Zhang, Bin Liu, Yang Xin, and Liangwei Yao. 2023. CPVD: Cross Project Vulnerability Detection Based On Graph Attention Network And Domain Adaptation. IEEE Transactions on Software Engineering (2023).

[54]

Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Xudong Liu, Chunming Hu, and Yang Liu. 2023. Detecting condition-related bugs with control flow graph neural network. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. 1370–1382.

Digital Library

[55]

Kechi Zhang, Wenhan Wang, Huangzhao Zhang, Ge Li, and Zhi Jin. 2022. Learning to represent programs with heterogeneous graphs. In 2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC).

Digital Library

[56]

Kunsong Zhao, Zihao Li, Jianfeng Li, He Ye, Xiapu Luo, and Ting Chen. 2023. DeepInfer: Deep Type Inference from Smart Contract Bytecode. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 745–757.

Digital Library

Index Terms

On the Heterophily of Program Graphs: A Case Study of Graph-based Type Inference

Index terms have been assigned to the content through auto-classification.

Recommendations

Equistable graphs, general partition graphs, triangle graphs, and graph products

In this paper we examine the connections between equistable graphs, general partition graphs and triangle graphs. While every general partition graph is equistable and every equistable graph is a triangle graph, not every triangle graph is equistable, ...
The clique-separator graph for chordal graphs

We present a new representation of a chordal graph called the clique-separator graph, whose nodes are the maximal cliques and minimal vertex separators of the graph. We present structural properties of the clique-separator graph and additional ...
Learning to Augment Graph Structure for both Homophily and Heterophily Graphs
Machine Learning and Knowledge Discovery in Databases: Research Track
Abstract
Recent years have witnessed great successes in performing graph structure learning for Graph Neural Networks (GNNs). However, comparatively little work studies structure augmentation for graphs, where the augmented structures are only used for ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

Internetware '24: Proceedings of the 15th Asia-Pacific Symposium on Internetware

July 2024

518 pages

ISBN:9798400707056

DOI:10.1145/3671016

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 July 2024

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Conference

Internetware 2024

Sponsor:

SIGSOFT

Internetware 2024: 15th Asia-Pacific Symposium on Internetware

July 24 - 26, 2024

Macau, China

Acceptance Rates

Overall Acceptance Rate 55 of 111 submissions, 50%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
48
Total Downloads

Downloads (Last 12 months)48
Downloads (Last 6 weeks)8

Reflects downloads up to 24 Oct 2024

Other Metrics

View Author Metrics

Citations

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Media

Figures

Other

Tables

View Table of Contents