skip to main content
research-article

Frankenstein: fast and lightweight call graph generation for software builds

Published: 16 November 2023 Publication History

Abstract

Call Graphs are a rich data source and form the foundation for advanced static analyses that can, for example, detect security vulnerabilities or dead code. This information is invaluable when it is immediately available, such as in the output of a build system. Call Graph generation is a whole-program analysis: not just the application, but also all its dependencies are processed together. Recent work has shown that even advanced static analyses can use summarization techniques to substantially improve runtime; however, existing analyses focus on soundness, and as such remain very expensive. When executed in the build system, which typically has limited resources, even powerful servers suffer from slow build times, rendering these analyses impractical in today’s fast-paced development. In this paper, we aim to strike a balance between improving static analyses while remaining practical for use cases that require quick results in low-resource environments. We propose a summarization-based implementation of a Class-Hierarchy Analysis algorithm for call graph generation of Java programs. Our approach leverages the fact that dependency sets often do not change between builds: we can generate call graphs for these dependencies, cache their generation for subsequent builds, and using a novel stitching algorithm, Frankenstein, merge all partial results into a complete call graph for the whole program. Our evaluation results show that this lightweight approach can substantially outperform existing frameworks. In terms of speed improvements, Frankenstein surpasses the baselines by up to 38%, requiring an average of just 388 Megabytes of memory. This makes the proposed approach practical for build systems with limited memory resources. Despite these optimizations, our generated call graphs maintain a near-identical set of edges when compared to the baselines, achieving an F1 score of up to 0.98. This summarization-based approach for call graph generation paves the way for using extended static analyses in build processes.

References

[1]
Alexandru CV, Panichella S, Proksch S, and Gall HC Redundancy-free analysis of multi-revision software artifacts Empir Softw Eng 2019 24 1 332-380
[2]
Ali K (2014) The Separate Compilation Assumption. Ph.D. thesis. University of Waterloo, Ontario, Canada. https://hdl.handle.net/10012/8835
[3]
Ali K, Lhoták O (2012) Application-Only Call Graph Construction. In: Noble J (ed) In the proceedings of the 26th European Conference on Object-Oriented Programming, ECOOP, Beijing, China. Lecture Notes in Computer Science, vol 7313. Springer, pp 688–712.
[4]
Ali K, Lhoták O (2013) Averroes: Whole-Program Analysis without the Whole Program. In: Castagna G (ed) In the proceedings of the 27th European Conference on Object-Oriented Programming, ECOOP, Montpellier, France. Lecture Notes in Computer Science, vol 7920. Springer, pp 378–400.
[5]
Arzt S, Bodden E (2016) StubDroid: automatic inference of precise data-flow summaries for the android framework. In: Dillon LK, Visser W, Williams LA (eds) In the proceedings of the 38th International Conference on Software Engineering, ICSE, Austin, TX, USA. ACM, pp 725–735.
[6]
Bacon DF, Sweeney PF (1996) Fast Static Analysis of C++ Virtual Function Calls. In: Anderson L, Coplien J (eds) In the proceedings of the 1996 ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages & Applications, OOPSLA, San Jose, California, USA. ACM, pp 324–341.
[7]
Ball T, Rajamani SK (2001) Bebop: a path-sensitive interprocedural dataflow engine. In: Field J, Snelting G (eds) In the proceedings of the ACM SIGPLAN-SIGSOFT Workshop on Program Analysis For Software Tools and Engineering, PASTE, Snowbird, Utah, USA. ACM, pp 97–103.
[8]
Boldi P and Gousios G Fine-Grained Network Analysis for Modern Software Ecosystems ACM Trans Internet Technol 2021 21 1 1:1-1:14
[9]
Bracha G, Odersky M, Stoutamire D, Wadler P (1998) Making the Future Safe for the Past: Adding Genericity to the Java Programming Language. In: Freeman-Benson BN, Chambers C (eds) In the proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages & Applications, OOPSLA, Vancouver, British Columbia, Canada. ACM, pp 183–200.
[10]
Chord (2023) A program analysis platform for java. https://www.seas.upenn.edu/~mhnaik/chord/user_guide/index.html. Accessed 15 Jan 2022
[11]
Dean J, Grove D, Chambers C (1995) Optimization of Object-Oriented Programs Using Static Class Hierarchy Analysis. In: Olthoff WG (ed) In the poroceedings of the 9th European Conference on Object-Oriented Programming, ECOOP, Århus, Denmark, Lecture Notes in Computer Science, vol 952. Springer, pp 77–101.
[12]
Dependabot. (2023) https://github.com/dependabot. Accessed 15 Jan 2022
[13]
Dillig I, Dillig T, Aiken A, Sagiv M (2011) Precise and compact modular procedure summaries for heap manipulating programs. In: Hall MW, Padua DA (eds) Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, San Jose, CA, USA. ACM, pp 567–577.
[14]
Dyer R, Nguyen HA, Rajan H, and Nguyen TN Boa: Ultra-Large-Scale Software Repository and Source-Code Mining ACM Trans Softw Eng Methodol 2015 25 1 7:1-7:34
[15]
Eichberg M, Kübler F, Helm D, Reif M, Salvaneschi G, Mezini M (2018) Lattice based modularization of static analyses. In: Dolby J, Halfond WGJ, Mishra A (eds) In the companion proceedings for the ISSTA/ECOOP Workshops, Amsterdam, Netherlands. ACM, pp 113–118.
[16]
Goldberg A, Robson D (1983) Smalltalk-80: The Language and Its Implementation. Addison-Wesley
[17]
Gopan D, Reps TW (2007) Low-Level Library Analysis and Summarization. In: Damm W, Hermanns H (eds) In the proceedings of the 19th International Conference on Computer Aided Verification, CAV, Germany, Lecture Notes in Computer Science, vol 4590. Springer, pp 68–81.
[19]
Hejderup J, van Deursen A, Gousios G (2018) Software ecosystem call graph for dependency management. In: Zisman A, Apel S (eds) In the proceedings of the 40th International Conference on Software Engineering: New Ideas and Emerging Results, ICSE (NIER), Gothenburg, Sweden. ACM, pp 101–104.
[20]
Helm D, Kübler F, Reif M, Eichberg M, Mezini M (2020) Modular collaborative program analysis in OPAL. In: Devanbu P, Cohen MB, Zimmermann T (eds) In the proceedings of the 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE, Virtual Event, USA. ACM, pp 184–196.
[21]
T. j. watson libraries for analysis. (2023) http://wala.sf.net/. Accessed 15 Jan 2022
[23]
Keshani M (2021) Scalable Call Graph Constructor for Maven. In: In the companion proceedings of the 43rd IEEE/ACM International Conference on Software Engineering, ICSE Companion, Madrid, Spain. IEEE, pp 99–101.
[24]
Kula RG, Germán DM, Ouni A, Ishio T, and Inoue K Do developers update their library dependencies? - An empirical study on the impact of security advisories on library migration Empir Softw Eng 2018 23 1 384-417
[25]
Kulkarni S, Mangal R, Zhang X, Naik M (2016) Accelerating program analyses by cross-program training. In: Visser E, Smaragdakis Y (eds) In the proceedings of the ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA, part of SPLASH, Amsterdam, The Netherland. ACM, pp 359–377.
[26]
Lam P, Bodden E, Lhoták O, Hendren L (2011) The Soot framework for Java program analysis: a retrospective. In: Cetus Users and Compiler Infrastructure Workshop CETUS, vol 15
[27]
Landi W Undecidability of Static Analysis. LOPLAS 1992 1 4 323-337
[28]
Livshits B, Sridharan M, Smaragdakis Y, Lhoták O, Amaral JN, Chang BE, Guyer SZ, Khedker UP, Møller A, and Vardoulakis D In defense of soundiness: a manifesto Commun ACM 2015 58 2 44-46
[29]
[30]
Nielsen BB, Torp MT, Møller A (2021) Modular call graph construction for security scanning of Node.js applications. In: Cadar C, Zhang X (eds) In the Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA, Virtual Event, Denmark. ACM, pp 29–41.
[31]
Ramalingam G The Undecidability of Aliasing ACM Trans Program Lang Syst 1994 16 5 1467-1471
[32]
Reif M, Eichberg M, Hermann B, Lerch J, Mezini M (2016) Call graph construction for Java libraries. In: Zimmermann T, Cleland-Huang J, Su Z (eds) In the proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE, Seattle, WA, USA. ACM, pp 474–486.
[33]
Reif M, Kübler F, Eichberg M, Helm D, Mezini M (2019) Judge: identifying, understanding, and evaluating sources of unsoundness in call graphs. In: Zhang D, Møller A (eds) In the proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA, Beijing, China. ACM, pp 251–261.
[35]
Reps TW, Horwitz S, Sagiv S (1995) Precise Interprocedural Dataflow Analysis via Graph Reachability. In: Cytron RK, Lee P (eds) Conference Record of POPL: 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Francisco, California, USA. ACM Press, pp 49–61.
[36]
Reps T Undecidability of context-sensitive data-dependence analysis ACM Trans Program Lang Syst, TOPLAS 2000 22 1 162-186
[37]
Rountev A, Kagan S, Marlowe TJ (2006) Interprocedural Dataflow Analysis in the Presence of Large Libraries. In: Mycroft A, Zeller A (eds) In the proceedings of the 15th International Conference on Compiler Construction, CC, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS, Vienna, Austria, Lecture Notes in Computer Science, vol 3923. Springer, pp 2–16.
[38]
Rountev A, Sharp M, Xu G (2008) IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries. In: Hendren LJ (ed) In the proceedings of the 17th International Conference on Compiler Construction, CC, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS, Budapest, Hungary, Lecture Notes in Computer Science, vol 4959. Springer, pp 53–68.
[39]
Schubert PD, Hermann B, Bodden E (2021) Lossless, Persisted Summarization of Static Callgraph, Points-To and Data-Flow Analysis. In: Møller A, Sridharan M (eds) In the proceedings of the 35th European Conference on Object-Oriented Programming, ECOOP, Aarhus, Denmark (Virtual Conference), LIPIcs, vol 194. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, pp 2:1–2:31.
[40]
Sharir M, Pnueli A et al (1978) Two approaches to interprocedural data flow analysis. In: New York University. Courant Institute of Mathematical Sciences
[41]
Shivers O (1988) Control-Flow Analysis in Scheme. In: Wexelblat RL (ed) In the proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI, Atlanta, Georgia, USA. ACM, pp 164–174.
[42]
Shrinkwrap resolvers. (2023) https://github.com/shrinkwrap/resolver. Accessed 15 Jan 2022
[43]
Souter AL, Pollock LL (2001) Incremental Call Graph Reanalysis for Object-Oriented Software Maintenance. In: In the proceedings of the International Conference on Software Maintenance, ICSM, Florence, Italy. IEEE Computer Society, pp 682–691.
[44]
Srivastava A Unreachable Procedures in Object-Oriented Programming. LOPLAS 1992 1 4 355-364
[45]
Sui L, Dietrich J, Emery M, Rasheed S, Tahir A (2018) On the Soundness of Call Graph Construction in the Presence of Dynamic Language Features - A Benchmark and Tool Evaluation. In: Ryu S (ed) In the proceedings of the 16th Asian Symposium on Programming Languages and Systems, APLAS, Wellington, New Zealand, Lecture Notes in Computer Science, vol 11275. Springer, pp 69–88.
[46]
Sui L, Dietrich J, Tahir A, Fourtounis G (2020) On the recall of static call graph construction in practice. In: Rothermel G, Bae D (eds) In the proceedings of the 42nd International Conference on Software Engineering, ICSE, Seoul, South Korea. ACM, pp 1049–1060.
[47]
Sundaresan V, Hendren LJ, Razafimahefa C, Vallée-Rai R, Lam P, Gagnon E, Godin C (2000) Practical virtual method call resolution for Java. In: Rosson MB, Lea D (eds) In the proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages & Applications, OOPSLA, Minneapolis, Minnesota, USA. ACM, pp 264–280.
[48]
The doop project. (2023) http://doop.program-analysis.org/. Accessed 15 Jan 2022
[49]
Tip F, Palsberg J (2000) Scalable propagation-based call graph construction algorithms. In: Rosson MB, Lea D (eds) In the proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages & Applications OOPSLA, Minneapolis, Minnesota, USA. ACM, pp 281–293.
[50]
Toman J, Grossman D (2017) Taming the Static Analysis Beast. In: Lerner BS, Bodík R, Krishnamurthi S (eds) 2nd Summit on Advances in Programming Languages, SNAPL, Asilomar, CA, USA, LIPIcs, vol 71. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, pp 18:1–18:14.
[51]
Tripp O, Guarnieri S, Pistoia M, Aravkin AY (2014) ALETHEIA: Improving the Usability of Static Security Analysis. In: Ahn G, Yung M, Li N (eds) In the proceedings of the ACM SIGSAC Conference on Computer and Communications Security, Scottsdale, AZ, USA. ACM, pp 762–774.
[52]
Utture A, Liu S, Kalhauge CG, Palsberg J (2022) Striking a Balance: Pruning False-Positives from Static Call Graphs. In: In the proceedings of the 44th IEEE/ACM International Conference on Software Engineering, ICSE, Pittsburgh, PA, USA. ACM, pp 2043–2055.
[53]
Vasilescu B, Yu Y, Wang H, Devanbu PT, Filkov V (2015) Quality and productivity outcomes relating to continuous integration in GitHub. In: Nitto ED, Harman M, Heymans P (eds) In the proceedings of the 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE, Bergamo, Italy. ACM, pp 805–816.
[54]
Whaley J, Rinard MC (1999) Compositional Pointer and Escape Analysis for Java Programs. In: Hailpern B, Northrop LM, Berman AM (eds) In the proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems. Languages & Applications, OOPSLA, Denver, Colorado, USA. ACM, pp 187–206.

Cited By

View all
  • (2024)AROMA: Automatic Reproduction of Maven ArtifactsProceedings of the ACM on Software Engineering10.1145/36437641:FSE(836-858)Online publication date: 12-Jul-2024

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Empirical Software Engineering
Empirical Software Engineering  Volume 29, Issue 1
Jan 2024
1448 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 16 November 2023
Accepted: 30 August 2023

Author Tags

  1. Software engineering in practice
  2. Call graph generation
  3. Build systems
  4. Software ecosystems

Qualifiers

  • Research-article

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 04 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2024)AROMA: Automatic Reproduction of Maven ArtifactsProceedings of the ACM on Software Engineering10.1145/36437641:FSE(836-858)Online publication date: 12-Jul-2024

View Options

View options

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media