DOI: 10.1007/978-3-030-88806-0_16
Article

A Multilanguage Static Analysis of Python Programs with Native C Extensions

Published: 17 October 2021

Abstract

Modern programs are increasingly multilanguage, to benefit from each programming language’s advantages and to reuse libraries. For example, developers may want to combine high-level Python code with low-level, performance-oriented C code. In fact, one in five of the 200 most downloaded Python libraries available on GitHub contains C code. Static analyzers tend to focus on a single language and may use stubs to model the behavior of foreign function calls. However, stubs are costly to implement and undermine the soundness of analyzers. In this work, we design a static analyzer by abstract interpretation that can handle Python programs calling C extensions. It analyzes both the Python and the C source code directly and fully automatically. It reports runtime errors that may happen in Python, in C, and at the interface. We implemented our analysis in a modular fashion: it reuses off-the-shelf C and Python analyses written in the same analyzer. This approach allows sharing between abstract domains of different languages. Within a few minutes, our analyzer can tackle tests of real-world libraries that are a few thousand lines of C and Python long.
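
To make the idea of errors "in Python, in C, and at the interface" more concrete, the toy sketch below abstractly executes a call from Python into a C function using an interval domain. It is only an illustration under stated assumptions, not the paper's analyzer and not any MOPSA API: the `Interval` class, the `analyze_call` function, the hypothetical extension function `c_increment(n)` (assumed to return `n + 1`), and the 32-bit width of a C `int` are all invented for this example.

```python
from dataclasses import dataclass

# Assumption for this sketch: a C int is 32 bits wide.
C_INT_MIN, C_INT_MAX = -2**31, 2**31 - 1


@dataclass(frozen=True)
class Interval:
    """Abstract value: every integer between lo and hi, inclusive."""
    lo: int
    hi: int

    def add(self, other):
        # Sound abstraction of integer addition on intervals.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def fits_c_int(self):
        return C_INT_MIN <= self.lo and self.hi <= C_INT_MAX


def analyze_call(py_arg):
    """Abstractly run a hypothetical extension function c_increment(n).

    Python side: n is an unbounded int described by py_arg.
    Boundary:    n is converted to a C int, which may fail.
    C side:      the function computes n + 1, which may overflow.
    Returns the abstract result (or None if the C code is unreachable)
    together with the list of alarms raised.
    """
    alarms = []
    # Boundary check: an unbounded Python int may not fit in a C int.
    if not py_arg.fits_c_int():
        alarms.append("boundary: Python int may not fit in C int")
    # Keep only the values that survive the conversion.
    lo, hi = max(py_arg.lo, C_INT_MIN), min(py_arg.hi, C_INT_MAX)
    if lo > hi:
        return None, alarms  # no execution reaches the C code
    # C side: n + 1 in signed arithmetic.
    result = Interval(lo, hi).add(Interval(1, 1))
    if not result.fits_c_int():
        alarms.append("C: signed overflow possible in n + 1")
        # Continue with the non-overflowing executions only.
        result = Interval(result.lo, C_INT_MAX)
    return result, alarms
```

Calling `analyze_call(Interval(0, 10))` raises no alarm, whereas an argument interval that extends beyond the C `int` range triggers the boundary alarm before the C-side check is even reached. A single-language analysis with a stub for `c_increment` would see only one side of this call.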

Published In

Static Analysis: 28th International Symposium, SAS 2021, Chicago, IL, USA, October 17–19, 2021, Proceedings
Oct 2021
493 pages
ISBN: 978-3-030-88805-3
DOI: 10.1007/978-3-030-88806-0

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

1. Formal methods
2. Static analysis
3. Abstract interpretation
4. Dynamic programming language
5. Multilanguage analysis

Cited By

• (2024) AXA: Cross-Language Analysis through Integration of Single-Language Analyses. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pp. 1195-1205. DOI: 10.1145/3691620.3696193. Online publication date: 27-Oct-2024
• (2024) C-2PO: A Weakly Relational Pointer Domain: “These Are Not the Memory Cells You Are Looking For”. Proceedings of the 10th ACM SIGPLAN International Workshop on Numerical and Symbolic Abstract Domains, pp. 2-9. DOI: 10.1145/3689609.3689994. Online publication date: 17-Oct-2024
• (2024) On Polyglot Program Testing. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, pp. 507-511. DOI: 10.1145/3663529.3663787. Online publication date: 10-Jul-2024
• (2024) Dr Wenowdis: Specializing Dynamic Language C Extensions using Type Information. Proceedings of the 13th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis, pp. 1-8. DOI: 10.1145/3652588.3663316. Online publication date: 20-Jun-2024
• (2024) Large Language Models Can Connect the Dots: Exploring Model Optimization Bugs with Domain Knowledge-Aware Prompts. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 1579-1591. DOI: 10.1145/3650212.3680383. Online publication date: 11-Sep-2024
• (2024) Cross-Language Taint Analysis: Generating Caller-Sensitive Native Code Specification for Java. IEEE Transactions on Software Engineering 50:6, pp. 1518-1533. DOI: 10.1109/TSE.2024.3392254. Online publication date: 27-May-2024
• (2023) Speeding up Static Analysis with the Split Operator. Proceedings of the 12th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis, pp. 14-19. DOI: 10.1145/3589250.3596141. Online publication date: 6-Jun-2023
• (2023) Unconstrained Variable Oracles for Faster Numeric Static Analyses. Static Analysis, pp. 65-83. DOI: 10.1007/978-3-031-44245-2_5. Online publication date: 22-Oct-2023
• (2022) On the vulnerability proneness of multilingual code. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 847-859. DOI: 10.1145/3540250.3549173. Online publication date: 7-Nov-2022
