DOI: 10.1109/ICSE43902.2021.00145

PyART: Python API Recommendation in Real-Time

Published: 05 November 2021

Abstract

Real-time API recommendation is challenging for dynamic languages like Python. Many existing API recommendation techniques are highly effective, but they mainly support static languages. A few Python IDEs provide API recommendation based on type inference and on training over a large corpus of Python libraries and third-party libraries. As a result, they may fail to recommend, or may recommend poorly, when type information is missing or the target APIs are project-specific. In this paper, we propose a novel approach, PyART, for recommending APIs for Python programs in real time. It features a lightweight analysis that derives so-called optimistic data-flow, which is neither sound nor complete but simulates the local data-flow information a human can derive. It extracts three kinds of features (data-flow, token similarity, and token co-occurrence) from the context of the program point where a recommendation is solicited. A predictive model is trained on these features using the Random Forest algorithm. Evaluation on 8 popular Python projects demonstrates that PyART can provide effective API recommendations. When historic commits can be leveraged, which is the target scenario of the state-of-the-art tool APIREC, our average top-1 accuracy is over 50% and our average top-10 accuracy is over 70%, outperforming APIREC and Intellicode (i.e., the recommendation component in Visual Studio) by 28.48%-39.05% for top-1 accuracy and 24.41%-30.49% for top-10 accuracy. In other scenarios, such as when historic commits are not available and in cross-project recommendation, PyART also shows better overall performance. The time to make a recommendation is less than a second on average, satisfying the real-time requirement.
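As a rough illustration of two of the feature kinds named in the abstract (token similarity and token co-occurrence) and of the top-k accuracy metric, the following Python sketch may help. It is not PyART's implementation: the function names, the toy corpus, the use of `difflib` for string similarity, and the additive score that stands in for the trained Random Forest model are all assumptions made for this example, and the data-flow feature is omitted entirely.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def token_similarity(candidate, context_tokens):
    """Highest string similarity between a candidate API name and any context token."""
    if not context_tokens:
        return 0.0
    return max(SequenceMatcher(None, candidate, tok).ratio() for tok in context_tokens)


def build_cooccurrence(corpus_lines):
    """Count how often two distinct tokens appear on the same line of a toy corpus."""
    counts = Counter()
    for tokens in corpus_lines:
        # Sort so that each unordered pair maps to exactly one key.
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[(a, b)] += 1
    return counts


def cooccurrence_score(candidate, context_tokens, counts):
    """Sum of co-occurrence counts between the candidate and each context token."""
    total = 0
    for tok in set(context_tokens):
        if tok == candidate:
            continue
        total += counts[tuple(sorted((candidate, tok)))]
    return total


def rank_candidates(candidates, context_tokens, counts):
    """Rank candidates by a hypothetical additive score.

    Assumption: this simple sum stands in for a learned model; the paper
    trains a Random Forest on its features instead.
    """
    def score(candidate):
        return (token_similarity(candidate, context_tokens)
                + cooccurrence_score(candidate, context_tokens, counts))
    return sorted(candidates, key=score, reverse=True)


def top_k_accuracy(ranked_lists, expected, k):
    """Fraction of queries whose expected API appears among the first k results."""
    hits = sum(1 for ranked, exp in zip(ranked_lists, expected) if exp in ranked[:k])
    return hits / len(expected)


# Toy corpus: each inner list holds the tokens of one (hypothetical) code line.
corpus = [["f", "open", "read"], ["f", "open", "close"], ["s", "split", "join"]]
counts = build_cooccurrence(corpus)
ranked = rank_candidates(["read", "split", "close"], ["f", "open"], counts)
```

In this toy setting, candidates that co-occur with the context tokens `f` and `open` rank ahead of the unrelated `split`; a real system would feed such feature values to a trained classifier rather than summing them.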

        Published In

        ICSE '21: Proceedings of the 43rd International Conference on Software Engineering
        May 2021
        1768 pages
        ISBN:9781450390859

        Publisher

        IEEE Press

        Author Tags

        1. API recommendation
        2. Python
        3. context analysis
        4. data flow analysis
        5. real-time recommendation

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        ICSE '21
        Acceptance Rates

        Overall Acceptance Rate 276 of 1,856 submissions, 15%

        Cited By

        • (2024) FastLog: An End-to-End Method to Efficiently Generate and Insert Logging Statements. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 26-37. DOI: 10.1145/3650212.3652107. Online publication date: 11-Sep-2024
        • (2023) PyBartRec: Python API Recommendation with Semantic Information. Proceedings of the 14th Asia-Pacific Symposium on Internetware, pp. 33-43. DOI: 10.1145/3609437.3609463. Online publication date: 4-Aug-2023
        • (2023) Let's Chat to Find the APIs: Connecting Human, LLM and Knowledge Graph through AI Chain. Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, pp. 471-483. DOI: 10.1109/ASE56229.2023.00075. Online publication date: 11-Nov-2023
        • (2022) HatCUP. Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, pp. 619-630. DOI: 10.1145/3524610.3527901. Online publication date: 16-May-2022
        • (2022) Discovering repetitive code changes in python ML systems. Proceedings of the 44th International Conference on Software Engineering, pp. 736-748. DOI: 10.1145/3510003.3510225. Online publication date: 21-May-2022