research-article

PyART: Python API Recommendation in Real-Time

Authors:

Baowen Xu

ICSE '21: Proceedings of the 43rd International Conference on Software Engineering

Pages 1634 - 1645

https://doi.org/10.1109/ICSE43902.2021.00145

Published: 05 November 2021

Abstract

API recommendation in real-time is challenging for dynamic languages like Python. Many existing API recommendation techniques are highly effective, but they mainly support static languages. A few Python IDEs provide API recommendation functionalities based on type inference and training on a large corpus of Python libraries and third-party libraries. As such, they may fail to recommend, or may make poor recommendations, when type information is missing or target APIs are project-specific. In this paper, we propose a novel approach, PyART, for recommending APIs for Python programs in real-time. It features a light-weight analysis to derive so-called optimistic data-flow, which is neither sound nor complete, but simulates the local data-flow information humans can derive. It extracts three kinds of features, namely data-flow, token similarity, and token co-occurrence, in the context of the program point where a recommendation is solicited. A predictive model is trained on these features using the Random Forest algorithm. Evaluation on 8 popular Python projects demonstrates that PyART can provide effective API recommendations. When historic commits can be leveraged, which is the target scenario of the state-of-the-art tool APIREC, our average top-1 accuracy is over 50% and our average top-10 accuracy is over 70%, outperforming APIREC and Intellicode (i.e., the recommendation component in Visual Studio) by 28.48%-39.05% for top-1 accuracy and 24.41%-30.49% for top-10 accuracy. In other scenarios, such as when historic commits are not available or in cross-project recommendation, PyART also shows better overall performance. The time to make a recommendation is less than a second on average, satisfying the real-time requirement.
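To make two of the feature kinds and the reported metric more concrete, the following is a minimal sketch, not PyART's implementation. It assumes a `difflib`-based string ratio as a stand-in for token similarity and a simple same-snippet count for token co-occurrence; the paper's actual measures may differ. The data-flow feature and the Random Forest model are omitted, and all function names are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def token_similarity(a, b):
    """Similarity in [0, 1] between two identifiers (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cooccurrence_counts(corpus):
    """Count how often two distinct tokens appear in the same snippet."""
    counts = Counter()
    for tokens in corpus:
        unique = set(tokens)
        for t in unique:
            for u in unique:
                if t != u:
                    counts[(t, u)] += 1
    return counts


def candidate_features(candidate, context_tokens, counts):
    """Feature vector for one candidate API at a recommendation point."""
    sim = max((token_similarity(candidate, t) for t in context_tokens),
              default=0.0)
    co = sum(counts[(t, candidate)] for t in context_tokens)
    return [sim, float(co)]


def top_k_accuracy(ranked_lists, truths, k):
    """Fraction of queries whose true API is among the first k suggestions."""
    hits = sum(1 for ranked, truth in zip(ranked_lists, truths)
               if truth in ranked[:k])
    return hits / len(truths)
```

In a full pipeline, vectors such as those returned by `candidate_features` would be passed to a trained classifier to rank candidates, and `top_k_accuracy` with `k=1` and `k=10` would correspond to the top-1 and top-10 figures quoted in the abstract.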

Index Terms

PyART: Python API Recommendation in Real-Time
1. Software and its engineering
  1. Software creation and management
  2. Software notations and tools
    1. General programming languages
    2. Software configuration management and version control systems

Index terms have been assigned to the content through auto-classification.

Information & Contributors

Information

Published In

ICSE '21: Proceedings of the 43rd International Conference on Software Engineering

May 2021

1768 pages

ISBN:9781450390859

Publisher

IEEE Press

Badges

Artifacts Evaluated & Functional / v1.1

Qualifiers

Research-article
Research
Refereed limited

Conference

ICSE '21

Sponsor:

SIGSOFT

ICSE '21: 43rd International Conference on Software Engineering

May 22 - 30, 2021

Madrid, Spain

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 4
Total Downloads: 61
Downloads (Last 12 months): 15
Downloads (Last 6 weeks): 4

Reflects downloads up to 05 Nov 2024

Citations

Cited By

Xie X, Cai Z, Chen S, Xuan J, Christakis M, Pradel M (2024). FastLog: An End-to-End Method to Efficiently Generate and Insert Logging Statements. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 26-37. DOI: 10.1145/3650212.3652107. Online publication date: 11-Sep-2024.
https://dl.acm.org/doi/10.1145/3650212.3652107
Li K, Tang X, Li F, Zhou H, Ye C, Zhang W (2023). PyBartRec: Python API Recommendation with Semantic Information. Proceedings of the 14th Asia-Pacific Symposium on Internetware, pp. 33-43. DOI: 10.1145/3609437.3609463. Online publication date: 4-Aug-2023.
https://dl.acm.org/doi/10.1145/3609437.3609463
Huang Q, Wan Z, Xing Z, Wang C, Chen J, Xu X, Lu Q, Bissyandé T, Klein J, Bird C, Sarro F (2023). Let's Chat to Find the APIs: Connecting Human, LLM and Knowledge Graph through AI Chain. Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, pp. 471-483. DOI: 10.1109/ASE56229.2023.00075. Online publication date: 11-Nov-2023.
https://dl.acm.org/doi/10.1109/ASE56229.2023.00075
Zhu H, He X, Xu L, Rastogi A, Tufano R, Bavota G, Arnaoudova V, Haiduc S (2022). HatCUP. Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, pp. 619-630. DOI: 10.1145/3524610.3527901. Online publication date: 16-May-2022.
https://dl.acm.org/doi/10.1145/3524610.3527901
Dilhara M, Ketkar A, Sannidhi N, Dig D, Dwyer M, Damian D, Zeller A (2022). Discovering repetitive code changes in python ML systems. Proceedings of the 44th International Conference on Software Engineering, pp. 736-748. DOI: 10.1145/3510003.3510225. Online publication date: 21-May-2022.
https://dl.acm.org/doi/10.1145/3510003.3510225
