Research article, DOI: 10.1109/ASE51524.2021.9678946

Adversarial attacks to API recommender systems: time to wake up and smell the coffee?

Published: 24 June 2022

Abstract

    Recommender systems in software engineering provide developers with a wide range of valuable items to help them complete their tasks. Among these, API recommender systems have gained momentum in recent years as they have become more successful at suggesting API calls or code snippets. While these systems have proven effective in terms of prediction accuracy, less attention has been paid to their resilience against adversarial attempts. In fact, by manipulating the recommenders' learning material, e.g., data from large open-source software (OSS) repositories, hostile users may succeed in injecting malicious data, putting at risk the software clients that adopt API recommender systems. In this paper, we present an empirical investigation of adversarial machine learning techniques and their possible influence on recommender systems. The evaluation, performed on three state-of-the-art API recommender systems, reveals a worrying outcome: none of them is immune to malicious data. This result calls for effective countermeasures to protect recommender systems against hostile attacks disguised in training data.

    Published In

    ASE '21: Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering
    November 2021
    1446 pages
    ISBN: 9781665403375

    In-Cooperation

    • IEEE CS

    Publisher

    IEEE Press

    Author Tags

    1. API mining
    2. adversarial attacks
    3. adversarial machine learning
    4. recommender systems

    Acceptance Rates

    Overall Acceptance Rate 82 of 337 submissions, 24%

    Cited By

    • (2024) Unveiling Memorization in Code Models. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pp. 1-13. DOI: 10.1145/3597503.3639074. Online publication date: 20-May-2024
    • (2023) An Extensive Study on Adversarial Attack against Pre-trained Models of Code. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 489-501. DOI: 10.1145/3611643.3616356. Online publication date: 30-Nov-2023
    • (2023) Fitting missing API puzzles with machine translation techniques. Expert Systems with Applications: An International Journal, vol. 216:C. DOI: 10.1016/j.eswa.2022.119477. Online publication date: 15-Apr-2023
    • (2022) Automating the design of recommender systems. Proceedings of the 25th International Conference on Model Driven Engineering Languages and Systems: Companion Proceedings, pp. 233-236. DOI: 10.1145/3550356.3552376. Online publication date: 23-Oct-2022
    • (2022) You see what I want you to see: poisoning vulnerabilities in neural code search. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 1233-1245. DOI: 10.1145/3540250.3549153. Online publication date: 7-Nov-2022
    • (2022) Natural attack for pre-trained models of code. Proceedings of the 44th International Conference on Software Engineering, pp. 1482-1493. DOI: 10.1145/3510003.3510146. Online publication date: 21-May-2022
