
A Systematic Review of Automated Query Reformulations in Source Code Search

Authors:

Mohammad Masudur Rahman,

Chanchal K. Roy

ACM Transactions on Software Engineering and Methodology, Volume 32, Issue 6

Article No.: 159, Pages 1 - 79

https://doi.org/10.1145/3607179

Published: 28 September 2023

Abstract

Fixing software bugs and adding new features are two of the major maintenance tasks. Software bugs and features are reported as change requests. Developers consult these requests and often choose a few keywords from them as an ad hoc query. Then they execute the query with a search engine to find the exact locations within software code that need to be changed. Unfortunately, even experienced developers often fail to choose appropriate queries, which leads to costly trial and error during a code search. Over the years, many studies have attempted to reformulate the ad hoc queries from developers to support them. In this systematic literature review, we carefully select 70 primary studies on query reformulations from 2,970 candidate studies, perform an in-depth qualitative analysis (e.g., Grounded Theory), and then answer seven research questions with major findings. First, to date, eight major methodologies (e.g., term weighting, term co-occurrence analysis, thesaurus lookup) have been adopted to reformulate queries. Second, the existing studies suffer from several major limitations (e.g., lack of generalizability, the vocabulary mismatch problem, subjective bias) that might prevent their wide adoption. Finally, we discuss the best practices and future opportunities to advance the state of research in search query reformulations.
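To make one of the methodologies named above concrete, the following is a minimal sketch of term weighting used to pick query keywords from a change request. It is an illustration only, not the method of any surveyed study. The function names (`tokenize`, `reformulate`), the tiny stop-word set, and the smoothed TF-IDF formula are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "in", "to", "of", "and", "when", "on", "it", "with"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words and very short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]


def reformulate(change_request, corpus, k=3):
    """Return the k highest-weighted terms of a change request as a query.

    Weight = term frequency in the request times a smoothed inverse
    document frequency over the corpus (an assumed weighting scheme).
    """
    docs = [set(tokenize(d)) for d in corpus]
    n = len(docs)
    tf = Counter(tokenize(change_request))
    scores = {}
    for term, freq in tf.items():
        df = sum(1 for d in docs if term in d)        # documents containing the term
        idf = math.log((n + 1) / (df + 1)) + 1.0      # smoothed IDF, never zero
        scores[term] = freq * idf
    # Sort by descending weight; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


if __name__ == "__main__":
    corpus = [
        "parser reads the file and reports error",
        "editor opens the file and shows error",
        "network socket closes on timeout error",
    ]
    request = "Parser throws an error when the parser reads an empty file"
    print(reformulate(request, corpus))
```

In this sketch, a term that appears in every document (such as "error") receives the lowest weight, while terms absent from the corpus receive the highest inverse document frequency even though they cannot match any document. That second effect is one simple way to see the vocabulary mismatch problem that the abstract lists among the limitations.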

References

[1]

Google. n.d. Stop-Words. Retrieved July 10, 2023 from https://code.google.com/archive/p/stop-words.

[2]

Oracle. 2015. Java Language Keywords. Retrieved July 10, 2023 from https://docs.oracle.com/javase/tutorial/java/nutsandbolts/_keywords.html.

[3]

TechRepublic. 2017. Report: Software Failure Caused $1.7 Trillion in Financial Losses in 2017. Retrieved July 10, 2023 from https://tek.io/2FBNl2i.

[4]

Reuters. 2018. Boeing Eyes Lion Air Crash Software Upgrade in 6 to 8 Weeks. Retrieved July 10, 2023 from https://tinyurl.com/c3f2rsw3.

[5]

Medium. 2019. The 737Max and Why Software Engineers Might Want to Pay Attention. Retrieved July 10, 2023 from https://bit.ly/2CmeTqB.

[6]

Apache Lucene. 2019. Apache Lucene Core. Retrieved July 10, 2023 from https://lucene.apache.org/core.

[7]

National Post. 2019. Boeing 737 Jets Grounded Globally as Officials Investigate Technical Issues Behind Fatal Crash. Retrieved July 10, 2023 from https://goo.gl/ieBgYN.

[8]

Tabnine. 2019. Codota Code Search. Retrieved July 10, 2023 from https://www.codota.com/code.

[9]

GitHub. 2019. GitHub Code Search. Retrieved July 10, 2023 from https://github.com/search.

[10]

National Post. 2019. Here’s the Terrifying Reason Boeing’s 737 MAX 8 Is Grounded Across the Globe. Retrieved July 10, 2023 from https://goo.gl/GwXv6H.

[11]

GitHub. 2023. Replication Package: A Systematic Review of Automated Query Reformulations in Source Code Search. Retrieved July 10, 2023 from https://bit.ly/3eccmlZ.

[12]

G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, and E. Merlo. 2002. Recovering traceability links between code and documentation. TSE 28, 10 (2002), 970–983.

Digital Library

[13]

J. Anvik, L. Hiew, and G. C. Murphy. 2005. Coping with an open bug repository. In Proceedings of OOPSLA/Eclipse. 35–39.

Digital Library

[14]

M. Asaduzzaman, C. K. Roy, K. A. Schneider, and D. Hou. 2016. A simple, efficient, context-sensitive approach for code completion. JSEP 28, 7 (2016), 512–541.

[15]

S. Bajracharya and C. Lopes. 2009. Mining search topics from a code search engine usage log. In Proceedings of MSR. 111–120.

Digital Library

[16]

S. Bajracharya, T. Ngo, E. Linstead, Y. Dou, P. Rigor, P. Baldi, and C. Lopes. 2006. Sourcerer: A search engine for open source code supporting structure-based search. In Proceedings of OOPSLA-C. 681–682.

Digital Library

[17]

S. K. Bajracharya and C. V. Lopes. 2012. Analyzing and mining a code search engine usage log. EMSE 17, 4–5 (2012), 424–466.

Digital Library

[18]

V. Balachandran. 2015. Query by example in large-scale code repositories. In Proceedings of SANER. 467–476.

Digital Library

[19]

A. Banerjee and R. N Dave. 2004. Validating clusters using the Hopkins statistic. In Proceedings of FUZZY, Vol. 1. 149–153.

[20]

B. Bassett and N. A. Kraft. 2013. Structural information based term weighting in text retrieval for feature location. In Proceedings of ICPC. 133–141.

[21]

R. Blanco and C. Lioma. 2012. Graph-based term weighting for information retrieval. Inf. Retr. 15, 1 (2012), 54–92.

Digital Library

[22]

D. M. Blei, A. Y. Ng, and M. I. Jordan. 2003. Latent Dirichlet allocation. J. Mach. Learn. Res. 3 (2003), 993–1022.

[23]

P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov. 2016. Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606 (2016).

[24]

A. Bosu. 2014. Characteristics of the vulnerable code changes identified through peer code review. In Proceedings of ICSE-C (ICSE Companion’14). 736–738.

Digital Library

[25]

A. Bosu, J. C. Carver, M. Hafiz, P. Hilley, and D. Janni. 2014. Identifying the characteristics of vulnerable code changes: An empirical study. In Proceedings of FSE. 257–268.

Digital Library

[26]

J. Brandt, P. J. Guo, J. Lewenstein, M. Dontcheva, and S. R. Klemmer. 2009. Two studies of opportunistic programming: Interleaving web foraging, learning, and writing code. In Proceedings of SIGCHI. 1589–1598.

Digital Library

[27]

S. Brin and L. Page. 1998. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 30, 1–7 (1998), 107–117.

Digital Library

[28]

D. Cai, C. J. van Rijsbergen, and J. M. Jose. 2001. Automatic query expansion based on divergence. In Proceedings of CIKM. 419–426.

Digital Library

[29]

K. Cao, C. Chen, S. Baltes, C. Treude, and X. Chen. 2021. Automated query reformulation for efficient search based on query logs from Stack Overflow. In Proceedings of ICSE. 13.

Digital Library

[30]

D. Carmel and E Yom-Tov. 2010. Estimating the Query Difficulty for Information Retrieval. Morgan & Claypool.

[31]

D. Carmel, E. Yom-Tov, A. Darlow, and D. Pelleg. 2006. What makes a query difficult? In Proceedings of SIGIR. 390–397.

Digital Library

[32]

C. Carpineto, R. de Mori, G. Romano, and B. Bigi. 2001. An information-theoretic approach to automatic query expansion. ACM Trans. Inf. Syst. 19, 1 (2001), 1–27.

Digital Library

[33]

C. Carpineto and G. Romano. 2012. A survey of automatic query expansion in information retrieval. ACM Comput. Surv. 44, 1 (2012), Article 1, 50 pages.

Digital Library

[34]

W. Chan, H. Cheng, and D. Lo. 2012. Searching connected API subgraph via text phrases. In Proceedings of FSE. Article 10, 11 pages.

Digital Library

[35]

O. Chaparro, J. M. Florez, and A. Marcus. 2017. Using observed behavior to reformulate queries during text retrieval-based bug localization. In Proceedings of ICSME. 376–387.

[36]

O. Chaparro, J. M. Florez, U. Singh, and A. Marcus. 2019. Reformulating queries for duplicate bug report detection. In Proceedings of SANER. 12.

[37]

O. Chaparro, J. Lu, F. Zampetti, L. Moreno, M Di Penta, A. Marcus, G. Bavota, and V. Ng. 2017. Detecting missing information in bug descriptions. In Proceedings of ESEC/FSE. 396–407.

Digital Library

[38]

O. Chaparro and A. Marcus. 2016. On the reduction of verbose queries in text retrieval based software maintenance. In Proceedings of ICSE-C. 716–718.

Digital Library

[39]

S. Chatterjee, S. Juvekar, and K. Sen. 2009. SNIFF: A search engine for Java using free-form queries. In Proceedings of FASE. 385–400.

Digital Library

[40]

C. Chen, Z. Xing, and Y. Liu. 2019. What’s Spain’s Paris? Mining analogical libraries from Q&A discussions. EMSE 24, 3 (2019), 1155–1194.

Digital Library

[41]

J. Cordeiro, B. Antunes, and P. Gomes. 2012. Context-based recommendation to support problem solving in software development. In Proceedings of RSSE. 85–89.

[42]

R. F. G. Da Silva, C. K. Roy, M. M. Rahman, K. Schneider, K. Paixo, and M. Maia. 2019. Recommending comprehensive solutions for programming tasks by mining crowd knowledge. In Proceedings of ICPC. 358–368.

Digital Library

[43]

B. Dagenais and M. P. Robillard. 2012. Recovering traceability links between an API and its learning resources. In Proceedings of ICSE. 47–57.

[44]

T. Dietrich, J. Cleland-Huang, and Y. Shin. 2013. Learning effective query transformations for enhanced requirements trace retrieval. In Proceedings of ASE. 586–591.

Digital Library

[45]

B. Dit, M. Revelle, M. Gethers, and D. Poshyvanyk. 2013. Feature location in source code: A taxonomy and survey. JSEP 25, 1 (2013), 53–95.

[46]

N. Dourdas, X. Zhu, N. Maiden, S. Jones, and K. Zachos. 2006. Discovering remote software services that satisfy requirements: Patterns for query reformulation. In Advanced Information Systems Engineering. Lecture Notes in Computer Science, Vol. 4001. Springer, 239–254.

[47]

Brian P. Eddy, Nicholas A. Kraft, and Jeff Gray. 2018. Impact of structural weighting on a latent Dirichlet allocation based feature location technique. JSEP 30, 1 (2018), e1892.

[48]

F. Ensan, E. Bagheri, and M. Kahani. 2007. The application of users’ collective experience for crafting suitable search engine query recommendations. In Proceedings of CNSR. 148–156.

Digital Library

[49]

E. Enslen, E. Hill, L. Pollock, and K. Vijay-Shanker. 2009. Mining source code to automatically split identifiers for software analysis. In Proceedings of MSR. 71–80.

Digital Library

[50]

L. Favre. 2008. Modernizing software & system engineering processes. In Proceedings of ICSENG. 442–447.

Digital Library

[51]

R. Feldt and A. Magazinius. 2010. Validity threats in empirical software engineering Research—An initial survey. In Proceedings of SEKE. 374–379.

[52]

R. Fisher. 1955. Statistical methods and scientific induction. J. R. Stat. Soc. Series B Methodol. 17, 1 (1955), 69–78.

[53]

G. W. Furnas, T. K. Landauer, L. M. Gomez, and S. T. Dumais. 1987. The vocabulary problem in human-system communication. Commun. ACM 30, 11 (1987), 964–971.

Digital Library

[54]

G. Gay, S. Haiduc, A. Marcus, and T. Menzies. 2009. On the use of relevance feedback in IR-based concept location. In Proceedings of ICSM. 351–360.

[55]

X. Ge, D. C. Shepherd, K. Damevski, and E. Murphy-Hill. 2017. Design and evaluation of a multi-recommendation system for local code search. J. Vis. Lang. Comput. 39 (2017), 1–9.

Digital Library

[56]

M. Ghafari and H. Moradi. 2017. A framework for classifying and comparing source code recommendation systems. In Proceedings of SANER. 555–556.

[57]

M. Gibiec, A. Czauderna, and J. Cleland-Huang. 2010. Towards mining replacement queries for hard-to-retrieve traces. In Proceedings of ASE. 245–254.

Digital Library

[58]

B. G. Glaser and A. L. Strauss. 1967. The Discovery of Grounded Theory: Strategies for Qualitative Research. Aldine Publishing, Chicago, IL.

[59]

R. L. Glass. 2001. Frequently forgotten fundamental facts about software engineering. IEEE Softw. 18, 3 (2001), 112–111.

Digital Library

[60]

T. Gvero and V. Kuncak. 2015. Interactive synthesis using free-form queries. In Proceedings of ICSE. 689–692.

[61]

S. Haiduc. 2011. Automatically detecting the quality of the query and its implications in IR-based concept location. In Proceedings of ASE. 637–640.

Digital Library

[62]

Sonia Haiduc. 2013. Supporting Text Retrieval Query Formulation in Software Engineering. Ph.D. dissertation. Wayne State University.

[63]

S. Haiduc, G. Bavota, A. Marcus, R. Oliveto, A De Lucia, and T. Menzies. 2013. Automatic query reformulations for text retrieval in software engineering. In Proceedings of ICSE. 842–851.

[64]

S. Haiduc, G. Bavota, R. Oliveto, A De Lucia, and A. Marcus. 2012. Automatic query performance assessment during the retrieval of software artifacts. In Proceedings of ASE. 90–99.

Digital Library

[65]

S. Haiduc and A. Marcus. 2011. On the effect of the query in IR-based concept location. In Proceedings of ICPC. 234–237.

Digital Library

[66]

S. Hanneman. 2008. Design, analysis, and interpretation of method-comparison studies. AACN Adv. Crit. Care 19 (2008), 223–234.

[67]

D. Harman. 1992. Relevance feedback revisited. In Proceedings of SIGIR. 1–10.

Digital Library

[68]

J. H. Hayes, A. Dekhtyar, and S. K. Sundaram. 2006. Advancing candidate link generation for requirements tracing: The study of methods. TSE 32, 1 (2006), 4–19.

Digital Library

[69]

V. J. Hellendoorn and P. Devanbu. 2017. Are deep neural networks the best choice for modeling source code? In Proceedings of ESEC/FSE. 763–773.

Digital Library

[70]

Emily Hill. 2010. Integrating Natural Language and Program Structure Information to Improve Software Search and Exploration. Ph.D. dissertation. University of Delaware.

Digital Library

[71]

E. Hill, L. Pollock, and K. Vijay-Shanker. 2009. Automatically capturing source code context of NL-queries for software maintenance and reuse. In Proceedings of ICSE. 232–242.

Digital Library

[72]

E. Hill, L. Pollock, and K. Vijay-Shanker. 2011. Improving source code search with natural language phrasal representations of method signatures. In Proceedings of ASE. 524–527.

Digital Library

[73]

E. Hill, S. Rao, and A. Kak. 2012. On the use of stemming for concern location and bug localization in Java. In Proceedings of SCAM. 184–193.

Digital Library

[74]

R. Holmes and G. C. Murphy. 2005. Using structural context to recommend source code examples. In Proceedings of ICSE. 117–125.

[75]

M. J. Howard, S. Gupta, L. Pollock, and K. Vijay-Shanker. 2013. Automatically mining software-based, semantically-similar words from comment-code mappings. In Proceedings of MSR. 377–386.

[76]

Q. Huang, X. Xia, Z. Xing, D. Lo, and X. Wang. 2018. API method recommendation without worrying about the task-API knowledge gap. In Proceedings of ASE (ASE’18). 293–304.

Digital Library

[77]

Q. Huang, Y. Yang, and M. Cheng. 2019. Deep learning the semantics of change sequences for query expansion. SPE 49, 11 (2019), 1600–1617.

[78]

Q. Huang, Y. Yang, X. Wang, H. Wan, R. Wang, and G. Wu. 2017. Query expansion via intent predicting. IJSEKE 27, 09n10 (2017), 1591–1601.

[79]

Q. Huang, Y. Yang, X. Zhan, H. Wan, and G. Wu. 2018. Query expansion based on statistical learning from code changes. SPE 48, 7 (2018), 1333–1351.

[80]

S. F. Hussain and G. Bisson. 2010. Text categorization using word similarities based on higher order co-occurrences. In Proceedings of SDM. 1–12.

[81]

S. Jiang, L. Shen, X. Peng, Z. Lv, and W. Zhao. 2015. Understanding developers’ natural language queries with interactive clarification. In Proceedings of SANER. 13–22.

[82]

K. S. Jones. 1972. A statistical interpretation of term specificity and its application in retrieval. J. Doc. 28, 1 (1972), 11–21.

[83]

N. Juristo. 2013. Towards understanding replication of software engineering experiments. In Proceedings of ESEM. 4–4.

[84]

N. Juristo and O. S. Gómez. 2012. Replication of software engineering experiments. In Empirical Software Engineering and Verification: LASER Summer School 2008–2010. Springer, 60–88.

[85]

D. Kelly and J. Teevan. 2003. Implicit feedback for inferring user preference: A bibliography. SIGIR Forum 37, 2 (2003), 18–28.

Digital Library

[86]

K. Kevic and T. Fritz. 2014. A dictionary to translate change tasks to source code. In Proceedings of MSR. 320–323.

Digital Library

[87]

K. Kevic and T. Fritz. 2014. Automatic search term identification for change tasks. In Proceedings of ICSE. 468–471.

Digital Library

[88]

A. Khanjani and R. Sulaiman. 2011. The aspects of choosing open source versus closed source. In Proceedings of ISCI. 646–649.

[89]

M. Kim and E. Lee. 2018. Are information retrieval-based bug localization techniques trustworthy? In Proceedings of ICSE. 248–249.

Digital Library

[90]

M. Kim and E. Lee. 2019. A novel approach to automatic query reformulation for IR-based bug localization. In Proceedings of SAC. 1752–1759.

Digital Library

[91]

M. Kim and E. Lee. 2020. ManQ: Many-objective optimization-based automatic query reduction for IR-based bug localization. IST 125 (2020), 106334.

[92]

M. Kimmig, M. Monperrus, and M. Mezini. 2011. Querying source code with natural language. In Proceedings of ASE. 376–379.

Digital Library

[93]

B. Kitchenham and P. Brereton. 2013. A systematic review of systematic review process research in software engineering. IST 55, 12 (2013), 2049–2075.

[94]

B. Kitchenham and S. Charters. 2007. Guidelines for Performing Systematic Literature Reviews in Software Engineering. Technical Report. University of Durham.

[95]

Andrew J. Ko, Brad A. Myers, Michael J. Coblenz, and Htet Htet Aung. 2006. An exploratory study of how developers seek, relate, and collect relevant information during software maintenance tasks. TSE 32, 12 (2006), 971–987.

Digital Library

[96]

P. S. Kochhar, Y. Tian, and D. Lo. 2014. Potential biases in bug localization: Do they matter? In Proceedings of ASE. 803–814.

Digital Library

[97]

R. Lapeña, J. Font, F. Pérez, and C. Cetina. 2016. Improving feature location by transforming the query from natural language into requirements. In Proceedings of SPLC. 362–369.

Digital Library

[98]

V. Lavrenko and W. B. Croft. 2001. Relevance based language models. In Proceedings of SIGIR. 120–127.

Digital Library

[99]

D. Lawrie and D. Binkley. 2018. On the value of bug reports for retrieval-based bug localization. In Proceedings of ICSME. 524–528.

[100]

D. Lawrie, C. Morrell, H. Feild, and D. Binkley. 2006. What’s in a name? A study of identifiers. In Proceedings of ICPC. 3–12.

[101]

B. Lemaire and G. Denhière. 2008. Effects of high-order co-occurrences on word semantic similarities. arXiv:0804.0143 (2008).

[102]

O. A. L. Lemos, S. Bajracharya, J. Ossher, P. C. Masiero, and C. Lopes. 2011. A test-driven approach to code search and its application to the reuse of auxiliary functionality. IST 53, 4 (2011), 294–306.

[103]

O. A. L. Lemos, A. C. de Paula, H. Sajnani, and C. V. Lopes. 2015. Can the use of types and query expansion help improve large-scale code search? In Proceedings of SCAM. 41–50.

[104]

O. A. L. Lemos, A. C. de Paula, F. C. Zanichelli, and C. V. Lopes. 2014. Thesaurus-based automatic query expansion for interface-driven code search. In Proceedings of MSR. 212–221.

Digital Library

[105]

Z. Li, T. Wang, Y. Zhang, Y. Zhan, and G. Yin. 2016. Query reformulation by leveraging crowd wisdom for scenario-based software search. In Proceedings of Internetware. 36–44.

Digital Library

[106]

Z. Li, G. Yin, T. Wang, Y. Zhang, Y. Yu, and H. Wang. 2018. Correlation-based software search by leveraging software term database. Front. Comput. Sci. 12, 5 (2018), 923–938.

Digital Library

[107]

J. Lin and G. C. Murray. 2005. Assessing the term independence assumption in blind relevance feedback. In Proceedings of SIGIR. 635–636.

Digital Library

[108]

Z. Lin, Y. Zou, J. Zhao, and B. Xie. 2017. Improving software text retrieval using conceptual knowledge in source code. In Proceedings of ASE. 123–134.

[109]

E. Linstead, S. Bajracharya, T. Ngo, P. Rigor, C. Lopes, and P. Baldi. 2009. Sourcerer: Mining and searching Internet-scale software repositories. Data Min. Knowl. Discov. 18, 2 (2009), 300–336.

Digital Library

[110]

C. Liu, X. Xia, D. Lo, C. Gao, X. Yang, and J. C. Grundy. 2020. Opportunities and challenges in code search tools. CoRR abs/2011.02297 (2020).

[111]

D. Liu, A. Marcus, D. Poshyvanyk, and V. Rajlich. 2007. Feature location via information retrieval based filtering of a single scenario execution trace. In Proceedings of ASE. 234–243.

Digital Library

[112]

J. Liu, S. Kim, V. Murali, S. Chaudhuri, and S. Chandra. 2019. Neural query expansion for code search. In Proceedings of MAPL. 29–37.

Digital Library

[113]

J. Lu, Y. Wei, X. Sun, B. Li, W. Wen, and C. Zhou. 2018. Interactive query reformulation for source-code search with word relations. IEEE Access 6 (2018).

[114]

Meili Lu, X. Sun, S. Wang, D. Lo, and Yucong Duan. 2015. Query expansion via WordNet for effective code search. In Proceedings of SANER. 545–549.

[115]

A. D. Lucia, R. Oliveto, and P. Sgueglia. 2006. Incremental approach and user feedbacks: A silver bullet for traceability recovery. In Proceedings of ICSM. 299–309.

Digital Library

[116]

F. Lv, H. Zhang, J. Lou, S. Wang, D. Zhang, and J. Zhao. 2015. CodeHow: Effective code search based on API understanding and extended Boolean model. In Proceedings of ASE. 260–270.

Digital Library

[117]

A. Mahmoud and G. Bradshaw. 2015. Estimating semantic relatedness in source code. TOSEM 25, 1 (Dec.2015), Article 10, 35 pages.

Digital Library

[118]

D. Mandelin, L. Xu, R. Bodík, and D. Kimelman. 2005. Jungloid mining: Helping to navigate the API jungle. In Proceedings of PLDI. 48–61.

Digital Library

[119]

C. D. Manning, P. Raghavan, and H. Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press, New York, NY.

[120]

A. Marcus, A. Sergeyev, V. Rajlich, and J. I. Maletic. 2004. An information retrieval approach to concept location in source code. In Proceedings of WCRE. 214–223.

[121]

L. Martie, T. D. LaToza, and A. V. D. Hoek. 2015. CodeExchange: Supporting reformulation of Internet-scale code queries in context. In Proceedings of ASE. 24–35.

Digital Library

[122]

P. McCullagh and J. Nelder. 1989. Generalized Linear Models. Chapman & Hall/CRC, Boca Raton, FL.

[123]

C. McMillan, M. Grechanik, D. Poshyvanyk, Q. Xie, and C. Fu. 2011. Portfolio: Finding relevant functions and their usage. In Proceedings of ICSE. 111–120.

Digital Library

[124]

C. Mcmillan, D. Poshyvanyk, M. Grechanik, Q. Xie, and C. Fu. 2013. Portfolio: Searching for relevant functions and their usages in millions of lines of code. TOSEM 22, 4 (2013), Article 37, 30 pages.

Digital Library

[125]

J. Miao, J. X. Huang, and Z. Ye. 2012. Proximity-based Rocchio’s model for pseudo relevance. In Proceedings of SIGIR. 535–544.

Digital Library

[126]

R. Mihalcea and P. Tarau. 2004. TextRank: Bringing order into texts. In Proceedings of EMNLP. 404–411.

[127]

T. Mikolov, K. Chen, G. Corrado, and J. Dean. 2013. Efficient estimation of word representations in vector space. CoRR abs/1301.3781 (2013).

[128]

T. Mikolov, K. Chen, G. Corrado, and J. Dean. 2013. Efficient estimation of word representations in vector space. arXiv:1301.3781 (2018).

[129]

George A. Miller. 1995. WordNet: A lexical database for English. Commun. ACM 38, 11 (1995), 39–41.

Digital Library

[130]

C. Mills, G. Bavota, S. Haiduc, R. Oliveto, A. Marcus, and A. D. Lucia. 2017. Predicting query quality for applications of text retrieval to software engineering tasks. TOSEM 26, 1 (2017), Article 3, 45 pages.

Digital Library

[131]

C. Mills, J. Pantiuchina, E. Parra, G. Bavota, and S. Haiduc. 2018. Are bug reports enough for text retrieval-based bug localization? In Proceedings of ICSME. 381–392.

[132]

L. Moreno, G. Bavota, S. Haiduc, M. Di Penta, R. Oliveto, B. Russo, and A. Marcus. 2015. Query-based configuration of text retrieval solutions for software engineering tasks. In Proceedings of ESEC/FSE. 567–578.

Digital Library

[133]

L. Moreno, J. J. Treadway, A. Marcus, and W. Shen. 2014. On the use of stack traces to improve text retrieval-based bug localization. In Proceedings of ICSME. 151–160.

Digital Library

[134]

S. M. Nasehi, J. Sillito, F. Maurer, and C. Burns. 2012. What makes a good code example?: A study of programming Q & A in StackOverflow. In Proceedings of ICSM. 25–34.

Digital Library

[135]

L. Nie, H. Jiang, Z. Ren, Z. Sun, and X. Li. 2016. Query expansion based on crowd knowledge for code search. TSC 9, 5 (2016), 771–783.

[136]

D. Pal, M. Mitra, and S. Bhattacharya. 2015. Exploring query categorisation for query expansion: A study. CoRR abs/1509.05567 (2015).

[137]

O. Panchenko, J. Karstens, H. Plattner, and A. Zeier. 2011. Precise and scalable querying of syntactical source code patterns using sample code snippets and a database. In Proceedings of ICPC. 41–50.

Digital Library

[138]

C. Parnin and A. Orso. 2011. Are automated debugging techniques actually helping programmers? In Proceedings of ISSTA. 199–209.

Digital Library

[139]

J. W. Paulson, G. Succi, and A. Eberlein. 2004. An empirical study of open-source and closed-source software products. TSE 30, 4 (2004), 246–256.

Digital Library

[140]

F. Pérez, J. Font, L. Arcega, and C. Cetina. 2018. Automatic query reformulations for feature location in a model-based family of software products. DKE 116 (2018), 159–176.

Digital Library

[141]

F. Pérez, J. Font, L. Arcega, and C. Cetina. 2019. Collaborative feature location in models through automatic query expansion. AUSE 26, 1 (2019), 161–202.

[142]

M. Petticrew and H. Roberts. 2005. Systematic Reviews in the Social Sciences: A Practical Guide. Wiley.

[143]

M. F. Porter. 1997. An algorithm for suffix stripping. Program 40, 3 (1997), 313–316.

[144]

M. F. Porter. 2001. Snowball: A Language for Stemming Algorithms. Retrieved July 10, 2023 from http://snowball.tartarus.org/texts/introduction.html.

[145]

F. Pérez, A. C. Marcén, R. Lapeña, and C. Cetina. 2020. Evaluating low-cost in internal crowdsourcing for software engineering: The case of feature location in an industrial environment. IEEE Access 8 (2020), 65745–65757.

[146]

F. Pérez, T. Ziadi, and C. Cetina. 2022. Utilizing automatic query reformulations as genetic operations to improve feature location in software models. TSE 48, 2 (2022), 713–731.

[147]

D. Qiu, B. Li, S. Ji, and H. Leung. 2014. Regression testing of web service: A systematic mapping study. ACM Comput. Surv. 47, 2 (2014), Article 21, 46 pages.

[148]

M. Raghothaman, Y. Wei, and Y. Hamadi. 2016. SWIM: Synthesizing what I mean: Code search and idiomatic snippet synthesis. In Proceedings of ICSE. 357–367.

Digital Library

[149]

M. M. Rahman, J. Barson, S. Paul, J. Kayani, F. A. Lois, S. F. Quezada, C. Parnin, K T. Stolee, and Baishakhi Ray. 2018. Evaluating how developers use general-purpose web-search for code retrieval. In Proceedings of MSR. 465–475.

Digital Library

[150]

M. M. Rahman, F. Khomh, and M. Castelluccio. 2020. Why are some bugs non-reproducible? An empirical investigation using data fusion. In Proceedings of ICSME. 12.

[151]

M. M. Rahman, F. Khomh, S. Yeasmin, and C. K. Roy. 2021. The forgotten role of search queries in IR-based bug localization: An empirical study. EMSE 26 (2021), 116.

[152]

M. M. Rahman and C. K. Roy. 2014. On the use of context in recommending exception handling code examples. In Proceedings of SCAM. 285–294.

Digital Library

[153]

M. M. Rahman and C. K. Roy. 2015. TextRank based search term identification for software change tasks. In Proceedings of SANER. 540–544.

[154]

M. M. Rahman and C. K. Roy. 2016. QUICKAR: Automatic query reformulation for concept location using crowdsourced knowledge. In Proceedings of ASE. 220–225.

Digital Library

[155]

M. M. Rahman and C. K. Roy. 2017. Improved query reformulation for concept location using CodeRank and document structures. In Proceedings of ASE. 428–439.

[156]

M. M. Rahman and C. K. Roy. 2017. STRICT: Information retrieval based search term identification for concept location. In Proceedings of SANER. 79–90.

[157]

M. M. Rahman and C. K. Roy. 2018. Effective reformulation of query for code search using crowdsourced knowledge and extra-large data analytics. In Proceedings of ICSME. 516–527.

[158]

M. M. Rahman and C. K. Roy. 2018. Improving IR-based bug localization with context-aware query reformulation. In Proceedings of ESEC/FSE. 621–632.

Digital Library

[159]

M. M. Rahman, C. K. Roy, and D. Lo. 2016. RACK: Automatic API recommendation using crowdsourced knowledge. In Proceedings of SANER. 349–359.

[160]

M. M. Rahman, C. K. Roy, and D. Lo. 2018. Automatic query reformulation for code search using crowdsourced knowledge. EMSE 24 (2018), 1869–1924.

[161]

Peter C. Rigby, Daniel M. German, Laura Cowen, and Margaret-Anne Storey. 2014. Peer review on open-source software projects: Parameters, statistical models, and theory. TOSEM 23, 4 (2014), 1–33.

Digital Library

[162]

S. E. Robertson. 1991. On term selection for query expansion. J. Doc. 46, 4 (1991), 359–364.

Digital Library

[163]

J. J. Rocchio. 1971. Relevance feedback in information retrieval. In The SMART Retrieval System—Experiments in Automatic Document Processing, G. Salton (Ed.). Prentice Hall, Upper Saddle River, NJ, 313–323.

[164]

M. Roldan-Vega, G. Mallet, E. Hill, and J. A. Fails. 2013. CONQUER: A tool for NL-based query refinement and contextualizing code search results. In Proceedings of ICSM. 512–515.

Digital Library

[165]

Julia Rubin and Marsha Chechik. 2013. A survey of feature location techniques. In Domain Engineering, Product Lines, Languages, and Conceptual Models, Iris Reinhartz-Berger, Arnon Sturm, Tony Clark, Sholom Cohen, and Jorn Bettin (Eds.). Springer, 29–58.

[166]

C. Sadowski, K. T. Stolee, and S. Elbaum. 2015. How developers search for code: A case study. In Proceedings of ESEC/FSE. 191–201.

Digital Library

[167]

R. K. Saha, J. Lawall, S. Khurshid, and D. E. Perry. 2014. On the effectiveness of information retrieval based bug localization for C programs. In Proceedings of ICSME. 161–170.

Digital Library

[168]

R. K. Saha, M. Lease, S. Khurshid, and D. E. Perry. 2013. Improving bug localization using structured information retrieval. In Proceedings of ASE. 345–355.

Digital Library

[169]

G. Salton and C. Buckley. 1997. Improving retrieval performance by relevance feedback. In Readings in Information Retrieval, Karen Sparck Jones and Peter Willet (Eds.). Morgan Kaufmann, San Francisco, CA, 355–364.

[170]

G. Salton and M. J. McGill. 1986. Introduction to Modern Information Retrieval. McGraw-Hill Computer Science Series. McGraw-Hill, New York, NY.

Digital Library

[171]

G. Salton, A. Wong, and C. S. Yang. 1975. A vector space model for automatic indexing. Commun. ACM 18, 11 (1975), 613–620.

Digital Library

[172]

A. Satter and K. Sakib. 2016. A search log mining based query expansion technique to improve effectiveness in code search. In Proceedings of ICCIT. 586–591.

[173]

T. Savage, M. Revelle, and D. Poshyvanyk. 2010. FLAT3: Feature location and textual tracing tool. In Proceedings of ICSE. 255–258.

[174]

G. Scanniello, A. Marcus, and D. Pascale. 2015. Link analysis algorithms for static concept location: An empirical assessment. EMSE 20, 6 (2015), 1666–1720.

Digital Library

[175]

H. A. Shafiq and Z. Arshad. 2014. Automated Debugging and Bug Fixing Solutions: A Systematic Literature Review and Classification. Master’s thesis. Blekinge Institute of Technology.

[176]

D. Shepherd, Z. P. Fry, E. Hill, L. Pollock, and K. Vijay-Shanker. 2007. Using natural language program analysis to locate and understand action-oriented concerns. In Proceedings of ASOD. 212–224.

Digital Library

[177]

David J. Sheskin. 2007. Handbook of Parametric and Nonparametric Statistical Procedures (4th ed.). Chapman & Hall/CRC, Boca Raton, FL.

Digital Library

[178]

Z. Shi, J. Keung, and Q. Song. 2014. An empirical study of BM25 and BM25F based feature location techniques. In Proceedings of InnoSWDev. 106–114.

Digital Library

[179]

A. Shtok, O. Kurland, D. Carmel, F. Raiber, and G. Markovits. 2012. Predicting query performance by query-drift estimation. TOIS 30, 2 (2012), Article 11, 35 pages.

Digital Library

[180]

R. Sirres, T. F. Bissyandé, D. Kim, D. Lo, J. Klein, K. Kim, and Y. L. Traon. 2018. Augmenting and structuring user queries to support efficient free-form code search. EMSE 23 (2018), 2622–2654.

[181]

Bunyamin Sisman, Shayan A. Akbar, and Avinash C. Kak. 2017. Exploiting spatial code proximity and order for improved source code retrieval for bug localization. JSEP 29, 1 (2017), e1805.

[182]

B. Sisman and A. C. Kak. 2012. Incorporating version histories in information retrieval based bug localization. In Proceedings of MSR. 50–59.

[183]

B. Sisman and A. C. Kak. 2013. Assisting code search with automatic query reformulation for bug localization. In Proceedings of MSR. 309–318.

[184]

G. Sridhara, E. Hill, L. Pollock, and K. Vijay-Shanker. 2008. Identifying word relations in software: A comparative study of semantic similarity tools. In Proceedings of ICPC. 123–132.

[185]

K. Stol, P. Ralph, and B. Fitzgerald. 2016. Grounded theory in software engineering research: A critical review and guidelines. In Proceedings of ICSE. 120–131.

[186]

K. T. Stolee, S. Elbaum, and D. Dobos. 2014. Solving the search for source code. TOSEM 23, 3 (2014), Article 26, 45 pages.

[187]

F. Thung, D. Lo, and J. Lawall. 2013. Automated library recommendation. In Proceedings of WCRE. 182–191.

[188]

F. Thung, S. Wang, D. Lo, and J. Lawall. 2013. Automatic recommendation of API methods from feature requests. In Proceedings of ASE. 290–300.

[189]

Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. JMLR 9, 11 (2008), 1–27.

[190]

C. Vassallo, S. Panichella, M. Di Penta, and G. Canfora. 2014. CODES: Mining source code descriptions from developers discussions. In Proceedings of ICPC. 106–109.

[191]

V. Vinayakarao, A. Sarma, R. Purandare, S. Jain, and S. Jain. 2017. ANNE: Improving source code search using entity retrieval approach. In Proceedings of WSDM. 211–220.

[192]

J. Wang, X. Peng, Z. Xing, and W. Zhao. 2013. Improving feature location practice with multi-faceted interactive exploration. In Proceedings of ICSE. 762–771.

[193]

Q. Wang, C. Parnin, and A. Orso. 2015. Evaluating the usefulness of IR-based fault localization techniques. In Proceedings of ISSTA. 1–11.

[194]

S. Wang and D. Lo. 2014. Version history, similar report, and structure: Putting them together for improved bug localization. In Proceedings of ICPC. 53–63.

[195]

S. Wang and D. Lo. 2016. AmaLgam+: Composing rich information sources for accurate bug localization. JSEP 28, 10 (2016), 921–942.

[196]

S. Wang, D. Lo, and L. Jiang. 2014. Active code search: Incorporating user feedback to improve code search relevance. In Proceedings of ASE. 677–682.

[197]

S. Wang, D. Lo, and L. Jiang. 2016. AutoQuery: Automatic construction of dependency queries for code search. ASE 23, 3 (Sept. 2016), 393–425.

[198]

T. Wei, Y. Lu, H. Chang, Q. Zhou, and X. Bao. 2015. A semantic approach for text clustering using WordNet and lexical chains. Expert Syst. Appl. 42, 4 (2015), 2264–2275.

[199]

M. Wen, R. Wu, and S. C. Cheung. 2016. Locus: Locating bugs from software changes. In Proceedings of ASE. 262–273.

[200]

L. A. Wilson. 2010. Using ontology fragments in concept location. In Proceedings of ICSM. 1–2.

[201]

C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell, and A. Wesslén. 2012. Experimentation in Software Engineering. Springer.

[202]

C. P. Wong, Y. Xiong, H. Zhang, D. Hao, L. Zhang, and H. Mei. 2014. Boosting bug-report-oriented fault localization with segmentation and stack-trace analysis. In Proceedings of ICSME. 181–190.

[203]

W. E. Wong, R. Gao, Y. Li, R. Abreu, and F. Wotawa. 2016. A survey on software fault localization. TSE 42, 8 (2016), 707–740.

[204]

H. Wu and Y. Yang. 2019. Code search based on alteration intent. IEEE Access 7 (2019), 56796–56802.

[205]

M. Würsch, G. Ghezzi, G. Reif, and H. C. Gall. 2010. Supporting developers with natural language queries. In Proceedings of ICSE. 165–174.

[206]

J. Yang and L. Tan. 2012. Inferring semantically related words from software context. In Proceedings of MSR. 161–170.

[207]

J. Yang and L. Tan. 2014. SWordNet: Inferring semantically related words from software context. EMSE 19, 6 (2014), 1856–1886.

[208]

X. Ye, H. Shen, X. Ma, R. Bunescu, and C. Liu. 2016. From word embeddings to document similarities for improved information retrieval in software engineering. In Proceedings of ICSE. 404–415.

[209]

K. C. Youm, J. Ahn, J. Kim, and E. Lee. 2015. Bug localization based on code change histories and bug reports. In Proceedings of APSEC. 190–197.

[210]

H. Yu, W. Song, and T. Mine. 2016. APIBook: An effective approach for finding APIs. In Proceedings of Internetware. 45–53.

[211]

T. Yuan, D. Lo, and J. Lawall. 2014. Automated construction of a software-specific word similarity database. In Proceedings of CSMR-WCRE. 44–53.

[212]

S. Zamani, S. Peck Lee, R. Shokripour, and J. Anvik. 2014. A noun-based approach to feature location using time-aware term-weighting. Inf. Softw. Technol. 56, 8 (2014), 991–1011.

[213]

F. Zhang, H. Niu, I. Keivanloo, and Y. Zou. 2018. Expanding queries for code search using semantically related API class-names. TSE 44, 11 (2018), 1070–1082.

[214]

Jie Zhang, XiaoYin Wang, Dan Hao, Bing Xie, Lu Zhang, and Hong Mei. 2015. A survey on bug-report analysis. SCIS 58, 2 (2015), 1–24.

[215]

W. Zhang, Z. Li, Q. Wang, and J. Li. 2019. FineLocator: A novel approach to method-level fine-grained bug localization by query expansion. IST 110 (2019), 121–135.

[216]

Y. Zhang, D. Lo, X. Xia, G. Scanniello, T. B. Le, and J. Sun. 2018. Fusing multi-abstraction vector space models for concern localization. EMSE 23, 4 (2018), 2279–2322.

[217]

J. Zhou, H. Zhang, and D. Lo. 2012. Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports. In Proceedings of ICSE. 14–24.

[218]

W. Zou, D. Lo, Z. Chen, X. Xia, Y. Feng, and B. Xu. 2020. How practitioners perceive automated bug report management techniques. TSE 46, 8 (2020), 836–862.

Cited By

A. Arwan, S. Rochimah, and C. Fatichah. 2023. Feature Location Using Extraction of Code Documentation. In Proceedings of the 8th International Conference on Sustainable Information Engineering and Technology. 481–488. Online publication date: 24-Oct-2023. https://dl.acm.org/doi/10.1145/3626641.3627149

K. Liu, X. Chen, C. Chen, X. Xie, and Z. Cui. 2023. Automated Question Title Reformulation by Mining Modification Logs From Stack Overflow. IEEE Transactions on Software Engineering 49, 9 (2023), 4390–4410. Online publication date: 1-Sep-2023. https://dl.acm.org/doi/10.1109/TSE.2023.3292399

Index Terms

A Systematic Review of Automated Query Reformulations in Source Code Search
1. Software and its engineering
  1. Software creation and management
  2. Software notations and tools
    1. Software maintenance tools

Information & Contributors

Information

Published In

ACM Transactions on Software Engineering and Methodology Volume 32, Issue 6

November 2023

949 pages

ISSN:1049-331X

EISSN:1557-7392

DOI:10.1145/3625557

Editor:
Mauro Pezzè
USI Università della Svizzera italiana and SIT Schaffhausen Institute of Technology, Switzerland

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 September 2023

Online AM: 04 July 2023

Accepted: 23 May 2023

Revised: 09 October 2022

Received: 14 September 2021

Published in TOSEM Volume 32, Issue 6

Qualifiers

Survey

Funding Sources

Dalhousie University
International Dean’s Scholarship from University of Saskatchewan (2014–2017)
Saskatchewan Innovation & Opportunity Scholarship (2017–2018)
Natural Sciences and Engineering Research Council of Canada (NSERC)
