survey

A Systematic Review of Automated Query Reformulations in Source Code Search

Published: 28 September 2023

Abstract

Fixing software bugs and adding new features are two of the major maintenance tasks. Software bugs and features are reported as change requests. Developers consult these requests and often choose a few keywords from them as an ad hoc query. Then they execute the query with a search engine to find the exact locations within software code that need to be changed. Unfortunately, even experienced developers often fail to choose appropriate queries, which leads to costly trial and error during code search. Over the years, many studies have attempted to reformulate the ad hoc queries from developers to support them. In this systematic literature review, we carefully select 70 primary studies on query reformulations from 2,970 candidate studies, perform an in-depth qualitative analysis (e.g., Grounded Theory), and then answer seven research questions with major findings. First, to date, eight major methodologies (e.g., term weighting, term co-occurrence analysis, thesaurus lookup) have been adopted to reformulate queries. Second, the existing studies suffer from several major limitations (e.g., lack of generalizability, the vocabulary mismatch problem, subjective bias) that might prevent their wide adoption. Finally, we discuss the best practices and future opportunities to advance the state of research in search query reformulations.

References

[1]
Google. n.d. Stop-Words. Retrieved July 10, 2023 from https://code.google.com/archive/p/stop-words.
[2]
Oracle. 2015. Java Language Keywords. Retrieved July 10, 2023 from https://docs.oracle.com/javase/tutorial/java/nutsandbolts/_keywords.html.
[3]
TechRepublic. 2017. Report: Software Failure Caused $1.7 Trillion in Financial Losses in 2017. Retrieved July 10, 2023 from https://tek.io/2FBNl2i.
[4]
Reuters. 2018. Boeing Eyes Lion Air Crash Software Upgrade in 6 to 8 Weeks. Retrieved July 10, 2023 from https://tinyurl.com/c3f2rsw3.
[5]
Medium. 2019. The 737Max and Why Software Engineers Might Want to Pay Attention. Retrieved July 10, 2023 from https://bit.ly/2CmeTqB.
[6]
Apache Lucene. 2019. Apache Lucene Core. Retrieved July 10, 2023 from https://lucene.apache.org/core.
[7]
National Post. 2019. Boeing 737 Jets Grounded Globally as Officials Investigate Technical Issues Behind Fatal Crash. Retrieved July 10, 2023 from https://goo.gl/ieBgYN.
[8]
Tabnine. 2019. Codota Code Search. Retrieved July 10, 2023 from https://www.codota.com/code.
[9]
GitHub. 2019. GitHub Code Search. Retrieved July 10, 2023 from https://github.com/search.
[10]
National Post. 2019. Here’s the Terrifying Reason Boeing’s 737 MAX 8 Is Grounded Across the Globe. Retrieved July 10, 2023 from https://goo.gl/GwXv6H.
[11]
GitHub. 2023. Replication Package: A Systematic Review of Automated Query Reformulations in Source Code Search. Retrieved July 10, 2023 from https://bit.ly/3eccmlZ.
[12]
G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, and E. Merlo. 2002. Recovering traceability links between code and documentation. TSE 28, 10 (2002), 970–983.
[13]
J. Anvik, L. Hiew, and G. C. Murphy. 2005. Coping with an open bug repository. In Proceedings of OOPSLA/Eclipse. 35–39.
[14]
M. Asaduzzaman, C. K. Roy, K. A. Schneider, and D. Hou. 2016. A simple, efficient, context-sensitive approach for code completion. JSEP 28, 7 (2016), 512–541.
[15]
S. Bajracharya and C. Lopes. 2009. Mining search topics from a code search engine usage log. In Proceedings of MSR. 111–120.
[16]
S. Bajracharya, T. Ngo, E. Linstead, Y. Dou, P. Rigor, P. Baldi, and C. Lopes. 2006. Sourcerer: A search engine for open source code supporting structure-based search. In Proceedings of OOPSLA-C. 681–682.
[17]
S. K. Bajracharya and C. V. Lopes. 2012. Analyzing and mining a code search engine usage log. EMSE 17, 4–5 (2012), 424–466.
[18]
V. Balachandran. 2015. Query by example in large-scale code repositories. In Proceedings of SANER. 467–476.
[19]
A. Banerjee and R. N Dave. 2004. Validating clusters using the Hopkins statistic. In Proceedings of FUZZY, Vol. 1. 149–153.
[20]
B. Bassett and N. A. Kraft. 2013. Structural information based term weighting in text retrieval for feature location. In Proceedings of ICPC. 133–141.
[21]
R. Blanco and C. Lioma. 2012. Graph-based term weighting for information retrieval. Inf. Retr. 15, 1 (2012), 54–92.
[22]
D. M. Blei, A. Y. Ng, and M. I. Jordan. 2003. Latent Dirichlet allocation. J. Mach. Learn. Res. 3 (2003), 993–1022.
[23]
P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov. 2016. Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606 (2016).
[24]
A. Bosu. 2014. Characteristics of the vulnerable code changes identified through peer code review. In Proceedings of ICSE-C (ICSE Companion’14). 736–738.
[25]
A. Bosu, J. C. Carver, M. Hafiz, P. Hilley, and D. Janni. 2014. Identifying the characteristics of vulnerable code changes: An empirical study. In Proceedings of FSE. 257–268.
[26]
J. Brandt, P. J. Guo, J. Lewenstein, M. Dontcheva, and S. R. Klemmer. 2009. Two studies of opportunistic programming: Interleaving web foraging, learning, and writing code. In Proceedings of SIGCHI. 1589–1598.
[27]
S. Brin and L. Page. 1998. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 30, 1–7 (1998), 107–117.
[28]
D. Cai, C. J. van Rijsbergen, and J. M. Jose. 2001. Automatic query expansion based on divergence. In Proceedings of CIKM. 419–426.
[29]
K. Cao, C. Chen, S. Baltes, C. Treude, and X. Chen. 2021. Automated query reformulation for efficient search based on query logs from Stack Overflow. In Proceedings of ICSE. 13.
[30]
D. Carmel and E Yom-Tov. 2010. Estimating the Query Difficulty for Information Retrieval. Morgan & Claypool.
[31]
D. Carmel, E. Yom-Tov, A. Darlow, and D. Pelleg. 2006. What makes a query difficult? In Proceedings of SIGIR. 390–397.
[32]
C. Carpineto, R. de Mori, G. Romano, and B. Bigi. 2001. An information-theoretic approach to automatic query expansion. ACM Trans. Inf. Syst. 19, 1 (2001), 1–27.
[33]
C. Carpineto and G. Romano. 2012. A survey of automatic query expansion in information retrieval. ACM Comput. Surv. 44, 1 (2012), Article 1, 50 pages.
[34]
W. Chan, H. Cheng, and D. Lo. 2012. Searching connected API subgraph via text phrases. In Proceedings of FSE. Article 10, 11 pages.
[35]
O. Chaparro, J. M. Florez, and A. Marcus. 2017. Using observed behavior to reformulate queries during text retrieval-based bug localization. In Proceedings of ICSME. 376–387.
[36]
O. Chaparro, J. M. Florez, U. Singh, and A. Marcus. 2019. Reformulating queries for duplicate bug report detection. In Proceedings of SANER. 12.
[37]
O. Chaparro, J. Lu, F. Zampetti, L. Moreno, M Di Penta, A. Marcus, G. Bavota, and V. Ng. 2017. Detecting missing information in bug descriptions. In Proceedings of ESEC/FSE. 396–407.
[38]
O. Chaparro and A. Marcus. 2016. On the reduction of verbose queries in text retrieval based software maintenance. In Proceedings of ICSE-C. 716–718.
[39]
S. Chatterjee, S. Juvekar, and K. Sen. 2009. SNIFF: A search engine for Java using free-form queries. In Proceedings of FASE. 385–400.
[40]
C. Chen, Z. Xing, and Y. Liu. 2019. What’s Spain’s Paris? Mining analogical libraries from Q&A discussions. EMSE 24, 3 (2019), 1155–1194.
[41]
J. Cordeiro, B. Antunes, and P. Gomes. 2012. Context-based recommendation to support problem solving in software development. In Proceedings of RSSE. 85–89.
[42]
R. F. G. Da Silva, C. K. Roy, M. M. Rahman, K. Schneider, K. Paixo, and M. Maia. 2019. Recommending comprehensive solutions for programming tasks by mining crowd knowledge. In Proceedings of ICPC. 358–368.
[43]
B. Dagenais and M. P. Robillard. 2012. Recovering traceability links between an API and its learning resources. In Proceedings of ICSE. 47–57.
[44]
T. Dietrich, J. Cleland-Huang, and Y. Shin. 2013. Learning effective query transformations for enhanced requirements trace retrieval. In Proceedings of ASE. 586–591.
[45]
B. Dit, M. Revelle, M. Gethers, and D. Poshyvanyk. 2013. Feature location in source code: A taxonomy and survey. JSEP 25, 1 (2013), 53–95.
[46]
N. Dourdas, X. Zhu, N. Maiden, S. Jones, and K. Zachos. 2006. Discovering remote software services that satisfy requirements: Patterns for query reformulation. In Advanced Information Systems Engineering. Lecture Notes in Computer Science, Vol. 4001. Springer, 239–254.
[47]
Brian P. Eddy, Nicholas A. Kraft, and Jeff Gray. 2018. Impact of structural weighting on a latent Dirichlet allocation based feature location technique. JSEP 30, 1 (2018), e1892.
[48]
F. Ensan, E. Bagheri, and M. Kahani. 2007. The application of users’ collective experience for crafting suitable search engine query recommendations. In Proceedings of CNSR. 148–156.
[49]
E. Enslen, E. Hill, L. Pollock, and K. Vijay-Shanker. 2009. Mining source code to automatically split identifiers for software analysis. In Proceedings of MSR. 71–80.
[50]
L. Favre. 2008. Modernizing software & system engineering processes. In Proceedings of ICSENG. 442–447.
[51]
R. Feldt and A. Magazinius. 2010. Validity threats in empirical software engineering Research—An initial survey. In Proceedings of SEKE. 374–379.
[52]
R. Fisher. 1955. Statistical methods and scientific induction. J. R. Stat. Soc. Series B Methodol. 17, 1 (1955), 69–78.
[53]
G. W. Furnas, T. K. Landauer, L. M. Gomez, and S. T. Dumais. 1987. The vocabulary problem in human-system communication. Commun. ACM 30, 11 (1987), 964–971.
[54]
G. Gay, S. Haiduc, A. Marcus, and T. Menzies. 2009. On the use of relevance feedback in IR-based concept location. In Proceedings of ICSM. 351–360.
[55]
X. Ge, D. C. Shepherd, K. Damevski, and E. Murphy-Hill. 2017. Design and evaluation of a multi-recommendation system for local code search. J. Vis. Lang. Comput. 39 (2017), 1–9.
[56]
M. Ghafari and H. Moradi. 2017. A framework for classifying and comparing source code recommendation systems. In Proceedings of SANER. 555–556.
[57]
M. Gibiec, A. Czauderna, and J. Cleland-Huang. 2010. Towards mining replacement queries for hard-to-retrieve traces. In Proceedings of ASE. 245–254.
[58]
B. G. Glaser and A. L. Strauss. 1967. The Discovery of Grounded Theory: Strategies for Qualitative Research. Aldine Publishing, Chicago, IL.
[59]
R. L. Glass. 2001. Frequently forgotten fundamental facts about software engineering. IEEE Softw. 18, 3 (2001), 112–111.
[60]
T. Gvero and V. Kuncak. 2015. Interactive synthesis using free-form queries. In Proceedings of ICSE. 689–692.
[61]
S. Haiduc. 2011. Automatically detecting the quality of the query and its implications in IR-based concept location. In Proceedings of ASE. 637–640.
[62]
Sonia Haiduc. 2013. Supporting Text Retrieval Query Formulation in Software Engineering. Ph.D. dissertation. Wayne State University.
[63]
S. Haiduc, G. Bavota, A. Marcus, R. Oliveto, A De Lucia, and T. Menzies. 2013. Automatic query reformulations for text retrieval in software engineering. In Proceedings of ICSE. 842–851.
[64]
S. Haiduc, G. Bavota, R. Oliveto, A De Lucia, and A. Marcus. 2012. Automatic query performance assessment during the retrieval of software artifacts. In Proceedings of ASE. 90–99.
[65]
S. Haiduc and A. Marcus. 2011. On the effect of the query in IR-based concept location. In Proceedings of ICPC. 234–237.
[66]
S. Hanneman. 2008. Design, analysis, and interpretation of method-comparison studies. AACN Adv. Crit. Care 19 (2008), 223–234.
[67]
D. Harman. 1992. Relevance feedback revisited. In Proceedings of SIGIR. 1–10.
[68]
J. H. Hayes, A. Dekhtyar, and S. K. Sundaram. 2006. Advancing candidate link generation for requirements tracing: The study of methods. TSE 32, 1 (2006), 4–19.
[69]
V. J. Hellendoorn and P. Devanbu. 2017. Are deep neural networks the best choice for modeling source code? In Proceedings of ESEC/FSE. 763–773.
[70]
Emily Hill. 2010. Integrating Natural Language and Program Structure Information to Improve Software Search and Exploration. Ph.D. dissertation. University of Delaware.
[71]
E. Hill, L. Pollock, and K. Vijay-Shanker. 2009. Automatically capturing source code context of NL-queries for software maintenance and reuse. In Proceedings of ICSE. 232–242.
[72]
E. Hill, L. Pollock, and K. Vijay-Shanker. 2011. Improving source code search with natural language phrasal representations of method signatures. In Proceedings of ASE. 524–527.
[73]
E. Hill, S. Rao, and A. Kak. 2012. On the use of stemming for concern location and bug localization in Java. In Proceedings of SCAM. 184–193.
[74]
R. Holmes and G. C. Murphy. 2005. Using structural context to recommend source code examples. In Proceedings of ICSE. 117–125.
[75]
M. J. Howard, S. Gupta, L. Pollock, and K. Vijay-Shanker. 2013. Automatically mining software-based, semantically-similar words from comment-code mappings. In Proceedings of MSR. 377–386.
[76]
Q. Huang, X. Xia, Z. Xing, D. Lo, and X. Wang. 2018. API method recommendation without worrying about the task-API knowledge gap. In Proceedings of ASE (ASE’18). 293–304.
[77]
Q. Huang, Y. Yang, and M. Cheng. 2019. Deep learning the semantics of change sequences for query expansion. SPE 49, 11 (2019), 1600–1617.
[78]
Q. Huang, Y. Yang, X. Wang, H. Wan, R. Wang, and G. Wu. 2017. Query expansion via intent predicting. IJSEKE 27, 09n10 (2017), 1591–1601.
[79]
Q. Huang, Y. Yang, X. Zhan, H. Wan, and G. Wu. 2018. Query expansion based on statistical learning from code changes. SPE 48, 7 (2018), 1333–1351.
[80]
S. F. Hussain and G. Bisson. 2010. Text categorization using word similarities based on higher order co-occurrences. In Proceedings of SDM. 1–12.
[81]
S. Jiang, L. Shen, X. Peng, Z. Lv, and W. Zhao. 2015. Understanding developers’ natural language queries with interactive clarification. In Proceedings of SANER. 13–22.
[82]
K. S. Jones. 1972. A statistical interpretation of term specificity and its application in retrieval. J. Doc. 28, 1 (1972), 11–21.
[83]
N. Juristo. 2013. Towards understanding replication of software engineering experiments. In Proceedings of ESEM. 4–4.
[84]
N. Juristo and O. S. Gómez. 2012. Replication of software engineering experiments. In Empirical Software Engineering and Verification: LASER Summer School 2008–2010. Springer, 60–88.
[85]
D. Kelly and J. Teevan. 2003. Implicit feedback for inferring user preference: A bibliography. SIGIR Forum 37, 2 (2003), 18–28.
[86]
K. Kevic and T. Fritz. 2014. A dictionary to translate change tasks to source code. In Proceedings of MSR. 320–323.
[87]
K. Kevic and T. Fritz. 2014. Automatic search term identification for change tasks. In Proceedings of ICSE. 468–471.
[88]
A. Khanjani and R. Sulaiman. 2011. The aspects of choosing open source versus closed source. In Proceedings of ISCI. 646–649.
[89]
M. Kim and E. Lee. 2018. Are information retrieval-based bug localization techniques trustworthy? In Proceedings of ICSE. 248–249.
[90]
M. Kim and E. Lee. 2019. A novel approach to automatic query reformulation for IR-based bug localization. In Proceedings of SAC. 1752–1759.
[91]
M. Kim and E. Lee. 2020. ManQ: Many-objective optimization-based automatic query reduction for IR-based bug localization. IST 125 (2020), 106334.
[92]
M. Kimmig, M. Monperrus, and M. Mezini. 2011. Querying source code with natural language. In Proceedings of ASE. 376–379.
[93]
B. Kitchenham and P. Brereton. 2013. A systematic review of systematic review process research in software engineering. IST 55, 12 (2013), 2049–2075.
[94]
B. Kitchenham and S. Charters. 2007. Guidelines for Performing Systematic Literature Reviews in Software Engineering. Technical Report. University of Durham.
[95]
Andrew J. Ko, Brad A. Myers, Michael J. Coblenz, and Htet Htet Aung. 2006. An exploratory study of how developers seek, relate, and collect relevant information during software maintenance tasks. TSE 32, 12 (2006), 971–987.
[96]
P. S. Kochhar, Y. Tian, and D. Lo. 2014. Potential biases in bug localization: Do they matter? In Proceedings of ASE. 803–814.
[97]
R. Lapeña, J. Font, F. Pérez, and C. Cetina. 2016. Improving feature location by transforming the query from natural language into requirements. In Proceedings of SPLC. 362–369.
[98]
V. Lavrenko and W. B. Croft. 2001. Relevance based language models. In Proceedings of SIGIR. 120–127.
[99]
D. Lawrie and D. Binkley. 2018. On the value of bug reports for retrieval-based bug localization. In Proceedings of ICSME. 524–528.
[100]
D. Lawrie, C. Morrell, H. Feild, and D. Binkley. 2006. What’s in a name? A study of identifiers. In Proceedings of ICPC. 3–12.
[101]
B. Lemaire and G. Denhière. 2008. Effects of high-order co-occurrences on word semantic similarities. arXiv:0804.0143 (2008).
[102]
O. A. L. Lemos, S. Bajracharya, J. Ossher, P. C. Masiero, and C. Lopes. 2011. A test-driven approach to code search and its application to the reuse of auxiliary functionality. IST 53, 4 (2011), 294–306.
[103]
O. A. L. Lemos, A. C. de Paula, H. Sajnani, and C. V. Lopes. 2015. Can the use of types and query expansion help improve large-scale code search? In Proceedings of SCAM. 41–50.
[104]
O. A. L. Lemos, A. C. de Paula, F. C. Zanichelli, and C. V. Lopes. 2014. Thesaurus-based automatic query expansion for interface-driven code search. In Proceedings of MSR. 212–221.
[105]
Z. Li, T. Wang, Y. Zhang, Y. Zhan, and G. Yin. 2016. Query reformulation by leveraging crowd wisdom for scenario-based software search. In Proceedings of Internetware. 36–44.
[106]
Z. Li, G. Yin, T. Wang, Y. Zhang, Y. Yu, and H. Wang. 2018. Correlation-based software search by leveraging software term database. Front. Comput. Sci. 12, 5 (2018), 923–938.
[107]
J. Lin and G. C. Murray. 2005. Assessing the term independence assumption in blind relevance feedback. In Proceedings of SIGIR. 635–636.
[108]
Z. Lin, Y. Zou, J. Zhao, and B. Xie. 2017. Improving software text retrieval using conceptual knowledge in source code. In Proceedings of ASE. 123–134.
[109]
E. Linstead, S. Bajracharya, T. Ngo, P. Rigor, C. Lopes, and P. Baldi. 2009. Sourcerer: Mining and searching Internet-scale software repositories. Data Min. Knowl. Discov. 18, 2 (2009), 300–336.
[110]
C. Liu, X. Xia, D. Lo, C. Gao, X. Yang, and J. C. Grundy. 2020. Opportunities and challenges in code search tools. CoRR abs/2011.02297 (2020).
[111]
D. Liu, A. Marcus, D. Poshyvanyk, and V. Rajlich. 2007. Feature location via information retrieval based filtering of a single scenario execution trace. In Proceedings of ASE. 234–243.
[112]
J. Liu, S. Kim, V. Murali, S. Chaudhuri, and S. Chandra. 2019. Neural query expansion for code search. In Proceedings of MAPL. 29–37.
[113]
J. Lu, Y. Wei, X. Sun, B. Li, W. Wen, and C. Zhou. 2018. Interactive query reformulation for source-code search with word relations. IEEE Access 6 (2018).
[114]
Meili Lu, X. Sun, S. Wang, D. Lo, and Yucong Duan. 2015. Query expansion via WordNet for effective code search. In Proceedings of SANER. 545–549.
[115]
A. D. Lucia, R. Oliveto, and P. Sgueglia. 2006. Incremental approach and user feedbacks: A silver bullet for traceability recovery. In Proceedings of ICSM. 299–309.
[116]
F. Lv, H. Zhang, J. Lou, S. Wang, D. Zhang, and J. Zhao. 2015. CodeHow: Effective code search based on API understanding and extended Boolean model. In Proceedings of ASE. 260–270.
[117]
A. Mahmoud and G. Bradshaw. 2015. Estimating semantic relatedness in source code. TOSEM 25, 1 (Dec.2015), Article 10, 35 pages.
[118]
D. Mandelin, L. Xu, R. Bodík, and D. Kimelman. 2005. Jungloid mining: Helping to navigate the API jungle. In Proceedings of PLDI. 48–61.
[119]
C. D. Manning, P. Raghavan, and H. Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press, New York, NY.
[120]
A. Marcus, A. Sergeyev, V. Rajlich, and J. I. Maletic. 2004. An information retrieval approach to concept location in source code. In Proceedings of WCRE. 214–223.
[121]
L. Martie, T. D. LaToza, and A. V. D. Hoek. 2015. CodeExchange: Supporting reformulation of Internet-scale code queries in context. In Proceedings of ASE. 24–35.
[122]
P. McCullagh and J. Nelder. 1989. Generalized Linear Models. Chapman & Hall/CRC, Boca Raton, FL.
[123]
C. McMillan, M. Grechanik, D. Poshyvanyk, Q. Xie, and C. Fu. 2011. Portfolio: Finding relevant functions and their usage. In Proceedings of ICSE. 111–120.
[124]
C. Mcmillan, D. Poshyvanyk, M. Grechanik, Q. Xie, and C. Fu. 2013. Portfolio: Searching for relevant functions and their usages in millions of lines of code. TOSEM 22, 4 (2013), Article 37, 30 pages.
[125]
J. Miao, J. X. Huang, and Z. Ye. 2012. Proximity-based Rocchio’s model for pseudo relevance. In Proceedings of SIGIR. 535–544.
[126]
R. Mihalcea and P. Tarau. 2004. TextRank: Bringing order into texts. In Proceedings of EMNLP. 404–411.
[127]
T. Mikolov, K. Chen, G. Corrado, and J. Dean. 2013. Efficient estimation of word representations in vector space. CoRR abs/1301.3781 (2013).
[128]
T. Mikolov, K. Chen, G. Corrado, and J. Dean. 2013. Efficient estimation of word representations in vector space. arXiv:1301.3781 (2018).
[129]
George A. Miller. 1995. WordNet: A lexical database for English. Commun. ACM 38, 11 (1995), 39–41.
[130]
C. Mills, G. Bavota, S. Haiduc, R. Oliveto, A. Marcus, and A. D. Lucia. 2017. Predicting query quality for applications of text retrieval to software engineering tasks. TOSEM 26, 1 (2017), Article 3, 45 pages.
[131]
C. Mills, J. Pantiuchina, E. Parra, G. Bavota, and S. Haiduc. 2018. Are bug reports enough for text retrieval-based bug localization? In Proceedings of ICSME. 381–392.
[132]
L. Moreno, G. Bavota, S. Haiduc, M. Di Penta, R. Oliveto, B. Russo, and A. Marcus. 2015. Query-based configuration of text retrieval solutions for software engineering tasks. In Proceedings of ESEC/FSE. 567–578.
[133]
L. Moreno, J. J. Treadway, A. Marcus, and W. Shen. 2014. On the use of stack traces to improve text retrieval-based bug localization. In Proceedings of ICSME. 151–160.
[134]
S. M. Nasehi, J. Sillito, F. Maurer, and C. Burns. 2012. What makes a good code example?: A study of programming Q & A in StackOverflow. In Proceedings of ICSM. 25–34.
[135]
L. Nie, H. Jiang, Z. Ren, Z. Sun, and X. Li. 2016. Query expansion based on crowd knowledge for code search. TSC 9, 5 (2016), 771–783.
[136]
D. Pal, M. Mitra, and S. Bhattacharya. 2015. Exploring query categorisation for query expansion: A study. CoRR abs/1509.05567 (2015).
[137]
O. Panchenko, J. Karstens, H. Plattner, and A. Zeier. 2011. Precise and scalable querying of syntactical source code patterns using sample code snippets and a database. In Proceedings of ICPC. 41–50.
[138]
C. Parnin and A. Orso. 2011. Are automated debugging techniques actually helping programmers? In Proceedings of ISSTA. 199–209.
[139]
J. W. Paulson, G. Succi, and A. Eberlein. 2004. An empirical study of open-source and closed-source software products. TSE 30, 4 (2004), 246–256.
[140]
F. Pérez, J. Font, L. Arcega, and C. Cetina. 2018. Automatic query reformulations for feature location in a model-based family of software products. DKE 116 (2018), 159–176.
[141]
F. Pérez, J. Font, L. Arcega, and C. Cetina. 2019. Collaborative feature location in models through automatic query expansion. AUSE 26, 1 (2019), 161–202.
[142]
M. Petticrew and H. Roberts. 2005. Systematic Reviews in the Social Sciences: A Practical Guide. Wiley.
[143]
M. F. Porter. 1997. An algorithm for suffix stripping. Program 40, 3 (1997), 313–316.
[144]
M. F. Porter. 2001. Snowball: A Language for Stemming Algorithms. Retrieved July 10, 2023 from http://snowball.tartarus.org/texts/introduction.html.
[145]
F. Pérez, A. C. Marcén, R. Lapeña, and C. Cetina. 2020. Evaluating low-cost in internal crowdsourcing for software engineering: The case of feature location in an industrial environment. IEEE Access 8 (2020), 65745–65757.
[146]
F. Pérez, T. Ziadi, and C. Cetina. 2022. Utilizing automatic query reformulations as genetic operations to improve feature location in software models. TSE 48, 2 (2022), 713–731.
[147]
D. Qiu, B. Li, S. Ji, and H. Leung. 2014. Regression testing of web service: A systematic mapping study. ACM Comput. Surv. 47, 2 (2014), Article 21, 46 pages.
[148]
M. Raghothaman, Y. Wei, and Y. Hamadi. 2016. SWIM: Synthesizing what I mean: Code search and idiomatic snippet synthesis. In Proceedings of ICSE. 357–367.
[149]
M. M. Rahman, J. Barson, S. Paul, J. Kayani, F. A. Lois, S. F. Quezada, C. Parnin, K T. Stolee, and Baishakhi Ray. 2018. Evaluating how developers use general-purpose web-search for code retrieval. In Proceedings of MSR. 465–475.
[150]
M. M. Rahman, F. Khomh, and M. Castelluccio. 2020. Why are some bugs non-reproducible? An empirical investigation using data fusion. In Proceedings of ICSME. 12.
[151]
M. M. Rahman, F. Khomh, S. Yeasmin, and C. K. Roy. 2021. The forgotten role of search queries in IR-based bug localization: An empirical study. EMSE 26 (2021), 116.
[152]
M. M. Rahman and C. K. Roy. 2014. On the use of context in recommending exception handling code examples. In Proceedings of SCAM. 285–294.
[153]
M. M. Rahman and C. K. Roy. 2015. TextRank based search term identification for software change tasks. In Proceedings of SANER. 540–544.
[154]
M. M. Rahman and C. K. Roy. 2016. QUICKAR: Automatic query reformulation for concept location using crowdsourced knowledge. In Proceedings of ASE. 220–225.
[155]
M. M. Rahman and C. K. Roy. 2017. Improved query reformulation for concept location using CodeRank and document structures. In Proceedings of ASE. 428–439.
[156]
M. M. Rahman and C. K. Roy. 2017. STRICT: Information retrieval based search term identification for concept location. In Proceedings of SANER. 79–90.
[157]
M. M. Rahman and C. K. Roy. 2018. Effective reformulation of query for code search using crowdsourced knowledge and extra-large data analytics. In Proceedings of ICSME. 516–527.
[158]
M. M. Rahman and C. K. Roy. 2018. Improving IR-based bug localization with context-aware query reformulation. In Proceedings of ESEC/FSE. 621–632.
[159]
M. M. Rahman, C. K. Roy, and D. Lo. 2016. RACK: Automatic API recommendation using crowdsourced knowledge. In Proceedings of SANER. 349–359.
[160]
M. M. Rahman, C. K. Roy, and D. Lo. 2018. Automatic query reformulation for code search using crowdsourced knowledge. EMSE 24 (2018), 1869–1924.
[161]
Peter C. Rigby, Daniel M. German, Laura Cowen, and Margaret-Anne Storey. 2014. Peer review on open-source software projects: Parameters, statistical models, and theory. TOSEM 23, 4 (2014), 1–33.
[162]
S. E. Robertson. 1991. On term selection for query expansion. J. Doc. 46, 4 (1991), 359–364.
[163]
J. J. Rocchio. 1971. Relevance feedback in information retrieval. In The SMART Retrieval System—Experiments in Automatic Document Processing, G. Salton (Ed.). Prentice Hall, Upper Saddle River, NJ, 313–323.
[164]
M. Roldan-Vega, G. Mallet, E. Hill, and J. A. Fails. 2013. CONQUER: A tool for NL-based query refinement and contextualizing code search results. In Proceedings of ICSM. 512–515.
[165]
Julia Rubin and Marsha Chechik. 2013. A survey of feature location techniques. In Domain Engineering, Product Lines, Languages, and Conceptual Models, Iris Reinhartz-Berger, Arnon Sturm, Tony Clark, Sholom Cohen, and Jorn Bettin (Eds.). Springer, 29–58.
[166]
C. Sadowski, K. T. Stolee, and S. Elbaum. 2015. How developers search for code: A case study. In Proceedings of ESEC/FSE. 191–201.
[167]
R. K. Saha, J. Lawall, S. Khurshid, and D. E. Perry. 2014. On the effectiveness of information retrieval based bug localization for C programs. In Proceedings of ICSME. 161–170.
[168]
R. K. Saha, M. Lease, S. Khurshid, and D. E. Perry. 2013. Improving bug localization using structured information retrieval. In Proceedings of ASE. 345–355.
[169]
G. Salton and C. Buckley. 1997. Improving retrieval performance by relevance feedback. In Readings in Information Retrieval, Karen Sparck Jones and Peter Willet (Eds.). Morgan Kaufmann, San Francisco, CA, 355–364.
[170]
G. Salton and M. J. McGill. 1986. Introduction to Modern Information Retrieval. McGraw-Hill Computer Science Series. McGraw-Hill, New York, NY.
[171]
G. Salton, A. Wong, and C. S. Yang. 1975. A vector space model for automatic indexing. Commun. ACM 18, 11 (1975), 613–620.
[172]
A. Satter and K. Sakib. 2016. A search log mining based query expansion technique to improve effectiveness in code search. In Proceedings of ICCIT. 586–591.
[173]
T. Savage, M. Revelle, and D. Poshyvanyk. 2010. FLAT3: Feature location and textual tracing tool. In Proceedings of ICSE. 255–258.
[174]
G. Scanniello, A. Marcus, and D. Pascale. 2015. Link analysis algorithms for static concept location: An empirical assessment. EMSE 20, 6 (2015), 1666–1720.
[175]
H. A. Shafiq and Z. Arshad. 2014. Automated Debugging and Bug Fixing Solutions: A Systematic Literature Review and Classification. Master’s thesis. Blekinge Institute of Technology.
[176]
D. Shepherd, Z. P. Fry, E. Hill, L. Pollock, and K. Vijay-Shanker. 2007. Using natural language program analysis to locate and understand action-oriented concerns. In Proceedings of ASOD. 212–224.
[177]
David J. Sheskin. 2007. Handbook of Parametric and Nonparametric Statistical Procedures (4th ed.). Chapman & Hall/CRC, Boca Raton, FL.
[178]
Z. Shi, J. Keung, and Q. Song. 2014. An empirical study of BM25 and BM25F based feature location techniques. In Proceedings of InnoSWDev. 106–114.
[179]
A. Shtok, O. Kurland, D. Carmel, F. Raiber, and G. Markovits. 2012. Predicting query performance by query-drift estimation. TOIS 30, 2 (2012), Article 11, 35 pages.
[180]
R. Sirres, T. F. Bissyandé, D. Kim, D. Lo, J. Klein, K. Kim, and Y. L. Traon. 2018. Augmenting and structuring user queries to support efficient free-form code search. EMSE 23 (2018), 2622–2654.
[181]
Bunyamin Sisman, Shayan A. Akbar, and Avinash C. Kak. 2017. Exploiting spatial code proximity and order for improved source code retrieval for bug localization. JSEP 29, 1 (2017), e1805.
[182]
B. Sisman and A. C. Kak. 2012. Incorporating version histories in information retrieval based bug localization. In Proceedings of MSR. 50–59.
[183]
B. Sisman and A. C. Kak. 2013. Assisting code search with automatic query reformulation for bug localization. In Proceedings of MSR. 309–318.
[184]
G. Sridhara, E. Hill, L. Pollock, and K. Vijay-Shanker. 2008. Identifying word relations in software: A comparative study of semantic similarity tools. In Proceedings of ICPC. 123–132.
[185]
K. Stol, P. Ralph, and B. Fitzgerald. 2016. Grounded theory in software engineering research: A critical review and guidelines. In Proceedings of ICSE. 120–131.
[186]
K. T. Stolee, S. Elbaum, and D. Dobos. 2014. Solving the search for source code. TOSEM 23, 3 (2014), Article 26, 45 pages.
[187]
F. Thung, D. Lo, and J. Lawall. 2013. Automated library recommendation. In Proceedings of WCRE. 182–191.
[188]
F. Thung, S. Wang, D. Lo, and J. Lawall. 2013. Automatic recommendation of API methods from feature requests. In Proceedings of ASE. 290–300.
[189]
Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. JMLR 9, 11 (2008), 1–27.
[190]
C. Vassallo, S. Panichella, M. Di Penta, and G. Canfora. 2014. CODES: Mining source code descriptions from developers discussions. In Proceedings of ICPC. 106–109.
[191]
V. Vinayakarao, A. Sarma, R. Purandare, S. Jain, and S. Jain. 2017. ANNE: Improving source code search using entity retrieval approach. In Proceedings of WSDM. 211–220.
[192]
J. Wang, X. Peng, Z. Xing, and W. Zhao. 2013. Improving feature location practice with multi-faceted interactive exploration. In Proceedings of ICSE. 762–771.
[193]
Q. Wang, C. Parnin, and A. Orso. 2015. Evaluating the usefulness of IR-based fault localization techniques. In Proceedings of ISSTA. 1–11.
[194]
S. Wang and D. Lo. 2014. Version history, similar report, and structure: Putting them together for improved bug localization. In Proceedings of ICPC. 53–63.
[195]
S. Wang and D. Lo. 2016. AmaLgam+: Composing rich information sources for accurate bug localization. JSEP 28, 10 (2016), 921–942.
[196]
S. Wang, D. Lo, and L. Jiang. 2014. Active code search: Incorporating user feedback to improve code search relevance. In Proceedings of ASE. 677–682.
[197]
S. Wang, D. Lo, and L. Jiang. 2016. AutoQuery: Automatic construction of dependency queries for code search. ASE 23, 3 (Sept. 2016), 393–425.
[198]
T. Wei, Y. Lu, H. Chang, Q. Zhou, and X. Bao. 2015. A semantic approach for text clustering using WordNet and lexical chains. Expert Syst. Appl. 42, 4 (2015), 2264–2275.
[199]
M. Wen, R. Wu, and S. C. Cheung. 2016. Locus: Locating bugs from software changes. In Proceedings of ASE. 262–273.
[200]
L. A. Wilson. 2010. Using ontology fragments in concept location. In Proceedings of ICSM. 1–2.
[201]
C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell, and A. Wesslén. 2012. Experimentation in Software Engineering. Springer.
[202]
C. P. Wong, Y. Xiong, H. Zhang, D. Hao, L. Zhang, and H. Mei. 2014. Boosting bug-report-oriented fault localization with segmentation and stack-trace analysis. In Proceedings of ICSME. 181–190.
[203]
W. E. Wong, R. Gao, Y. Li, R. Abreu, and F. Wotawa. 2016. A survey on software fault localization. TSE 42, 8 (2016), 707–740.
[204]
H. Wu and Y. Yang. 2019. Code search based on alteration intent. IEEE Access 7 (2019), 56796–56802.
[205]
M. Würsch, G. Ghezzi, G. Reif, and H. C. Gall. 2010. Supporting developers with natural language queries. In Proceedings of ICSE. 165–174.
[206]
J. Yang and L. Tan. 2012. Inferring semantically related words from software context. In Proceedings of MSR. 161–170.
[207]
J. Yang and L. Tan. 2014. SWordNet: Inferring semantically related words from software context. EMSE 19, 6 (2014), 1856–1886.
[208]
X. Ye, H. Shen, X. Ma, R. Bunescu, and C. Liu. 2016. From word embeddings to document similarities for improved information retrieval in software engineering. In Proceedings of ICSE. 404–415.
[209]
K. C. Youm, J. Ahn, J. Kim, and E. Lee. 2015. Bug localization based on code change histories and bug reports. In Proceedings of APSEC. 190–197.
[210]
H. Yu, W. Song, and T. Mine. 2016. APIBook: An effective approach for finding APIs. In Proceedings of Internetware. 45–53.
[211]
T. Yuan, D. Lo, and J. Lawall. 2014. Automated construction of a software-specific word similarity database. In Proceedings of CSMR-WCRE. 44–53.
[212]
S. Zamani, S. Peck Lee, R. Shokripour, and J. Anvik. 2014. A noun-based approach to feature location using time-aware term-weighting. Inf. Softw. Technol. 56, 8 (2014), 991–1011.
[213]
F. Zhang, H. Niu, I. Keivanloo, and Y. Zou. 2018. Expanding queries for code search using semantically related API class-names. TSE 44, 11 (2018), 1070–1082.
[214]
J. Zhang, X. Wang, D. Hao, B. Xie, L. Zhang, and H. Mei. 2015. A survey on bug-report analysis. SCIS 58, 2 (2015), 1–24.
[215]
W. Zhang, Z. Li, Q. Wang, and J. Li. 2019. FineLocator: A novel approach to method-level fine-grained bug localization by query expansion. IST 110 (2019), 121–135.
[216]
Y. Zhang, D. Lo, X. Xia, G. Scanniello, T. B. Le, and J. Sun. 2018. Fusing multi-abstraction vector space models for concern localization. EMSE 23, 4 (2018), 2279–2322.
[217]
J. Zhou, H. Zhang, and D. Lo. 2012. Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports. In Proceedings of ICSE. 14–24.
[218]
W. Zou, D. Lo, Z. Chen, X. Xia, Y. Feng, and B. Xu. 2020. How practitioners perceive automated bug report management techniques. TSE 46, 8 (2020), 836–862.

Cited By

  • (2023) Feature Location Using Extraction of Code Documentation. Proceedings of the 8th International Conference on Sustainable Information Engineering and Technology, 481–488. DOI: 10.1145/3626641.3627149. Online publication date: 24-Oct-2023
  • (2023) Automated Question Title Reformulation by Mining Modification Logs From Stack Overflow. IEEE Transactions on Software Engineering 49, 9, 4390–4410. DOI: 10.1109/TSE.2023.3292399. Online publication date: 1-Sep-2023

Published In

ACM Transactions on Software Engineering and Methodology, Volume 32, Issue 6
November 2023
949 pages
ISSN:1049-331X
EISSN:1557-7392
DOI:10.1145/3625557
  • Editor: Mauro Pezzè

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 September 2023
Online AM: 04 July 2023
Accepted: 23 May 2023
Revised: 09 October 2022
Received: 14 September 2021
Published in TOSEM Volume 32, Issue 6

Author Tags

  1. Concept location
  2. bug localization
  3. Internet-scale code search
  4. automated query reformulation
  5. term weighting
  6. query quality analysis
  7. machine learning
  8. systematic literature review

Qualifiers

  • Survey

Funding Sources

  • Dalhousie University
  • International Dean’s Scholarship from University of Saskatchewan (2014–2017)
  • Saskatchewan Innovation & Opportunity Scholarship (2017–2018)
  • Natural Sciences and Engineering Research Council of Canada (NSERC)
