10.5555/2486788.2486898
research-article

Automatic query reformulations for text retrieval in software engineering

Published: 18 May 2013

Abstract

More than twenty distinct software engineering tasks have been addressed with text retrieval (TR) techniques, including traceability link recovery, feature location, refactoring, and reuse. A common issue across all TR applications is that the retrieval results depend largely on the quality of the query. When a query performs poorly, it has to be reformulated, which is a difficult task for someone who had trouble writing a good query in the first place.
We propose a recommender (called Refoqus) based on machine learning, which is trained with a sample of queries and relevant results. Then, for a given query, it automatically recommends a reformulation strategy that should improve the query's performance, based on the properties of the query. We evaluated Refoqus empirically against four baseline approaches that are used in natural language document retrieval. The evaluation data corresponds to changes from five open source systems written in Java and C++, and it is used in the context of TR-based concept location in source code. Refoqus outperformed the baselines, and its recommendations led to query performance improvement or preservation in 84% of the cases, on average.
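
The general idea of recommending a reformulation strategy from query properties can be illustrated with a minimal sketch. This is not the Refoqus implementation: the two query properties (term count and mean inverse document frequency), the strategy labels, the toy corpus, and the nearest-neighbor classifier are all assumptions chosen for illustration, and the abstract does not specify any of them.

```python
import math

def query_properties(query, corpus):
    """Return (number of terms, mean IDF of the terms) for a query.

    Both properties are illustrative assumptions, not the paper's feature set.
    """
    terms = query.lower().split()
    doc_sets = [set(doc.lower().split()) for doc in corpus]
    n_docs = len(doc_sets)
    idfs = []
    for term in terms:
        # Document frequency: number of documents containing the term.
        df = sum(1 for doc in doc_sets if term in doc)
        # Smoothed IDF so that unseen terms do not divide by zero.
        idfs.append(math.log((n_docs + 1) / (df + 1)))
    mean_idf = sum(idfs) / len(idfs) if idfs else 0.0
    return (float(len(terms)), mean_idf)

def train(samples, corpus):
    """Store the property vector and best-known strategy for each sample query."""
    return [(query_properties(query, corpus), label) for query, label in samples]

def recommend(model, query, corpus):
    """Recommend the strategy of the nearest training query (1-nearest neighbor)."""
    x = query_properties(query, corpus)
    _, label = min(model, key=lambda item: math.dist(item[0], x))
    return label

# Hypothetical toy corpus and training sample; the labels are invented.
corpus = [
    "open file dialog save document",
    "parse xml configuration file",
    "render window paint event",
]
training = [
    ("file", "expand"),
    ("open file dialog save document parse xml configuration", "reduce"),
    ("paint event", "keep"),
]
model = train(training, corpus)
print(recommend(model, "xml", corpus))
```

In this sketch the properties are left unscaled, so the term count dominates the distance for long queries. A real recommender would likely normalize the properties or use a learner that is insensitive to scale.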

Published In

ICSE '13: Proceedings of the 2013 International Conference on Software Engineering
May 2013
1561 pages
ISBN: 9781467330763

Publisher

IEEE Press

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
