
Cumulated gain-based evaluation of IR techniques

Published: 01 October 2002

Abstract

Modern large retrieval environments tend to overwhelm their users by their large output. Since all documents are not of equal relevance to their users, highly relevant documents should be identified and ranked first for presentation. In order to develop IR techniques in this direction, it is necessary to develop evaluation approaches and methods that credit IR methods for their ability to retrieve highly relevant documents. This can be done by extending traditional evaluation methods, that is, recall and precision based on binary relevance judgments, to graded relevance judgments. Alternatively, novel measures based on graded relevance judgments may be developed. This article proposes several novel measures that compute the cumulative gain the user obtains by examining the retrieval result up to a given ranked position. The first one accumulates the relevance scores of retrieved documents along the ranked result list. The second one is similar but applies a discount factor to the relevance scores in order to devaluate late-retrieved documents. The third one computes the relative-to-the-ideal performance of IR techniques, based on the cumulative gain they are able to yield. These novel measures are defined and discussed and their use is demonstrated in a case study using TREC data: sample system run results for 20 queries in TREC-7. As a relevance base we used novel graded relevance judgments on a four-point scale. The test results indicate that the proposed measures credit IR methods for their ability to retrieve highly relevant documents and allow testing of statistical significance of effectiveness differences. The graphs based on the measures also provide insight into the performance of IR techniques and allow interpretation, for example, from the user point of view.
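The abstract describes the three measures only in words. The sketch below shows one common way to compute them, assuming graded relevance scores on a four-point (0..3) scale as in the article's case study, and the base-2 logarithmic discount that discounted cumulated gain is usually instantiated with; the function names and example data are illustrative, not taken from the article.

```python
from math import log2

def cumulated_gain(gains):
    """CG[i]: sum of the relevance scores of the first i retrieved documents."""
    out, total = [], 0
    for g in gains:
        total += g
        out.append(total)
    return out

def discounted_cumulated_gain(gains):
    """DCG[i]: as CG, but from rank 2 onward each gain is divided by
    log2(rank), progressively devaluing late-retrieved documents."""
    out, total = [], 0.0
    for rank, g in enumerate(gains, start=1):
        total += g if rank == 1 else g / log2(rank)
        out.append(total)
    return out

def normalized_dcg(gains, pool_gains):
    """nDCG[i]: DCG divided, position by position, by the DCG of an ideal
    ranking (the query's judged scores sorted in decreasing order)."""
    dcg = discounted_cumulated_gain(gains)
    ideal = discounted_cumulated_gain(sorted(pool_gains, reverse=True))
    return [d / i for d, i in zip(dcg, ideal)]

# Hypothetical ranked run with graded judgments on a 0..3 scale:
gains = [3, 0, 2, 1]           # scores of the documents at ranks 1..4
pool = [3, 3, 2, 2, 1, 0]      # all judged scores for the query
print(cumulated_gain(gains))   # [3, 3, 5, 6]
print(discounted_cumulated_gain(gains))
print(normalized_dcg(gains, pool))  # starts at 1.0, stays <= 1.0
```

Because the ideal ranking yields the largest possible gain at every cut-off, the normalized curve is bounded by 1.0, which is what makes averaging across queries and significance testing meaningful.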



Published In

ACM Transactions on Information Systems, Volume 20, Issue 4
October 2002, 90 pages
ISSN: 1046-8188
EISSN: 1558-2868
DOI: 10.1145/582415

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

1. graded relevance judgments
2. cumulated gain
