The reliability of metrics based on graded relevance

Tetsuya Sakai*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

This paper compares 14 metrics designed for information retrieval evaluation with graded relevance, together with 10 traditional metrics based on binary relevance, in terms of reliability and resemblance of system rankings. More specifically, we use two test collections with submitted runs from the Chinese IR and English IR tasks in the NTCIR-3 CLIR track to examine the metrics using methods proposed by Buckley/Voorhees and Voorhees/Buckley, as well as Kendall's rank correlation. Our results show that AnDCG_l and nDCG_l ((Average) Normalised Discounted Cumulative Gain at document cut-off l) are good metrics, provided that l is large. However, if one wants to avoid the parameter l altogether, or if one requires a metric that closely resembles TREC Average Precision, then Q-measure appears to be the best choice.
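
For readers unfamiliar with the metrics named above, the following is a minimal sketch of nDCG_l, AnDCG_l, Q-measure and Kendall's rank correlation, assuming commonly cited formulations rather than the exact definitions used in the paper. The gain values (S=3, A=2, B=1), the log-base-2 discount that leaves ranks 1 and 2 undiscounted, the Q-measure parameter beta=1 and all function names are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence

# Assumed gain per relevance level (NTCIR-style S/A/B); hypothetical values.
GAIN = {"S": 3.0, "A": 2.0, "B": 1.0}


def gains(ranked: Sequence[str], qrels: Dict[str, str], l: int) -> List[float]:
    """Gains of the top-l retrieved docs, zero-padded to length l."""
    g = [GAIN.get(qrels.get(doc, ""), 0.0) for doc in list(ranked)[:l]]
    return (g + [0.0] * l)[:l]


def ideal_gains(qrels: Dict[str, str], l: int) -> List[float]:
    """Gains of an ideal ranking (relevant docs by descending gain), zero-padded."""
    g = sorted((GAIN[lv] for lv in qrels.values() if lv in GAIN), reverse=True)
    return (g + [0.0] * l)[:l]


def dcg_prefix(g: Sequence[float], base: float = 2.0) -> List[float]:
    """dcg(r) for r = 1..len(g); ranks r <= base are not discounted (assumed)."""
    out, total = [], 0.0
    for r, gain in enumerate(g, start=1):
        total += gain if r <= base else gain / math.log(r, base)
        out.append(total)
    return out


def ndcg_at(ranked, qrels, l: int) -> float:
    """nDCG_l: dcg(l) divided by the ideal dcg(l)."""
    dcg = dcg_prefix(gains(ranked, qrels, l))
    idcg = dcg_prefix(ideal_gains(qrels, l))
    return dcg[-1] / idcg[-1] if idcg[-1] > 0 else 0.0


def andcg_at(ranked, qrels, l: int) -> float:
    """AnDCG_l: the per-rank ratio dcg(r)/dcg_I(r) averaged over r = 1..l."""
    dcg = dcg_prefix(gains(ranked, qrels, l))
    idcg = dcg_prefix(ideal_gains(qrels, l))
    if idcg[-1] == 0:  # no relevant documents for this topic
        return 0.0
    return sum(d / i for d, i in zip(dcg, idcg)) / l


def q_measure(ranked, qrels, beta: float = 1.0) -> float:
    """Q-measure: mean 'blended ratio' over relevant ranks, divided by R."""
    R = sum(1 for lv in qrels.values() if lv in GAIN)
    if R == 0:
        return 0.0
    n = len(ranked)
    cg = cig = total = 0.0
    count = 0
    for r, (g, ig) in enumerate(zip(gains(ranked, qrels, n), ideal_gains(qrels, n)), start=1):
        cg += g    # cumulative gain of the system ranking
        cig += ig  # cumulative gain of the ideal ranking
        if g > 0:  # document at rank r is relevant
            count += 1
            total += (beta * cg + count) / (beta * cig + r)
    return total / R


def kendall_tau(a: Sequence[float], b: Sequence[float]) -> float:
    """Kendall's tau-a between two score lists for the same systems (ties count as neither)."""
    n = len(a)
    conc = disc = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (a[i] - a[j]) * (b[i] - b[j])
            if s > 0:
                conc += 1
            elif s < 0:
                disc += 1
    return (conc - disc) / (n * (n - 1) / 2)
```

For example, with a single S-relevant document retrieved at rank 2, nDCG_2 is 1.0 under this undiscounted-top-two convention, while AnDCG_2 is 0.5 and Q-measure is 0.8. This illustrates why an averaged or cut-off-free metric can be more discriminative than nDCG_l at small l.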

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 1-16
Number of pages: 16
Publication status: Published, 2005 Dec 1
Externally published: Yes
Event: 2nd Asia Information Retrieval Symposium, AIRS 2005, Jeju Island, Korea, Republic of
Duration: 2005 Oct 13 to 2005 Oct 15

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3689 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd Asia Information Retrieval Symposium, AIRS 2005
Country/Territory: Korea, Republic of
City: Jeju Island
Period: 05/10/13 to 05/10/15

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

