TY - GEN
T1 - Bootstrap-based comparisons of IR metrics for finding one relevant document
AU - Sakai, Tetsuya
PY - 2006/1/1
Y1 - 2006/1/1
N2 - This paper compares the sensitivity of IR metrics designed for the task of finding one relevant document, using a method recently proposed at SIGIR 2006. The metrics are: P+-measure, P-measure, O-measure, Normalised Weighted Reciprocal Rank (NWRR) and Reciprocal Rank (RR). All of them except for RR can handle graded relevance. Unlike the ad hoc (but nevertheless useful) "swap" method proposed by Voorhees and Buckley, the new method derives the sensitivity and the performance difference required to guarantee a given significance level directly from Bootstrap Hypothesis Tests. We use four data sets from NTCIR to show that, according to this method, "P(+)-measure ≥ O-measure ≥ NWRR ≥ RR" generally holds, where "≥" means "is at least as sensitive as". These results generalise and reinforce previously reported ones based on the swap method. Therefore, we recommend the use of P(+)-measure and O-measure for practical tasks such as known-item search where recall is either unimportant or immeasurable.
AB - This paper compares the sensitivity of IR metrics designed for the task of finding one relevant document, using a method recently proposed at SIGIR 2006. The metrics are: P+-measure, P-measure, O-measure, Normalised Weighted Reciprocal Rank (NWRR) and Reciprocal Rank (RR). All of them except for RR can handle graded relevance. Unlike the ad hoc (but nevertheless useful) "swap" method proposed by Voorhees and Buckley, the new method derives the sensitivity and the performance difference required to guarantee a given significance level directly from Bootstrap Hypothesis Tests. We use four data sets from NTCIR to show that, according to this method, "P(+)-measure ≥ O-measure ≥ NWRR ≥ RR" generally holds, where "≥" means "is at least as sensitive as". These results generalise and reinforce previously reported ones based on the swap method. Therefore, we recommend the use of P(+)-measure and O-measure for practical tasks such as known-item search where recall is either unimportant or immeasurable.
UR - http://www.scopus.com/inward/record.url?scp=33751354079&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33751354079&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:33751354079
SN - 3540457801
SN - 9783540457800
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 374
EP - 389
BT - Information Retrieval Technology - Third Asia Information Retrieval Symposium, AIRS 2006, Proceedings
PB - Springer Verlag
T2 - 3rd Asia Information Retrieval Symposium, AIRS 2006
Y2 - 16 October 2006 through 18 October 2006
ER -