TY - GEN
T1 - On the Instability of Diminishing Return IR Measures
AU - Sakai, Tetsuya
N1 - Publisher Copyright: © 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
AB - The diminishing return property of ERR (Expected Reciprocal Rank) is highly intuitive and attractive: its user model says, for example, that after the users have found a highly relevant document at rank r, few of them will continue to examine rank (r + 1) and beyond. Recently, another IR evaluation measure based on diminishing return called iRBU (intentwise Rank-Biased Utility) was proposed, and it was reported that nDCG (normalised Discounted Cumulative Gain) and iRBU align surprisingly well with users’ SERP (Search Engine Result Page) preferences. The present study conducts offline evaluations of diminishing return measures including ERR and iRBU along with other popular measures such as nDCG, using four test collections and the associated runs from recent TREC tracks and NTCIR tasks. Our results show that the diminishing return measures generally underperform other graded relevance measures in terms of system ranking consistency across two disjoint topic sets as well as discriminative power. The results generalise a previous finding on ERR regarding its limited discriminative power, showing that the diminishing return user model hurts the stability of evaluation measures regardless of the utility function part of the measure. Hence, while we do recommend iRBU along with nDCG for evaluating adhoc IR systems from multiple user-oriented angles, iRBU should be used under the awareness that it can be much less statistically stable than nDCG.
KW - Diminishing return
KW - Discriminative power
KW - Evaluation measures
KW - Statistical significance
KW - System ranking consistency
UR - http://www.scopus.com/inward/record.url?scp=85107336428&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85107336428&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-72113-8_38
DO - 10.1007/978-3-030-72113-8_38
M3 - Conference contribution
AN - SCOPUS:85107336428
SN - 9783030721121
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 572
EP - 586
BT - Advances in Information Retrieval - 43rd European Conference on IR Research, ECIR 2021, Proceedings
A2 - Hiemstra, Djoerd
A2 - Moens, Marie-Francine
A2 - Mothe, Josiane
A2 - Perego, Raffaele
A2 - Potthast, Martin
A2 - Sebastiani, Fabrizio
PB - Springer Science and Business Media Deutschland GmbH
T2 - 43rd European Conference on Information Retrieval Research, ECIR 2021
Y2 - 28 March 2021 through 1 April 2021
ER -