TY - GEN
T1 - On the reliability and intuitiveness of aggregated search metrics
AU - Zhou, Ke
AU - Lalmas, Mounia
AU - Sakai, Tetsuya
AU - Cummins, Ronan
AU - Jose, Joemon M.
PY - 2013
Y1 - 2013
N2 - Aggregating search results from a variety of diverse verticals such as news, images, videos and Wikipedia into a single interface is a popular web search presentation paradigm. Although several aggregated search (AS) metrics have been proposed to evaluate AS result pages, their properties remain poorly understood. In this paper, we compare the properties of existing AS metrics under the assumptions that (1) queries may have multiple preferred verticals; (2) the likelihood of each vertical preference is available; and (3) the topical relevance assessments of results returned from each vertical are available. We compare a wide range of AS metrics on two test collections. Our main criteria of comparison are (1) discriminative power, which represents the reliability of a metric in comparing the performance of systems, and (2) intuitiveness, which represents how well a metric captures the various key aspects to be measured (i.e., various aspects of a user's perception of AS result pages). Our study shows that the AS metrics that capture key AS components (e.g., vertical selection) have several advantages over other metrics. This work sheds new light on the further development and application of AS metrics.
AB - Aggregating search results from a variety of diverse verticals such as news, images, videos and Wikipedia into a single interface is a popular web search presentation paradigm. Although several aggregated search (AS) metrics have been proposed to evaluate AS result pages, their properties remain poorly understood. In this paper, we compare the properties of existing AS metrics under the assumptions that (1) queries may have multiple preferred verticals; (2) the likelihood of each vertical preference is available; and (3) the topical relevance assessments of results returned from each vertical are available. We compare a wide range of AS metrics on two test collections. Our main criteria of comparison are (1) discriminative power, which represents the reliability of a metric in comparing the performance of systems, and (2) intuitiveness, which represents how well a metric captures the various key aspects to be measured (i.e., various aspects of a user's perception of AS result pages). Our study shows that the AS metrics that capture key AS components (e.g., vertical selection) have several advantages over other metrics. This work sheds new light on the further development and application of AS metrics.
KW - Aggregated search
KW - Discriminative power
KW - Diversity
KW - Evaluation
KW - Intuitiveness
KW - Metric
KW - Reliability
UR - http://www.scopus.com/inward/record.url?scp=84889588155&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84889588155&partnerID=8YFLogxK
U2 - 10.1145/2505515.2505691
DO - 10.1145/2505515.2505691
M3 - Conference contribution
AN - SCOPUS:84889588155
SN - 9781450322638
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 689
EP - 698
BT - CIKM 2013 - Proceedings of the 22nd ACM International Conference on Information and Knowledge Management
T2 - 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013
Y2 - 27 October 2013 through 1 November 2013
ER -