Using graded-relevance metrics for evaluating community QA answer selection

Tetsuya Sakai*, Yohei Seki, Daisuke Ishikawa, Kazuko Kuriyama, Noriko Kando, Chin Yew Lin

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

32 Citations (Scopus)

Abstract

Community Question Answering (CQA) sites such as Yahoo! Answers have emerged as rich knowledge resources for information seekers. However, answers posted to CQA sites can be irrelevant, incomplete, redundant, incorrect, biased, ill-formed or even abusive. Hence, automatic selection of "good" answers for a given posted question is a practical research problem that will help us manage the quality of accumulated knowledge. One way to evaluate answer selection systems for CQA would be to use the Best Answers (BAs) that are readily available from the CQA sites. However, BAs may be biased, and even if they are not, there may be other good answers besides BAs. To remedy these two problems, we propose system evaluation methods that involve multiple answer assessors and graded-relevance information retrieval metrics. Our main findings from experiments using the NTCIR-8 CQA task data are that, using our evaluation methods, (a) we can detect many substantial differences between systems that would have been overlooked by BA-based evaluation; and (b) we can better identify hard questions (i.e., those that are handled poorly by many systems and therefore require focussed investigation) compared to BA-based evaluation. We therefore argue that our approach is useful for building effective CQA answer selection systems despite the cost of manual answer assessments.
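The contrast the abstract draws between BA-based and graded-relevance evaluation can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the specific metrics, so nDCG is used here as one well-known graded-relevance metric, and the answer IDs, gain values, system rankings, and function names are all hypothetical.

```python
import math

def ba_hit_at_1(ranking, best_answer):
    """BA-based score: 1 if the top-ranked answer is the asker's Best Answer, else 0."""
    return 1.0 if ranking and ranking[0] == best_answer else 0.0

def ndcg(ranking, gains):
    """nDCG over the full ranking, with a log2(rank + 1) discount.

    `gains` maps answer ID -> graded gain (assumed here to be derived
    from several assessors' labels); unjudged answers count as 0.
    """
    dcg = sum(gains.get(a, 0) / math.log2(i + 2) for i, a in enumerate(ranking))
    ideal = sorted(gains.values(), reverse=True)
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical question with four posted answers; "a1" is the Best Answer.
gains = {"a1": 3, "a2": 2, "a3": 0, "a4": 1}
best_answer = "a1"

# Two hypothetical systems, neither of which ranks the Best Answer first.
system_x = ["a2", "a1", "a4", "a3"]  # good answers near the top
system_y = ["a3", "a4", "a2", "a1"]  # good answers near the bottom

for name, run in [("X", system_x), ("Y", system_y)]:
    print(name, ba_hit_at_1(run, best_answer), round(ndcg(run, gains), 3))
```

In this toy case both systems score 0 under the BA-based measure, while the graded metric ranks system X above system Y. That is the kind of between-system difference the abstract says BA-based evaluation can overlook.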

Original language: English
Title of host publication: Proceedings of the 4th ACM International Conference on Web Search and Data Mining, WSDM 2011
Pages: 187-196
Number of pages: 10
Publication status: Published - 2011
Externally published: Yes
Event: 4th ACM International Conference on Web Search and Data Mining, WSDM 2011 - Hong Kong, China
Duration: 2011 Feb 9 → 2011 Feb 12

Publication series

Name: Proceedings of the 4th ACM International Conference on Web Search and Data Mining, WSDM 2011

Conference

Conference: 4th ACM International Conference on Web Search and Data Mining, WSDM 2011
Country/Territory: China
City: Hong Kong
Period: 11/2/9 → 11/2/12

Keywords

  • Best answers
  • Community question answering
  • Evaluation
  • Graded relevance
  • NTCIR
  • Test collections

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Software
