TY - JOUR
T1 - Test collections and measures for evaluating customer-helpdesk dialogues
AU - Zeng, Zhaohao
AU - Luo, Cheng
AU - Shang, Lifeng
AU - Li, Hang
AU - Sakai, Tetsuya
N1 - Publisher Copyright:
© 2017 Copyright held by the author.
PY - 2017
Y1 - 2017
N2 - We address the problem of evaluating textual task-oriented dialogues between the customer and the helpdesk, such as those that take the form of online chats. As an initial step towards evaluating automatic helpdesk dialogue systems, we have constructed a test collection comprising 3,700 real Customer-Helpdesk multi-turn dialogues by mining Weibo, a major Chinese social media platform. We have annotated each dialogue with multiple subjective quality annotations and nugget annotations, where a nugget is a minimal sequence of posts by the same utterer that helps towards problem solving. In addition, 10% of the dialogues have been manually translated into English. We have made our test collection, DCH-1, publicly available for research purposes. We also propose a simple nugget-based evaluation measure for task-oriented dialogue evaluation, which we call UCH, and explore its usefulness and limitations.
AB - We address the problem of evaluating textual task-oriented dialogues between the customer and the helpdesk, such as those that take the form of online chats. As an initial step towards evaluating automatic helpdesk dialogue systems, we have constructed a test collection comprising 3,700 real Customer-Helpdesk multi-turn dialogues by mining Weibo, a major Chinese social media platform. We have annotated each dialogue with multiple subjective quality annotations and nugget annotations, where a nugget is a minimal sequence of posts by the same utterer that helps towards problem solving. In addition, 10% of the dialogues have been manually translated into English. We have made our test collection, DCH-1, publicly available for research purposes. We also propose a simple nugget-based evaluation measure for task-oriented dialogue evaluation, which we call UCH, and explore its usefulness and limitations.
KW - Dialogues
KW - Evaluation
KW - Helpdesk
KW - Measures
KW - Nuggets
KW - Test collections
UR - http://www.scopus.com/inward/record.url?scp=85038867537&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85038867537&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85038867537
SN - 1613-0073
VL - 2008
SP - 1
EP - 9
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 8th International Workshop on Evaluating Information Access, EVIA 2017
Y2 - 5 December 2017
ER -