Test collections and measures for evaluating customer-helpdesk dialogues

Zhaohao Zeng, Cheng Luo, Lifeng Shang, Hang Li, Tetsuya Sakai

Research output: Contribution to journal › Conference article › Peer-review

3 Citations (Scopus)


We address the problem of evaluating textual task-oriented dialogues between a customer and a helpdesk, such as those that take the form of online chats. As an initial step towards evaluating automatic helpdesk dialogue systems, we have constructed a test collection comprising 3,700 real Customer-Helpdesk multi-turn dialogues by mining Weibo, a major Chinese social media platform. We have annotated each dialogue with multiple subjective quality annotations and with nugget annotations, where a nugget is a minimal sequence of posts by the same utterer that helps towards problem solving. In addition, 10% of the dialogues have been manually translated into English. We have made our test collection, DCH-1, publicly available for research purposes. We also propose a simple nugget-based measure for evaluating task-oriented dialogues, which we call UCH, and explore its usefulness and limitations.
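The abstract does not give the formula for UCH. Purely as an illustration, the sketch below shows one way a nugget-based dialogue measure could be organised: a dialogue is a list of posts, a nugget is a run of posts by one utterer, and each nugget's gain is discounted by how much text has been read by the time the nugget ends. Everything here is an assumption, not the published definition of UCH: the `Post`/`Nugget` classes, the per-nugget `gain`, the requirement that nugget posts be consecutive, the character-based offset, and the linear discount governed by a hypothetical `patience` parameter (in the style of a U-measure-type position discount).

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Post:
    utterer: str  # e.g. "customer" or "helpdesk"
    text: str


@dataclass
class Nugget:
    post_indices: List[int]  # positions (in the dialogue) of the posts forming the nugget
    gain: float = 1.0        # hypothetical per-nugget weight (assumed non-negative)


def check_nugget(dialogue: Sequence[Post], nugget: Nugget) -> None:
    """Raise ValueError unless the nugget is a non-empty run of consecutive
    posts by a single utterer (our assumed reading of the nugget definition)."""
    idx = nugget.post_indices
    if not idx:
        raise ValueError("empty nugget")
    if idx != list(range(idx[0], idx[0] + len(idx))):
        raise ValueError("nugget posts must be consecutive")
    if len({dialogue[i].utterer for i in idx}) != 1:
        raise ValueError("nugget posts must share one utterer")


def nugget_score(dialogue: Sequence[Post], nuggets: Sequence[Nugget], patience: float) -> float:
    """Hypothetical nugget-based score (NOT the paper's UCH formula).

    Each nugget's gain is multiplied by max(0, 1 - offset / patience), where
    offset is the number of characters read up to the end of the nugget's
    last post; the sum is divided by the total gain, giving a value in [0, 1].
    """
    # Cumulative character offset at the end of each post.
    ends, total = [], 0
    for post in dialogue:
        total += len(post.text)
        ends.append(total)

    total_gain = sum(n.gain for n in nuggets)
    if total_gain == 0:
        return 0.0

    score = 0.0
    for nugget in nuggets:
        check_nugget(dialogue, nugget)
        offset = ends[nugget.post_indices[-1]]
        discount = max(0.0, 1.0 - offset / patience)  # linear decay with text read
        score += nugget.gain * discount
    return score / total_gain


if __name__ == "__main__":
    dialogue = [Post("customer", "abcd"), Post("helpdesk", "efghij"), Post("customer", "kl")]
    # Nugget offsets are 4 and 10 characters: (0.8 + 0.5) / 2
    print(nugget_score(dialogue, [Nugget([0]), Nugget([1])], patience=20))  # 0.65
```

Under these assumptions, a dialogue that reaches its problem-solving nuggets in fewer characters scores higher than one that reaches the same nuggets after more text, which is the intuition a position-discounted nugget measure is meant to capture.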

Original language: English
Pages (from-to): 1-9
Number of pages: 9
Journal: CEUR Workshop Proceedings
Publication status: Published - 2017
Event: 8th International Workshop on Evaluating Information Access, EVIA 2017 - Tokyo, Japan
Duration: 2017 Dec 5 → …


Keywords

  • Dialogues
  • Evaluation
  • Helpdesk
  • Measures
  • Nuggets
  • Test collections

ASJC Scopus subject areas

  • Computer Science (all)

