TY - JOUR
T1 - Towards Automatic Evaluation of Customer-Helpdesk Dialogues
AU - Zeng, Zhaohao
AU - Luo, Cheng
AU - Shang, Lifeng
AU - Li, Hang
AU - Sakai, Tetsuya
N1 - Publisher Copyright:
© 2018 Information Processing Society of Japan.
PY - 2018
Y1 - 2018
N2 - We attempt to tackle the problem of evaluating textual, multi-round, task-oriented dialogues between a customer and a helpdesk, such as those that take the form of online chats. As an initial step towards the automatic evaluation of helpdesk agent systems, we have constructed a test collection, DCH-1, comprising 3,700 real Customer-Helpdesk multi-round dialogues mined from Weibo, a major Chinese microblogging service. Each dialogue has been annotated with multiple subjective quality annotations and with nugget annotations, where a nugget is a minimal sequence of posts by the same utterer that helps the Customer advance towards problem solving. In addition, 34% of the dialogues have been manually translated into English. We first propose a nugget-based dialogue quality evaluation measure called Utility for Customer and Helpdesk (UCH), which scores a dialogue from its manually identified nuggets. We also propose a simple neural network-based approach that predicts dialogue quality scores from the entire dialogue, which we call the Neural Evaluation Machine (NEM). According to our experiments with DCH-1, UCH correlates better with the appropriateness of utterances than with customer satisfaction. In contrast, as NEM leverages the natural language expressions within the dialogue, it correlates relatively well with customer satisfaction.
KW - Dialogue
KW - Evaluation
KW - Helpdesk
KW - Neural network
KW - Nugget
UR - http://www.scopus.com/inward/record.url?scp=85063871962&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85063871962&partnerID=8YFLogxK
U2 - 10.2197/ipsjjip.26.768
DO - 10.2197/ipsjjip.26.768
M3 - Article
AN - SCOPUS:85063871962
SN - 0387-5806
VL - 26
SP - 768
EP - 778
JO - Journal of Information Processing
JF - Journal of Information Processing
ER -