TY - JOUR
T1 - Confusion Detection for Adaptive Conversational Strategies of an Oral Proficiency Assessment Interview Agent
AU - Saeki, Mao
AU - Miyagi, Kotoka
AU - Fujie, Shinya
AU - Suzuki, Shungo
AU - Ogawa, Tetsuji
AU - Kobayashi, Tetsunori
AU - Matsuyama, Yoichi
N1 - Funding Information:
This paper is based on results obtained from a project, JPNP20006 ("Online Language Learning AI Assistant that Grows with People"), subsidized by the New Energy and Industrial Technology Development Organization (NEDO).
Publisher Copyright:
Copyright © 2022 ISCA.
PY - 2022
Y1 - 2022
N2 - In this study, we present a model to detect user confusion in online interview dialogues with a conversational agent. Conversational agents have gained attention as a means of reliably assessing language learners' oral skills in interviews. Learners often experience confusion, a state in which they fail to understand what the system has said and may be unable to respond, leading to a conversational breakdown. It is thus crucial for the system to detect such a state and keep the interview moving forward by repeating or rephrasing the previous system utterance. To this end, we first collected a dataset of user confusion using a psycholinguistic experimental approach and identified seven multimodal signs of confusion, some of which were unique to online conversation. With the corresponding features, we trained a classification model of user confusion. An ablation study showed that features related to self-talk and gaze direction were the most predictive. We discuss how this model can assist a conversational agent in detecting and resolving user confusion in real time.
KW - computational paralinguistics
KW - confusion detection
KW - conversational agents
KW - oral proficiency interview
UR - http://www.scopus.com/inward/record.url?scp=85140080236&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85140080236&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2022-10075
DO - 10.21437/Interspeech.2022-10075
M3 - Conference article
AN - SCOPUS:85140080236
SN - 2308-457X
VL - 2022-September
SP - 3988
EP - 3992
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022
Y2 - 18 September 2022 through 22 September 2022
ER -