TY - GEN
T1 - Predicting listener back-channels for human-agent interaction using neuro-dynamical model
AU - Sano, Shotaro
AU - Nishide, Shun
AU - Okuno, Hiroshi G.
AU - Ogata, Tetsuya
N1 - Copyright: Copyright 2012 Elsevier B.V., All rights reserved.
PY - 2011
Y1 - 2011
AB - The goal of our work is to create natural verbal interaction between humans and speech dialogue agents. In this paper, we focus on enabling speech dialogue agents to generate back-channels the same way humans do. To do so, the system must predict the appropriate timing of back-channels from the human's speech. As the prediction model, we use a neuro-dynamical system called a multiple timescale recurrent neural network (MTRNN). The model is trained on an actual poster-session corpus from the IMADE project, using the presenter's prosodic and visual information as features. Using the trained model, we conducted back-channel timing prediction experiments. The results showed that our system could predict back-channel timing about 0.5 seconds before the back-channel response is generated. Compared with the actual back-channel timing in the corpus, the system achieved a recall of 37.1%, a precision of 31.7%, and an F-measure of 34.2%. These results indicate that the model can effectively predict and generate back-channel responses.
UR - http://www.scopus.com/inward/record.url?scp=84857575200&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84857575200&partnerID=8YFLogxK
DO - 10.1109/SII.2011.6147412
M3 - Conference contribution
AN - SCOPUS:84857575200
SN - 9781457715235
T3 - 2011 IEEE/SICE International Symposium on System Integration, SII 2011
SP - 18
EP - 23
BT - 2011 IEEE/SICE International Symposium on System Integration, SII 2011
T2 - 2011 IEEE/SICE International Symposium on System Integration, SII 2011
Y2 - 20 December 2011 through 22 December 2011
ER -