Response Timing Estimation for Spoken Dialog System using Dialog Act Estimation

Jin Sakuma*, Shinya Fujie, Tetsunori Kobayashi

*Corresponding author for this work

Research output: Conference article, peer-reviewed

3 Citations (Scopus)

Abstract

We propose neural networks for predicting the response timing of spoken dialog systems. Response timing varies depending on the dialog context. This context-dependent response timing is conventionally estimated directly from acoustic event sequences and word sequences extracted from past utterances. Because these sequences vary so widely, large amounts of training data are required to build reliable models. However, no large dialog database with annotated response timings exists. The proposed method estimates the dialog act of each utterance as an auxiliary task and uses the intermediate states of that estimator, in addition to acoustic and linguistic features, for response timing estimation. Since dialog acts have significantly less variation than word sequences and are closely related to response timing, we expect to be able to construct a highly reliable model even with little training data. We evaluate our approach on the HARPERVALLEYBANK corpus. The experimental results show that the proposed approach is more effective than the conventional approach, which does not use dialog act information for each utterance.
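The auxiliary-task idea in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture: the class name `TimingWithDialogAct`, the layer sizes, the single tanh encoder layer, the scalar regression target for timing, and the `aux_weight` loss weighting are all assumptions made for illustration. The sketch shows only the structural point: a shared hidden state feeds a dialog-act classifier, and that same hidden state is concatenated with the acoustic and linguistic features to estimate timing.

```python
import math
import random


def linear(x, w, b):
    """Dense layer on plain Python lists: returns w @ x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def init(n_out, n_in, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class TimingWithDialogAct:
    """Hypothetical two-head model: dialog act (auxiliary) and timing (main)."""

    def __init__(self, n_acoustic, n_linguistic, n_hidden, n_acts, seed=0):
        rng = random.Random(seed)
        n_in = n_acoustic + n_linguistic
        self.enc = init(n_hidden, n_in, rng)            # dialog-act encoder
        self.act_head = init(n_acts, n_hidden, rng)     # auxiliary head: act logits
        self.time_head = init(1, n_in + n_hidden, rng)  # main head: timing value

    def forward(self, acoustic, linguistic):
        x = acoustic + linguistic  # list concatenation of the two feature vectors
        # Intermediate state of the dialog-act estimator.
        hidden = [math.tanh(v) for v in linear(x, *self.enc)]
        act_logits = linear(hidden, *self.act_head)
        # Timing uses the raw features plus the dialog-act hidden state.
        timing = linear(x + hidden, *self.time_head)[0]
        return timing, act_logits


def multitask_loss(timing, act_logits, target_timing, target_act, aux_weight=0.5):
    """Squared timing error plus weighted cross-entropy on the dialog act."""
    timing_loss = (timing - target_timing) ** 2
    m = max(act_logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(v - m) for v in act_logits))
    act_loss = log_z - act_logits[target_act]
    return timing_loss + aux_weight * act_loss
```

Calling `TimingWithDialogAct(3, 2, 4, 5).forward([0.1, 0.2, 0.3], [0.4, 0.5])` returns one timing value and five dialog-act logits. Setting `aux_weight=0` reduces the loss to the timing term alone, which corresponds to training without the auxiliary task.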

Original language: English
Pages (from-to): 4486-4490
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2022-September
Publication status: Published - 2022
Event: 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea, Republic of
Duration: 18 Sept 2022 to 22 Sept 2022

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation
