TY - GEN
T1 - Large vocabulary continuous speech recognition using WFST-based linear classifier for structured data
AU - Watanabe, Shinji
AU - Hori, Takaaki
AU - Nakamura, Atsushi
N1 - Funding Information:
We thank the MIT Spoken Language Systems Group for helping us to perform speech recognition experiments based on MIT-OCW. We also thank Dr. Erik McDermott at Google, Inc. for valuable comments on this paper.
PY - 2010
Y1 - 2010
N2 - This paper describes a discriminative approach that further advances the framework for Weighted Finite State Transducer (WFST) based decoding. The approach introduces additional linear models for adjusting the scores of a decoding graph composed of conventional information source models (e.g., hidden Markov models and N-gram models), and recasts the WFST-based decoding process as a linear classifier for structured data (e.g., sequential multiclass data). The difficulty with the approach is that the number of dimensions of the additional linear models grows very large in proportion to the number of arcs in a WFST, and our previous study applied it only to a small task (TIMIT phoneme recognition). This paper proposes a training method for a large-scale linear classifier employed in WFST-based decoding by using a distributed perceptron algorithm. The experimental results show that the proposed approach was successfully applied to a large vocabulary continuous speech recognition task, and achieved an improvement over minimum phone error based discriminative training of acoustic models.
AB - This paper describes a discriminative approach that further advances the framework for Weighted Finite State Transducer (WFST) based decoding. The approach introduces additional linear models for adjusting the scores of a decoding graph composed of conventional information source models (e.g., hidden Markov models and N-gram models), and recasts the WFST-based decoding process as a linear classifier for structured data (e.g., sequential multiclass data). The difficulty with the approach is that the number of dimensions of the additional linear models grows very large in proportion to the number of arcs in a WFST, and our previous study applied it only to a small task (TIMIT phoneme recognition). This paper proposes a training method for a large-scale linear classifier employed in WFST-based decoding by using a distributed perceptron algorithm. The experimental results show that the proposed approach was successfully applied to a large vocabulary continuous speech recognition task, and achieved an improvement over minimum phone error based discriminative training of acoustic models.
KW - Distributed perceptron
KW - Large vocabulary continuous speech recognition
KW - Linear classifier
KW - Speech recognition
KW - Weighted finite state transducer
UR - http://www.scopus.com/inward/record.url?scp=79959846027&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79959846027&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:79959846027
T3 - Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
SP - 346
EP - 349
BT - Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
PB - International Speech Communication Association
ER -