TY - GEN
T1 - Search error risk minimization in Viterbi beam search for speech recognition
AU - Hori, Takaaki
AU - Watanabe, Shinji
AU - Nakamura, Atsushi
PY - 2010
Y1 - 2010
N2 - This paper proposes a method to optimize Viterbi beam search based on search error risk minimization in large vocabulary continuous speech recognition (LVCSR). Most speech recognizers employ beam search to speed up the decoding process, in which unpromising partial hypotheses are pruned during decoding. However, the pruning step carries the risk of missing the best complete hypothesis by discarding a partial hypothesis that might grow into it; missing the best hypothesis is called a search error. Our purpose is to reduce search errors by optimizing the pruning step. While conventional methods prune each hypothesis using heuristic criteria based on its score, rank, and so on, our proposed method introduces a pruning function that makes a more precise decision using rich features extracted from each hypothesis. The parameters of the function can be estimated efficiently to minimize the search error risk using recognition lattices at the training step. We implemented the new method in a WFST-based decoder and achieved a significant reduction of search errors in a 200K-word LVCSR task.
KW - Pruning
KW - Search error
KW - Viterbi beam search
KW - WFST
UR - http://www.scopus.com/inward/record.url?scp=78049370808&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78049370808&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2010.5495099
DO - 10.1109/ICASSP.2010.5495099
M3 - Conference contribution
AN - SCOPUS:78049370808
SN - 9781424442966
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 4934
EP - 4937
BT - 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010
Y2 - 14 March 2010 through 19 March 2010
ER -