TY - GEN
T1 - Decoding network optimization using minimum transition error training
AU - Kubo, Yotaro
AU - Watanabe, Shinji
AU - Nakamura, Atsushi
PY - 2012
Y1 - 2012
N2 - The discriminative optimization of decoding networks is important for minimizing speech recognition error. Recently, several methods have been reported that optimize decoding networks by extending weighted finite state transducer (WFST)-based decoding processes to a linear classification process. In this paper, we model decoding processes by using conditional random fields (CRFs). Since the maximum mutual information (MMI) training technique is straightforwardly applicable for CRF training, several sophisticated training methods proposed as the variants of MMI can be incorporated in our decoding network optimization. This paper adapts the boosted MMI and the differenced MMI methods for decoding network optimization so that state transition errors are minimized in WFST decoding. We evaluated the proposed methods by conducting large-vocabulary continuous speech recognition experiments. We confirmed that the CRF-based framework and transition error minimization are efficient for improving the accuracy of automatic speech recognizers.
AB - The discriminative optimization of decoding networks is important for minimizing speech recognition error. Recently, several methods have been reported that optimize decoding networks by extending weighted finite state transducer (WFST)-based decoding processes to a linear classification process. In this paper, we model decoding processes by using conditional random fields (CRFs). Since the maximum mutual information (MMI) training technique is straightforwardly applicable for CRF training, several sophisticated training methods proposed as the variants of MMI can be incorporated in our decoding network optimization. This paper adapts the boosted MMI and the differenced MMI methods for decoding network optimization so that state transition errors are minimized in WFST decoding. We evaluated the proposed methods by conducting large-vocabulary continuous speech recognition experiments. We confirmed that the CRF-based framework and transition error minimization are efficient for improving the accuracy of automatic speech recognizers.
KW - Automatic speech recognition
KW - conditional random fields
KW - transition errors
KW - weighed finite-state transducers
UR - http://www.scopus.com/inward/record.url?scp=84865213310&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84865213310&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2012.6288844
DO - 10.1109/ICASSP.2012.6288844
M3 - Conference contribution
AN - SCOPUS:84865213310
SN - 9781467300469
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 4197
EP - 4200
BT - 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings
T2 - 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012
Y2 - 25 March 2012 through 30 March 2012
ER -