TY - GEN
T1 - Speaker normalized acoustic modeling based on 3-D Viterbi decoding
AU - Fukada, T.
AU - Sagisaka, Y.
PY - 1998
Y1 - 1998
AB - This paper describes a novel method for speaker normalization based on a frequency warping approach to reduce variations due to speaker-induced factors such as vocal tract length. In our approach, a speaker-normalized acoustic model is trained using time-varying (i.e., state-, phoneme- or word-dependent) warping factors, whereas conventional approaches fix the frequency warping factor for each speaker. These time-varying frequency warping factors are determined by a 3-dimensional (i.e., input frames, HMM states and warping factors) Viterbi decoding procedure. Experimental results on Japanese spontaneous speech recognition show that the proposed method yields a 9.7% improvement in speech recognition accuracy compared to the conventional speaker-independent model.
UR - http://www.scopus.com/inward/record.url?scp=0031624341&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0031624341&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.1998.674461
DO - 10.1109/ICASSP.1998.674461
M3 - Conference contribution
AN - SCOPUS:0031624341
SN - 0780344286
SN - 9780780344280
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 437
EP - 440
BT - Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1998
T2 - 1998 23rd IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1998
Y2 - 12 May 1998 through 15 May 1998
ER -