TY - GEN
T1 - Combined static and dynamic variance adaptation for efficient interconnection of speech enhancement pre-processor with speech recognizer
AU - Delcroix, Marc
AU - Nakatani, Tomohiro
AU - Watanabe, Shinji
PY - 2008
Y1 - 2008
AB - It is well known that automatic speech recognition performs poorly in the presence of noise or reverberation. Much research has been undertaken on model adaptation and speech enhancement to increase the robustness of speech recognizers. Model adaptation is effective at removing static mismatch between speech features and acoustic model parameters, but may not cope well with dynamic mismatch. Speech enhancement approaches can reduce dynamic perturbations, but often do not interconnect well with speech recognizers. There seems to be a lack of an optimal way to combine these two approaches. In this paper we propose introducing the dynamic capabilities of speech enhancement into a static adaptation scheme. We focus on variance adaptation and propose a novel parametric variance model that includes static and dynamic components. The dynamic component is derived from a speech enhancement preprocessing step, and the parameters of the model are optimized using an adaptive training scheme. An evaluation of the method using speech dereverberation as the preprocessing step revealed that an 80% relative error rate reduction was possible compared with the recognition of dereverberated speech, and the final error rate was 5.4%, which is close to that of clean speech (1.2%).
KW - Dereverberation
KW - Model adaptation
KW - Robust ASR
KW - Variance compensation
UR - http://www.scopus.com/inward/record.url?scp=51449102822&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=51449102822&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2008.4518549
DO - 10.1109/ICASSP.2008.4518549
M3 - Conference contribution
AN - SCOPUS:51449102822
SN - 1424414849
SN - 9781424414840
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 4073
EP - 4076
BT - 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP
T2 - 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP
Y2 - 31 March 2008 through 4 April 2008
ER -