TY - GEN
T1 - Voice activity detection using frame-wise model re-estimation method based on Gaussian pruning with weight normalization
AU - Fujimoto, Masakiyo
AU - Watanabe, Shinji
AU - Nakatani, Tomohiro
PY - 2010
Y1 - 2010
N2 - This paper proposes a frame-wise model re-estimation method based on Gaussian pruning with weight normalization for noise-robust voice activity detection (VAD). Our previous work, switching Kalman filter-based VAD, sequentially estimates a non-stationary noise Gaussian mixture model (GMM) and constructs GMMs of the observed noisy speech signals by composing pre-trained silence and clean GMMs with the sequentially estimated noise GMMs. However, the composed models are not optimal, because they do not fully reflect the characteristics of the observed signal. Thus, to ensure the optimality of the composed models, we investigate a method for re-estimating the composed model. Since our VAD method operates by frame-wise sequential processing, there are insufficient re-training data to re-estimate all model parameters. To solve this problem, we propose a model re-estimation method that extracts reliable information using Gaussian pruning with weight normalization. Namely, the proposed method re-estimates the model by pruning the Gaussian distributions that are not dominant in expressing the local characteristics of each frame and by normalizing the Gaussian weights of the remaining distributions.
KW - Gaussian pruning
KW - Gaussian weight normalization
KW - Switching Kalman filter
KW - Voice activity detection
UR - http://www.scopus.com/inward/record.url?scp=79959857741&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79959857741&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:79959857741
T3 - Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
SP - 3102
EP - 3105
BT - Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
PB - International Speech Communication Association
ER -