TY - GEN
T1 - Time-frequency-bin-wise Switching of Minimum Variance Distortionless Response Beamformer for Underdetermined Situations
AU - Yamaoka, Kouei
AU - Ono, Nobutaka
AU - Makino, Shoji
AU - Yamada, Takeshi
N1 - Funding Information:
This work was supported by JSPS under Grant 16H01735, and SECOM Science and Technology Foundation.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/5
Y1 - 2019/5
N2 - In this paper, we present a speech enhancement method using two microphones in underdetermined situations. Time-frequency (TF) binary masking is a conventional method of enhancing speech in underdetermined situations by multiplying each TF component by zero or one. Extending this method, we previously proposed the time-frequency-bin-wise switching (TFS) beamformer, which switches among multiple preconstructed beamformers in each TF bin, each of which suppresses a particular interferer. However, this method requires the beamformer filter coefficients to be estimated in advance, using the target-active period and the interferer-wise-active periods as prior information. In this paper, to overcome this limitation, we formulate the switching and construction of spatial filters as a joint optimization problem, which can be understood from two viewpoints: clustering the most dominant interferer signal in each TF bin and constructing a minimum variance distortionless response beamformer from such bins. In an experiment, we confirmed that the proposed method outperformed conventional TF masking and fixed beamforming in speech enhancement regardless of the direction of the interferers.
KW - beamforming
KW - nonlinear signal processing
KW - speech enhancement
KW - time-frequency masking
KW - underdetermined situation
UR - http://www.scopus.com/inward/record.url?scp=85068999128&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85068999128&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2019.8683528
DO - 10.1109/ICASSP.2019.8683528
M3 - Conference contribution
AN - SCOPUS:85068999128
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 7908
EP - 7912
BT - 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
Y2 - 12 May 2019 through 17 May 2019
ER -