TY - GEN
T1 - Speech Enhancement by Noise Self-Supervised Rank-Constrained Spatial Covariance Matrix Estimation via Independent Deeply Learned Matrix Analysis
AU - Misawa, Sota
AU - Takamune, Norihiro
AU - Nakamura, Tomohiko
AU - Kitamura, Daichi
AU - Saruwatari, Hiroshi
AU - Une, Masakazu
AU - Makino, Shoji
N1 - Funding Information:
This work was supported by the Japan–New Zealand Research Cooperative Program of JSPS and RSNZ (Grant Number JPJSBP120201002), JSPS KAKENHI Grant Numbers 19K20306, 19H01116, and 19H04131, and JST Moonshot R&D Grant Number JPMJPS2011.
Publisher Copyright:
© 2021 APSIPA.
PY - 2021
Y1 - 2021
AB - Rank-constrained spatial covariance matrix estimation (RCSCME) is a method for situations in which directional target speech and diffuse noise are mixed. In conventional RCSCME, independent low-rank matrix analysis (ILRMA) is used as the preprocessing method. We propose RCSCME using independent deeply learned matrix analysis (IDLMA), which is a supervised extension of ILRMA. In this method, IDLMA requires deep neural networks (DNNs) to separate the target speech and the noise. We use Denoiser, which is a single-channel speech enhancement DNN, in IDLMA to estimate not only the target speech but also the noise. We also propose noise self-supervised RCSCME, in which we estimate the noise-only time intervals using the output of Denoiser and design the prior distribution of the noise spatial covariance matrix for RCSCME. We confirm that the proposed methods outperform the conventional methods under several noise conditions.
UR - http://www.scopus.com/inward/record.url?scp=85126698567&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85126698567&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85126698567
T3 - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
SP - 578
EP - 584
BT - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021
Y2 - 14 December 2021 through 17 December 2021
ER -