TY - GEN
T1 - CONDITIONAL DIFFUSION PROBABILISTIC MODEL FOR SPEECH ENHANCEMENT
AU - Lu, Yen Ju
AU - Wang, Zhong Qiu
AU - Watanabe, Shinji
AU - Richard, Alexander
AU - Yu, Cheng
AU - Tsao, Yu
N1 - Publisher Copyright:
© 2022 IEEE
PY - 2022
Y1 - 2022
N2 - Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs. While generative models have shown strong potential in speech synthesis, they are still lagging behind in speech enhancement. This work leverages recent advances in diffusion probabilistic models, and proposes a novel speech enhancement algorithm that incorporates characteristics of the observed noisy speech signal into the diffusion and reverse processes. More specifically, we propose a generalized formulation of the diffusion probabilistic model named conditional diffusion probabilistic model that, in its reverse process, can adapt to non-Gaussian real noises in the estimated speech signal. In our experiments, we demonstrate strong performance of the proposed approach compared to representative generative models, and investigate the generalization capability of our models to other datasets with noise characteristics unseen during training.
AB - Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs. While generative models have shown strong potential in speech synthesis, they are still lagging behind in speech enhancement. This work leverages recent advances in diffusion probabilistic models, and proposes a novel speech enhancement algorithm that incorporates characteristics of the observed noisy speech signal into the diffusion and reverse processes. More specifically, we propose a generalized formulation of the diffusion probabilistic model named conditional diffusion probabilistic model that, in its reverse process, can adapt to non-Gaussian real noises in the estimated speech signal. In our experiments, we demonstrate strong performance of the proposed approach compared to representative generative models, and investigate the generalization capability of our models to other datasets with noise characteristics unseen during training.
KW - deep learning
KW - diffusion probabilistic model
KW - generative model
KW - speech enhancement
UR - http://www.scopus.com/inward/record.url?scp=85128656996&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85128656996&partnerID=8YFLogxK
U2 - 10.1109/ICASSP43922.2022.9746901
DO - 10.1109/ICASSP43922.2022.9746901
M3 - Conference contribution
AN - SCOPUS:85128656996
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 7402
EP - 7406
BT - 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
Y2 - 23 May 2022 through 27 May 2022
ER -