TY - GEN
T1 - Ensemble learning for speech enhancement
AU - Le Roux, Jonathan
AU - Watanabe, Shinji
AU - Hershey, John R.
PY - 2013
Y1 - 2013
N2 - Over the years, countless algorithms have been proposed to solve the problem of speech enhancement from a noisy mixture. Many have succeeded in improving at least parts of the signal, while often deteriorating others. Based on the assumption that different algorithms are likely to enjoy different qualities and suffer from different flaws, we investigate the possibility of combining the strengths of multiple speech enhancement algorithms, formulating the problem in an ensemble learning framework. As a first example of such a system, we consider the prediction of a time-frequency mask obtained from the clean speech, based on the outputs of various algorithms applied on the noisy mixture. We consider several approaches involving various notions of context and various machine learning algorithms for classification, in the case of binary masks, and regression, in the case of continuous masks. We show that combining several algorithms in this way can lead to an improvement in enhancement performance, while simple averaging or voting techniques fail to do so.
KW - Classification
KW - Ensemble learning
KW - Speech enhancement
KW - Stacking
KW - Time-frequency mask
UR - http://www.scopus.com/inward/record.url?scp=84893573842&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84893573842&partnerID=8YFLogxK
U2 - 10.1109/WASPAA.2013.6701888
DO - 10.1109/WASPAA.2013.6701888
M3 - Conference contribution
AN - SCOPUS:84893573842
SN - 9781479909728
T3 - IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
BT - 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2013
T2 - 2013 14th IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2013
Y2 - 20 October 2013 through 23 October 2013
ER -