TY - JOUR
T1 - FastMVAE2
T2 - On Improving and Accelerating the Fast Variational Autoencoder-Based Source Separation Algorithm for Determined Mixtures
AU - Li, Li
AU - Kameoka, Hirokazu
AU - Makino, Shoji
N1 - Funding Information:
This work was supported by JST CREST under Grant JPMJCR19A3, Japan.
Publisher Copyright:
© 2014 IEEE.
PY - 2023
AB - This article proposes a new source model and training scheme to improve the accuracy and speed of the multichannel variational autoencoder (MVAE) method, a powerful multichannel source separation method proposed recently. The method consists of pretraining a source model represented by a conditional VAE (CVAE) and then estimating the separation matrices along with the other unknown parameters so that the log-likelihood is non-decreasing given an observed mixture signal. Although the MVAE method has been shown to provide high source separation performance, one drawback is the computational cost of the backpropagation steps in the separation-matrix estimation algorithm. To overcome this drawback, a method called 'FastMVAE' was subsequently proposed, which uses an auxiliary classifier VAE (ACVAE) to train the source model. With the classifier and encoder trained in this way, the optimal parameters of the source model can be inferred efficiently, albeit approximately, at each step of the algorithm. However, the generalization capability of the trained ACVAE source model was unsatisfactory, which led to poor performance on unseen data. To improve the generalization capability, this article proposes a new model architecture (called the 'ChimeraACVAE' model) and a training scheme based on knowledge distillation. The experimental results revealed that the proposed source model trained with the proposed loss function achieved better source separation performance with less computation time than FastMVAE. We also confirmed that our methods were able to separate 18 sources with reasonably good accuracy.
KW - Auxiliary classifier VAE
KW - fast algorithm
KW - knowledge distillation
KW - multichannel source separation
KW - multichannel variational autoencoder (MVAE)
UR - http://www.scopus.com/inward/record.url?scp=85140781956&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85140781956&partnerID=8YFLogxK
DO - 10.1109/TASLP.2022.3214763
M3 - Article
AN - SCOPUS:85140781956
SN - 2329-9290
VL - 31
SP - 96
EP - 110
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
ER -