Abstract
Automatic Singing Label Calibration (ASLC) aims to improve the accuracy of coarse singing labels by analyzing the raw audio. However, the ASLC model is limited by the difficulty and cost of generating or augmenting real-world songs. To address this problem, we propose a novel approach that strengthens limited singing audio with easily available musical instrument audio. Directly using musical instrument audio as data augmentation for singing audio is unreliable because of the distinct differences between vocal and instrumental sounds. We therefore employ transfer learning, which allows relevant knowledge to be transferred from one domain to another. In the pre-training stage, the ASLC model learns to predict accurate labels from musical instrument audio. We then treat the voice as a special musical instrument and fine-tune the pre-trained ASLC model on a singing annotation dataset. Experimental results demonstrate that our transfer learning-based approach outperforms the original ASLC model: by leveraging readily available musical instrument audio, it achieves higher labeling accuracy on singing audio.
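The pre-train-then-fine-tune workflow described above can be sketched in miniature. The toy one-weight regressor, the synthetic "instrument" and "vocal" data, and all hyperparameters below are illustrative assumptions, not details from the paper; the sketch only shows the two-stage training pattern, where a model trained on the abundant instrument domain is used to initialize training on the scarce singing domain.

```python
# Minimal sketch of pre-training on instrument data, then fine-tuning on
# scarce vocal data. Everything here (toy model, synthetic data, learning
# rates) is an illustrative assumption, not the paper's actual setup.
import random

random.seed(0)

def make_data(slope, n, noise=0.05):
    """Synthetic (feature, label) pairs standing in for audio features."""
    data = []
    for _ in range(n):
        x = random.random()
        data.append((x, slope * x + random.gauss(0, noise)))
    return data

def train(w, data, lr=0.1, epochs=50):
    """Per-sample gradient descent on squared error for y ~ w * x."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x
    return w

def mse(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

# Related but not identical domains: abundant "instrument" data and a
# small "vocal" annotation set (the scarce singing labels).
instrument = make_data(slope=1.0, n=500)
vocal = make_data(slope=1.2, n=30)

w_pretrained = train(0.0, instrument)                 # pre-training stage
w_finetuned = train(w_pretrained, vocal,              # fine-tuning stage
                    lr=0.02, epochs=10)

print(f"fine-tuned vocal MSE: {mse(w_finetuned, vocal):.4f}")
```

The pre-trained weight already sits close to the vocal domain's optimum, so the fine-tuning stage needs only a small learning rate and few passes over the limited singing data, which is the core motivation for transferring from the instrument domain.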
Original language | English |
---|---|
Pages (from-to) | 707-715 |
Number of pages | 9 |
Journal | IEEJ Transactions on Electrical and Electronic Engineering |
Volume | 19 |
Issue number | 5 |
DOI | |
Publication status | Published - May 2024 |
ASJC Scopus subject areas
- Electrical and Electronic Engineering