SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy

Shuai Guo, Jiatong Shi, Tao Qian, Shinji Watanabe, Qin Jin*

*この研究の対応する著者

研究成果: Conference article査読

3 被引用数 (Scopus)

抄録

Deep learning based singing voice synthesis (SVS) systems have been demonstrated to flexibly generate singing with better qualities, compared to conventional statistical parametric based methods. However, neural systems are generally data-hungry and have difficulty to reach reasonable singing quality with limited public available training data. In this work, we explore different data augmentation methods to boost the training of SVS systems, including several strategies customized to SVS based on pitch augmentation and mix-up augmentation. To further stabilize the training, we introduce the cycle-consistent training strategy. Extensive experiments on two public singing databases demonstrate that our proposed augmentation methods and the stabilizing training strategy can significantly improve the performance on both objective and subjective evaluations.

本文言語English
ページ(範囲)4272-4276
ページ数5
ジャーナルProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
2022-September
DOI
出版ステータスPublished - 2022
外部発表はい
イベント23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea, Republic of
継続期間: 2022 9月 182022 9月 22

ASJC Scopus subject areas

  • 言語および言語学
  • 人間とコンピュータの相互作用
  • 信号処理
  • ソフトウェア
  • モデリングとシミュレーション

フィンガープリント

「SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy」の研究トピックを掘り下げます。これらがまとまってユニークなフィンガープリントを構成します。

引用スタイル