Statistical modelling of speech segment duration by constrained tree regression

Naoto Iwahashi*, Yoshinori Sagisaka

*Corresponding author for this work

Research output: Article (peer-reviewed)

4 Citations (Scopus)

Abstract

This paper presents a new method for statistical modelling of prosody control in speech synthesis. The proposed method, referred to as Constrained Tree Regression (CTR), can suitably represent the complex effects of prosody control factors using a moderate amount of learning data. It is based on recursive splits of predictor variable spaces and partial imposition of constraints of linear independence among predictor variables. It incorporates both linear and tree regressions with categorical predictor variables, which have conventionally been used for prosody control, and extends them to more general models. In addition, a hierarchical error function is presented to account for the hierarchical structure in prosody control. The new method is applied to the modelling of speech segmental duration. Experimental results show that the proposed regression method yields better duration models than linear and tree regressions using the same number of free parameters. It is also shown that the hierarchical structure of phoneme and syllable durations can be represented efficiently using the hierarchical error function.
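The relationship between linear regression, tree regression, and a combination of the two on categorical predictors can be illustrated with a small sketch. This is not the authors' CTR algorithm: the abstract does not give its details, so the code below assumes one plausible reading, namely a single binary split on a factor level followed by an additive (dummy-coded linear) model fitted independently on each side. The function names, the toy data, and the exhaustive one-level split search are all hypothetical, and the sketch makes no attempt to hold the number of free parameters equal across models as the paper's experiments do.

```python
import numpy as np

def design(X, n_levels):
    """Dummy-code integer categorical columns, with an intercept column."""
    cols = [np.ones((X.shape[0], 1))]
    for j, k in enumerate(n_levels):
        # Drop the first level of each factor so the full-data design is full rank.
        cols.append(np.eye(k)[X[:, j]][:, 1:])
    return np.hstack(cols)

def additive_sse(X, y, n_levels):
    """Sum of squared errors of an additive least-squares fit (plain linear regression)."""
    A = design(X, n_levels)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # copes with rank-deficient subsets
    return float(np.sum((y - A @ coef) ** 2))

def best_split_sse(X, y, n_levels):
    """Try every 'factor j equals level v' split; fit an additive model on each side."""
    best_sse, best_split = np.inf, None
    for j, k in enumerate(n_levels):
        for v in range(k):
            mask = X[:, j] == v
            if mask.all() or not mask.any():
                continue  # skip splits that leave one side empty
            total = (additive_sse(X[mask], y[mask], n_levels)
                     + additive_sse(X[~mask], y[~mask], n_levels))
            if total < best_sse:
                best_sse, best_split = total, (j, v)
    return best_sse, best_split

# Hypothetical toy data: two 3-level factors whose effects are additive,
# except that the second factor's effect changes sign when factor 0 is at level 2.
levels = (3, 3)
X = np.array([(i, j) for i in range(3) for j in range(3)])
a = np.array([0.0, 1.0, 2.0])
b = np.array([0.0, 10.0, 20.0])
y = np.where(X[:, 0] == 2, a[X[:, 0]] - b[X[:, 1]], a[X[:, 0]] + b[X[:, 1]])

sse_linear = additive_sse(X, y, levels)
sse_split, split = best_split_sse(X, y, levels)
print("additive only:", sse_linear)
print("split + additive:", sse_split, "split (factor, level):", split)
```

In this toy case a purely additive model cannot capture the interaction, whereas splitting on the interacting level first lets an additive model fit each side. That is the general motivation for mixing tree splits with linear constraints, shown here only under the assumptions stated above.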

Original language: English
Pages (from-to): 1550-1559
Number of pages: 10
Journal: IEICE Transactions on Information and Systems
Volume: E83-D
Issue number: 7
Publication status: Published - 2000
Externally published: Yes

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
  • Artificial Intelligence
