To improve the naturalness of synthetic speech in Japanese text-to-speech and concept-to-speech conversion, we introduce a new scheme that synthesizes arbitrary sentences from a database of natural sentence speech. In our method, a series of synthesis parameters is generated from patterns extracted from natural speech waveforms. In the first step, a basic sentence is selected from the database to match the target sentence. The selection factors are the phrase dependency structure (separation degree), the number of morae, the accent type, and the phonemic labels. In the second step, if necessary, a basic accent phrase is selected from the same database for each target accent phrase, using the same four factors: separation degree, number of morae, accent type, and phonemic labels. In the third step, the pitch pattern is generated from the waveform units selected in the first and second steps. In the last step, the phonemic parameters are generated. Because the phonemic parameters for several morae have already been extracted in the preceding three steps, only the parameters for ill-suited morae have to be replaced at this point. Since the pitch pattern is built from patterns extracted directly from real speech, it is expected to be more natural than a pattern estimated by a model. We have so far tested the method on Japanese sentence speech and confirmed that the synthetic speech preserves human-like features fairly well.
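The first two selection steps can be illustrated with a minimal sketch. The abstract names the four selection factors but not how they are combined, so everything beyond those factors is an assumption made for illustration: the weighted mismatch cost, the `WEIGHTS` values, the `phrase_threshold` parameter, and all names (`AccentPhrase`, `phrase_cost`, `sentence_cost`, `select_units`) are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from math import inf


@dataclass(frozen=True)
class AccentPhrase:
    separation: int    # separation degree (phrase dependency structure)
    n_mora: int        # number of morae
    accent_type: int   # accent type
    phonemes: tuple    # phonemic labels, one per mora


# Hypothetical weights: the abstract does not say how the factors are weighted.
WEIGHTS = {"separation": 2.0, "n_mora": 1.0, "accent_type": 1.5, "phonemes": 0.5}


def phrase_cost(target, cand):
    """Weighted mismatch over the four factors (an assumed cost, not the paper's)."""
    label_mismatch = sum(a != b for a, b in zip(target.phonemes, cand.phonemes))
    label_mismatch += abs(len(target.phonemes) - len(cand.phonemes))
    return (
        WEIGHTS["separation"] * abs(target.separation - cand.separation)
        + WEIGHTS["n_mora"] * abs(target.n_mora - cand.n_mora)
        + WEIGHTS["accent_type"] * (target.accent_type != cand.accent_type)
        + WEIGHTS["phonemes"] * label_mismatch
    )


def sentence_cost(target, cand):
    """Sum of phrase costs; sentences with a different phrase count never match."""
    if len(target) != len(cand):
        return inf
    return sum(phrase_cost(t, c) for t, c in zip(target, cand))


def select_units(target, database, phrase_threshold=1.0):
    """Step 1: pick the closest basic sentence.
    Step 2: where a phrase of that sentence fits poorly (or no sentence fits),
    pick the closest accent phrase from the whole database instead."""
    base = min(database, key=lambda s: sentence_cost(target, s))
    base_usable = sentence_cost(target, base) != inf
    all_phrases = [p for s in database for p in s]
    units = []
    for i, t in enumerate(target):
        if base_usable and phrase_cost(t, base[i]) <= phrase_threshold:
            units.append(base[i])
        else:
            units.append(min(all_phrases, key=lambda p: phrase_cost(t, p)))
    return base, units
```

Here a sentence is simply a list of `AccentPhrase` objects and the database is a list of such sentences. The pitch-pattern and phonemic-parameter steps would then operate on the returned units; they are left out because the abstract gives no detail on how those parameters are represented.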
|Publication status||Published - 1994|
|Event||3rd International Conference on Spoken Language Processing, ICSLP 1994 - Yokohama, Japan|
Duration: 18 Sep 1994 → 22 Sep 1994