Spectrum conversion using prosodic information

Ryo Mochizuki*, Tadashi Okubo, Tetsunori Kobayashi

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


For speaker conversion based on spectral conversion using GMM, a method is proposed that adds prosody-related information to the feature values to improve conversion precision. In conventional spectral conversion using GMM, only the unaltered spectral parameters are used as input information. However, the voice spectrum is generally related to the closeness of the base frequencies during speech, and therefore an improvement in the quality of the converted voice can be expected when prosodic information is considered at the time of conversion. Thus, a method is proposed for precise spectrum conversion which assumes application to actual synthesis by rule and performs GMM training using the prosodic information of the conversion source and conversion target. The proposed spectrum conversion is also applied to speech conversion in a voice synthesis framework. For this purpose, a method is proposed for preparing triphone joint vectors from a parallel corpus so that the training data cover a greater number of prosodic conditions. A physical evaluation using the cepstrum distance indicates that the use of prosodic information is effective in improving the precision of spectrum conversion. An auditory evaluation of voice quality and speech characteristics after conversion, comparing a conventional method with the proposed method, indicated that the proposed method is also effective perceptually.
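The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the widely used joint-density GMM regression form of spectral conversion, assumes that the prosodic information is the log fundamental frequency of source and target appended to the source cepstrum, and takes already-trained GMM parameters as given. The names `augment` and `convert` and the feature layout are hypothetical.

```python
import numpy as np

def augment(cepstrum, src_log_f0, tgt_log_f0):
    """Append prosodic values to a spectral vector (assumed feature layout)."""
    return np.concatenate([cepstrum, [src_log_f0, tgt_log_f0]])

def log_gaussian(x, mean, cov):
    """Log density of a multivariate normal distribution at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + maha)

def convert(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Posterior-weighted linear regression under a joint-density GMM.

    x       : augmented source vector, shape (Dx,)
    weights : mixture weights, shape (M,)
    mu_x    : source-side means, shape (M, Dx)
    mu_y    : target-side means, shape (M, Dy)
    cov_xx  : source covariances, shape (M, Dx, Dx)
    cov_yx  : target-source cross-covariances, shape (M, Dy, Dx)
    """
    # Posterior probability of each mixture component given x.
    log_post = np.array([
        np.log(w) + log_gaussian(x, mx, cxx)
        for w, mx, cxx in zip(weights, mu_x, cov_xx)
    ])
    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    # Weighted sum of the per-component conditional means of the target.
    y = np.zeros(mu_y.shape[1])
    for p, mx, my, cxx, cyx in zip(post, mu_x, mu_y, cov_xx, cov_yx):
        y += p * (my + cyx @ np.linalg.solve(cxx, x - mx))
    return y
```

Because the prosodic values are part of `x`, they influence both the component posteriors and the per-component regression, which is one way prosody could sharpen the mapping. In a conventional setup, `x` would contain the cepstrum alone.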

Original language: English
Pages (from-to): 12-20
Number of pages: 9
Journal: Systems and Computers in Japan
Issue number: 10
Publication status: Published - 2007 Sept 1


  • Cepstrum
  • GMM
  • Prosodic information
  • Speaker conversion
  • Voice synthesis

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Information Systems
  • Hardware and Architecture
  • Computational Theory and Mathematics

