Vowel imitation using vocal tract model and recurrent neural network

Hisashi Kanda*, Tetsuya Ogata, Kazunori Komatani, Hiroshi G. Okuno

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)


A vocal imitation system was developed using a computational model that supports the motor theory of speech perception. A critical problem in vocal imitation is how to generate speech sounds produced by adults, whose vocal tracts have physical properties (i.e., articulatory motions) differing from those of infants' vocal tracts. To solve this problem, a model based on the motor theory of speech perception was constructed. Applying this model enables the vocal imitation system to estimate articulatory motions for unexperienced speech sounds, that is, sounds that the system has not actually generated itself. The system was implemented using a Recurrent Neural Network with Parametric Bias (RNNPB) and a physical vocal tract model called the Maeda model. Experimental results demonstrated that the system was sufficiently robust with respect to individual differences in speech sounds and could imitate unexperienced vowel sounds.

Original language: English
Title of host publication: Neural Information Processing - 14th International Conference, ICONIP 2007, Revised Selected Papers
Number of pages: 11
Edition: PART 2
Publication status: Published - 2008 Oct 23
Externally published: Yes
Event: 14th International Conference on Neural Information Processing, ICONIP 2007 - Kitakyushu, Japan
Duration: 2007 Nov 13 to 2007 Nov 16

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 4985 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 14th International Conference on Neural Information Processing, ICONIP 2007

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

