Speech planning of an anthropomorphic talking robot for consonant sounds production

Kazufumi Nishikawa*, Akihiro Imai, Takayuki Ogawara, Hideaki Takanobu, Takemi Mochida, Atsuo Takanishi

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

3 Citations (Scopus)


This paper describes the speech planning of the anthropomorphic talking robot WT-1R (Waseda Talker-No.1 Refined) for the production of consonant sounds. WT-1R has articulators (the tongue, lips, teeth, nasal cavity and soft palate) and vocal organs (the lungs and vocal cords), and can reproduce human vocal movement. It has 15 degrees of freedom (DOF) in total. The vocal movement of WT-1R for vowels is steady; we produced the Japanese vowels (/a/, /i/, /u/, /e/, /o/) using the first robot, WT-1, in 2000. The vocal movement for consonant sounds, however, is transient, so the 15-DOF talking robot must be controlled in a coordinated manner in space and time to reproduce the complicated phenomena of consonant sounds. Because a Japanese speech sound generally consists of two phonemes, an initial consonant and a final vowel, we propose a speech planning method for WT-1R that treats the voice as three parts: a steady consonant sound, a transient consonant sound and a vowel. With this method, WT-1R could produce the Japanese vowels (/a/, /i/, /u/, /e/, /o/) and some consonant sounds (/s/, /h/, /m/, /p/ and /waseda/).
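
The three-part view of a consonant-vowel sound described in the abstract can be illustrated with a minimal sketch. It is not the authors' planner: the function name `plan_syllable`, the step counts, the linear interpolation in the transient part and the placeholder pose values are all assumptions made for illustration. Only the 15-DOF total and the split into steady consonant, transient consonant and vowel come from the abstract.

```python
from typing import List

N_DOF = 15  # total degrees of freedom stated in the abstract


def plan_syllable(
    consonant_pose: List[float],
    vowel_pose: List[float],
    steady_steps: int,
    transient_steps: int,
    vowel_steps: int,
) -> List[List[float]]:
    """Return a list of 15-DOF target poses, one per control step.

    Hypothetical planner: hold the consonant pose (steady consonant),
    interpolate linearly toward the vowel pose (transient consonant),
    then hold the vowel pose (vowel).
    """
    if len(consonant_pose) != N_DOF or len(vowel_pose) != N_DOF:
        raise ValueError(f"each pose must have {N_DOF} values")

    trajectory: List[List[float]] = []

    # Part 1: steady consonant sound, the articulators hold one posture.
    for _ in range(steady_steps):
        trajectory.append(list(consonant_pose))

    # Part 2: transient consonant sound, all DOF move together in time.
    # Linear interpolation is an assumption; the paper may use another profile.
    for k in range(1, transient_steps + 1):
        t = k / (transient_steps + 1)
        trajectory.append(
            [c + t * (v - c) for c, v in zip(consonant_pose, vowel_pose)]
        )

    # Part 3: vowel, the articulators hold the vowel posture.
    for _ in range(vowel_steps):
        trajectory.append(list(vowel_pose))

    return trajectory


# Usage with placeholder poses (not measured robot data).
consonant = [0.0] * N_DOF
vowel = [1.0] * N_DOF
traj = plan_syllable(consonant, vowel, steady_steps=2, transient_steps=3, vowel_steps=2)
```

Driving every DOF from a single shared time parameter is one simple way to keep the articulators coordinated during the transient part; a real controller would likely need per-articulator timing and velocity limits.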

Original language: English
Pages (from-to): 1830-1835
Number of pages: 6
Journal: Proceedings - IEEE International Conference on Robotics and Automation
Publication status: Published - 2002 Jan 1
Event: 2002 IEEE International Conference on Robotics and Automation - Washington, DC, United States
Duration: 2002 May 11 to 2002 May 15


  • Humanoid robot
  • Sound
  • Speech production
  • Vocal movement
  • Voice

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Artificial Intelligence
  • Electrical and Electronic Engineering

