TY - GEN
T1 - Continuous vocal imitation with self-organized vowel spaces in recurrent neural network
AU - Kanda, Hisashi
AU - Ogata, Tetsuya
AU - Takahashi, Toru
AU - Komatani, Kazunori
AU - Okuno, Hiroshi G.
PY - 2009
Y1 - 2009
N2 - A continuous vocal imitation system was developed using a computational model that explains the process of phoneme acquisition by infants. Human infants perceive speech sounds not as discrete phoneme sequences but as continuous acoustic signals. One of the critical problems in phoneme acquisition is designing a method for segmenting these continuous speech sounds. The key idea to solve this problem is that articulatory mechanisms such as the vocal tract help human beings to perceive speech sound units corresponding to phonemes. To segment acoustic signals using articulatory movement, we apply a segmenting method based on the Recurrent Neural Network with Parametric Bias (RNNPB) to our system. This method determines multiple segmentation boundaries in a temporal sequence using the prediction error of the RNNPB model, and the PB values obtained by the method can be encoded as a kind of phoneme. Our system was implemented using a physical vocal tract model, called the Maeda model. Experimental results demonstrated that our system can self-organize the same phonemes in different continuous sounds, and can imitate vocal sounds involving arbitrary numbers of vowels using the vowel space in the RNNPB. This suggests that our model reflects the process of phoneme acquisition.
AB - A continuous vocal imitation system was developed using a computational model that explains the process of phoneme acquisition by infants. Human infants perceive speech sounds not as discrete phoneme sequences but as continuous acoustic signals. One of the critical problems in phoneme acquisition is designing a method for segmenting these continuous speech sounds. The key idea to solve this problem is that articulatory mechanisms such as the vocal tract help human beings to perceive speech sound units corresponding to phonemes. To segment acoustic signals using articulatory movement, we apply a segmenting method based on the Recurrent Neural Network with Parametric Bias (RNNPB) to our system. This method determines multiple segmentation boundaries in a temporal sequence using the prediction error of the RNNPB model, and the PB values obtained by the method can be encoded as a kind of phoneme. Our system was implemented using a physical vocal tract model, called the Maeda model. Experimental results demonstrated that our system can self-organize the same phonemes in different continuous sounds, and can imitate vocal sounds involving arbitrary numbers of vowels using the vowel space in the RNNPB. This suggests that our model reflects the process of phoneme acquisition.
UR - http://www.scopus.com/inward/record.url?scp=70350366377&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70350366377&partnerID=8YFLogxK
U2 - 10.1109/ROBOT.2009.5152818
DO - 10.1109/ROBOT.2009.5152818
M3 - Conference contribution
AN - SCOPUS:70350366377
SN - 9781424427895
T3 - Proceedings - IEEE International Conference on Robotics and Automation
SP - 4438
EP - 4443
BT - 2009 IEEE International Conference on Robotics and Automation, ICRA '09
T2 - 2009 IEEE International Conference on Robotics and Automation, ICRA '09
Y2 - 12 May 2009 through 17 May 2009
ER -