TY - JOUR
T1 - Automatic recognition of Japanese vowel length accounting for speaking rate and motivated by perception analysis
AU - Short, Greg
AU - Hirose, Keikichi
AU - Kondo, Mariko
AU - Minematsu, Nobuaki
N1 - Funding Information:
This research was funded in part by the fellowship Grant-in-Aid for JSPS Fellows No. 26-04006 awarded by the Japan Society for the Promotion of Science (JSPS) to the first author. We would like to express our gratitude to them for the opportunities they have provided.
Publisher Copyright:
© 2015 Elsevier B.V. All rights reserved.
PY - 2015/8/11
Y1 - 2015/8/11
N2 - Automatic recognition of vowel length in Japanese has several applications in speech processing, such as computer-assisted language learning (CALL) systems. Standard automatic speech recognition (ASR) systems use hidden Markov models (HMMs) to carry out the recognition. However, HMMs are not particularly well suited to this problem, since classification of vowel length depends on prosodic information and since vowel length is a relative feature affected by the durations of surrounding sounds, which vary in part with speaking rate. That said, it is not obvious how to design an algorithm that accounts for these contextual dependencies, since not enough is yet known about how humans perceive the contrast. Therefore, in this paper, we conduct perceptual experiments to better understand the mechanism of human vowel length recognition. We found that the perceptual boundary is largely affected by the vowels two prior to, one prior to, and following the vowel whose length is being recognized. Based on these results and the work of others, we propose an algorithm that post-processes alignments output by HMMs to automatically recognize vowel length. The method is primarily composed of two levels of processing, dealing first with local dependencies and then with long-term dependencies. We test several variations of this algorithm. The method we develop is shown to have superior recognition capabilities and to be robust against speaking rate differences, producing significant improvements. We test this method on three different databases: a speaking rate database, a native database, and a non-native database.
KW - Automatic recognition
KW - Duration
KW - Perception
KW - Resynthesis
KW - Stimulus continua
KW - Vowel length
UR - http://www.scopus.com/inward/record.url?scp=84938853804&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84938853804&partnerID=8YFLogxK
U2 - 10.1016/j.specom.2015.07.001
DO - 10.1016/j.specom.2015.07.001
M3 - Article
AN - SCOPUS:84938853804
SN - 0167-6393
VL - 73
SP - 47
EP - 63
JO - Speech Communication
JF - Speech Communication
ER -