This paper proposes a new framework for processing rhythm in speech where temporal types are recognized using statistical models of mora durations. Temporal patterns, such as rhythm and tempo in speech, contain some basic information about communication through the spoken language. This information has not yet been fully used in speech recognition. This paper proposes that temporal types themselves be modeled and recognized by statistical models. Using the ASJ Continuous Speech Database, experiments for recognizing temporal types of bunsetsu (short phrases) were conducted. Approximately 72% of temporal types were identified correctly using these models, without using information about the length of pauses and fundamental frequencies. The recognized types were very consistent (approximately 94% were of the same types) for closed and open models. These results show the promising potential of the proposed framework.
Published - 1994
3rd International Conference on Spoken Language Processing, ICSLP 1994 - Yokohama, Japan
Duration: 18 Sep 1994 → 22 Sep 1994