The design of speech recognition system based on acoustically-derived, segmental units can be divided in three steps: unit design, lexicon building and pronunciation modeling. We formulate an iterative unit design procedure which consistently uses a maximum likelihood (ML) objective in successive application of resegmentation and model re-estimation. The lexicon building allows multi-word entries in the lexicon but restricts the number of these entries in order to avoid a too costly search. Selected multi-word lexical entries are those with high frequency (such as function words) and those which consistently exhibit cross-word phone assimilation. The stochastic pronunciation model represents the likelihood of a particular acoustic segment sequence given the phonetic baseform of a lexical item, where the sequence of baseform phones are treated as a Markov state sequence and each state can emit multiple segments.
|ジャーナル||ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings|
|出版ステータス||Published - 1996|
|イベント||Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP. Part 1 (of 6) - Atlanta, GA, USA|
継続期間: 1996 5月 7 → 1996 5月 10
ASJC Scopus subject areas