Abstract
In this paper, a new language model, the Multi-Class Composite N-gram, is proposed to avoid the data sparseness problem that arises with a small amount of training data. The Multi-Class Composite N-gram maintains accurate word prediction and remains reliable on sparse data while keeping the model compact, because it is based on multiple word clusters, called Multi-Classes. In the Multi-Class, the statistical connectivity at each position of the N-gram is regarded as a word attribute, and a separate word cluster is created for each position to represent these positional attributes. Furthermore, by introducing higher-order word N-grams through the grouping of frequent word successions, Multi-Class N-grams are extended to Multi-Class Composite N-grams. In experiments, the Multi-Class Composite N-grams achieve 9.5% lower perplexity and a 16% lower word error rate in speech recognition, with a 40% smaller parameter size than conventional word 3-grams.
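The idea of giving each N-gram position its own word cluster can be illustrated with a minimal bigram sketch. The abstract does not state the model's equations, so the factorization used here, P(w | v) ≈ P(C_to(w) | C_from(v)) · P(w | C_to(w)), is an assumption, as are the hand-made cluster maps `to_class` and `from_class`, the function names, and the toy corpus. The sketch uses plain relative frequencies without smoothing and does not cover the composite extension that groups frequent word successions.

```python
from collections import Counter

def train_multiclass_bigram(sentences, to_class, from_class):
    """Collect counts for a class-based bigram with position-specific clusters.

    to_class[w]   : assumed cluster of w when it is the predicted word
    from_class[w] : assumed cluster of w when it is the history word
    """
    trans = Counter()  # (from-class of previous word, to-class of current word)
    hist = Counter()   # from-class of previous word
    emit = Counter()   # (to-class, current word)
    cls = Counter()    # to-class of current word
    for sent in sentences:
        for prev, cur in zip(sent, sent[1:]):
            trans[(from_class[prev], to_class[cur])] += 1
            hist[from_class[prev]] += 1
            emit[(to_class[cur], cur)] += 1
            cls[to_class[cur]] += 1
    return trans, hist, emit, cls

def prob(prev, cur, model, to_class, from_class):
    """Unsmoothed estimate of P(cur | prev) under the assumed factorization."""
    trans, hist, emit, cls = model
    cf, ct = from_class[prev], to_class[cur]
    if hist[cf] == 0 or cls[ct] == 0:
        return 0.0
    return (trans[(cf, ct)] / hist[cf]) * (emit[(ct, cur)] / cls[ct])

# Toy data (hypothetical); the word pair ("the", "dog") never occurs in it.
sentences = [["a", "dog", "runs"], ["the", "cat", "runs"], ["a", "cat", "sleeps"]]
to_class = {"a": "T_det", "the": "T_det", "dog": "T_noun", "cat": "T_noun",
            "runs": "T_verb", "sleeps": "T_verb"}
from_class = {"a": "F_det", "the": "F_det", "dog": "F_noun", "cat": "F_noun",
              "runs": "F_verb", "sleeps": "F_verb"}

model = train_multiclass_bigram(sentences, to_class, from_class)
p_unseen = prob("the", "dog", model, to_class, from_class)
print(p_unseen)
```

Because the estimate goes through cluster statistics, the unseen pair ("the", "dog") still receives a nonzero probability, which is the kind of robustness to sparse data the abstract describes. A word-level bigram estimated from the same toy corpus would assign it zero.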
Original language | English |
---|---|
Title of host publication | EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology |
Publisher | International Speech Communication Association |
Pages | 25-28 |
Number of pages | 4 |
ISBN (Electronic) | 8790834100, 9788790834104 |
Publication status | Published - 2001 |
Externally published | Yes |
Event | 7th European Conference on Speech Communication and Technology - Scandinavia, EUROSPEECH 2001 - Aalborg, Denmark Duration: 3 Sep 2001 → 7 Sep 2001 |
Other
Other | 7th European Conference on Speech Communication and Technology - Scandinavia, EUROSPEECH 2001 |
---|---|
Country/Territory | Denmark |
City | Aalborg |
Period | 3 Sep 2001 → 7 Sep 2001 |
ASJC Scopus subject areas
- Communication
- Linguistics and Language
- Computer Science Applications
- Software