Statistical language modeling with a class-based n-multigram model

Sabine Deligne, Yoshinori Sagisaka

Research output: Article, peer-reviewed

4 Citations (Scopus)

Abstract

In this paper, we present a stochastic language-modeling tool which aims at retrieving variable-length phrases (multigrams), assuming n-gram dependencies between them, hence the name of the model: n-multigram. The estimation of the probability distribution of the phrases is interleaved with a phrase-clustering procedure in a way which jointly optimizes the likelihood of the data. As a result, the language data are iteratively structured at both a paradigmatic and a syntagmatic level in a fully integrated way. We evaluate the 2-multigram model as a statistical language model on ATIS, a task-oriented database consisting of air travel reservations. Experiments show that the 2-multigram model allows a reduction of 10% of the word error rate on ATIS with respect to the usual trigram model, with 25% fewer parameters than in the trigram model. In addition, we illustrate the ability of this model to merge semantically related phrases of different lengths into a common class.
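To make the idea of "n-gram dependencies between variable-length phrases" concrete, the following is a minimal sketch of how a sentence could be scored under a 2-multigram (bigram-of-phrases) model, assuming the sentence likelihood is the sum, over all segmentations into phrases, of the product of phrase-bigram probabilities. The function name `sentence_likelihood`, the `START` marker, the `max_len` limit, and every probability in `BIGRAM` are hypothetical and invented for illustration. The sketch omits the phrase classes, the estimation procedure, and any end-of-sentence term, so it should not be read as the authors' implementation.

```python
from functools import lru_cache

# Hypothetical sentence-start marker, represented as a one-word "phrase".
START = ("<s>",)

# Toy table of P(phrase | previous phrase). Phrases are tuples of words.
# All values are invented for illustration; they are not from the paper.
BIGRAM = {
    (START, ("show",)): 0.5,
    (START, ("show", "me")): 0.5,
    (("show",), ("me",)): 0.5,
    (("show",), ("me", "flights")): 0.25,
    (("me",), ("flights",)): 0.5,
    (("show", "me"), ("flights",)): 0.5,
}


def sentence_likelihood(words, bigram, max_len=2):
    """Sum, over all segmentations of `words` into phrases of at most
    `max_len` words, of the product of phrase-bigram probabilities."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def suffix_prob(start, prev):
        # Probability mass of words[start:] given the previous phrase `prev`.
        if start == len(words):
            return 1.0
        total = 0.0
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(words):
                break
            phrase = words[start:end]
            p = bigram.get((prev, phrase), 0.0)  # unseen pairs get probability 0
            if p > 0.0:
                total += p * suffix_prob(end, phrase)
        return total

    return suffix_prob(0, START)


if __name__ == "__main__":
    # Three segmentations contribute: [show][me][flights],
    # [show me][flights], and [show][me flights].
    print(sentence_likelihood(["show", "me", "flights"], BIGRAM))
```

With this toy table the three segmentations contribute 0.125, 0.25, and 0.125 respectively. Replacing the sum with a maximum would instead score only the single most likely segmentation.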

Original language: English
Pages (from-to): 261-279
Number of pages: 19
Journal: Computer Speech and Language
Volume: 14
Issue number: 3
DOI
Publication status: Published - Jul 2000
Externally published: Yes

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Human-Computer Interaction
