Topic-dependent N-gram models based on optimization of context lengths in LDA

Akira Nakamura*, Satoru Hayamizu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

This paper describes a method that improves the accuracy of N-gram language models and can be applied to on-line applications. The precision of a long-distance language model, such as one based on LDA, depends on the context length, that is, the length of the history used for prediction. In the proposed method, each of multiple LDA units separately estimates its optimum context length; those predictions are then integrated, and N-gram probabilities are calculated. The method directly estimates the optimum context length for prediction. Results show that the method improves topic-dependent N-gram probabilities, particularly for words related to specific topics, yielding higher and more stable performance compared to an existing method.
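
To make the pipeline concrete, here is a minimal sketch, not the authors' implementation: the names (LDAUnit, topic_dependent_ngram, phi, candidate_lengths, lam), the candidate-length search, the crude bag-of-words topic inference, and the linear interpolation are all assumptions introduced for illustration; the paper's actual per-unit optimization and integration scheme is not reproduced here.

```python
import numpy as np

class LDAUnit:
    """One LDA unit over a fixed topic-word matrix phi (topics x vocab).

    Hypothetical simplification: the unit scores several candidate
    context lengths and keeps the one whose inferred topic mixture
    best explains the most recent observed word.
    """

    def __init__(self, phi, candidate_lengths=(5, 10, 20, 50)):
        self.phi = phi                      # assumed strictly positive (smoothed)
        self.candidate_lengths = candidate_lengths

    def topic_mixture(self, context):
        # Crude bag-of-words posterior over topics (uniform prior,
        # per-word renormalization) standing in for full LDA inference.
        theta = np.full(self.phi.shape[0], 1.0 / self.phi.shape[0])
        for w in context:
            theta *= self.phi[:, w]
            theta /= theta.sum()
        return theta

    def predict(self, history):
        # history: list of word ids, most recent last (assumed non-empty).
        # Pick the context length that maximizes the likelihood of the
        # last word, a proxy for predictive fit; slicing degrades
        # gracefully when the history is shorter than a candidate length.
        best_len, best_score = self.candidate_lengths[0], -np.inf
        for length in self.candidate_lengths:
            theta = self.topic_mixture(history[:-1][-length:])
            score = theta @ self.phi[:, history[-1]]
            if score > best_score:
                best_len, best_score = length, score
        # Predict the next-word distribution from the chosen context.
        theta = self.topic_mixture(history[-best_len:])
        return theta @ self.phi             # vector of P(w) over the vocab


def topic_dependent_ngram(units, history, ngram_dist, lam=0.5):
    """Linearly interpolate a baseline N-gram distribution with the
    averaged predictions of the LDA units (illustrative integration)."""
    lda_dist = np.mean([u.predict(history) for u in units], axis=0)
    return lam * ngram_dist + (1.0 - lam) * lda_dist
```

In this sketch each unit would be instantiated with its own topic-word matrix (e.g., trained at a different topic granularity), and the interpolation weight would be tuned on held-out data.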

Original language: English
Title of host publication: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
Publisher: International Speech Communication Association
Pages: 3066-3069
Number of pages: 4
Publication status: Published - 2010
Externally published: Yes

Publication series

Name: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010

Keywords

  • LDA
  • Language model
  • Topic model

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
