An extension of the state-observation dependency in Partly Hidden Markov Models and its application to continuous speech recognition

Tetsuji Ogawa*, Tetsunori Kobayashi

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

We extend the state-observation dependencies in a Partly Hidden Markov Model (PHMM) and apply the extended model to continuous speech recognition. In a PHMM, the observations and state transitions depend on a series of hidden and observable states. In the standard formulation, the observations and the state transitions are conditioned on the same hidden state and the same observable state variables. Here we still condition both on the same hidden states, but we condition the observations and the state transitions on different observable states. This simple change makes the model significantly more flexible, allowing it to model stochastic processes more precisely. In addition, by integrating the PHMM containing this extended state-observation dependency with a standard HMM, we construct a stochastic model that we call a Smoothed Partly Hidden Markov Model (SPHMM). In continuous speech recognition experiments on read newspaper speech, the PHMM and SPHMM reduced the error rate by 10% and 24%, respectively, compared with a standard HMM, demonstrating the effectiveness of the proposed models.
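The dependency structure described in the abstract can be illustrated with a small discrete toy model. The following is only a sketch under stated assumptions, not the authors' formulation: the abstract gives no equations, so the choice of past observations as the "observable states", the lag parameters `trans_lag` and `emit_lag`, the lookup-table layout, and the linear interpolation in `smoothed_emit` are all hypothetical choices made for illustration.

```python
import math


def path_log_prob(states, obs, trans, emit, trans_lag=1, emit_lag=2):
    """Log-probability of one state path and observation sequence.

    Assumed (hypothetical) table layout:
      trans[s_prev][o_ctx][s_next]: transition probability conditioned on the
          previous hidden state and the observation `trans_lag` frames back.
      emit[s][o_ctx][o]: output probability conditioned on the current hidden
          state and the observation `emit_lag` frames back.
    Setting trans_lag == emit_lag conditions both terms on the same
    observable context; different lags give them different contexts.
    """
    # Start at the first frame where both contexts are available.
    start = max(trans_lag, emit_lag)
    total = 0.0
    for t in range(start, len(obs)):
        total += math.log(trans[states[t - 1]][obs[t - trans_lag]][states[t]])
        total += math.log(emit[states[t]][obs[t - emit_lag]][obs[t]])
    return total


def smoothed_emit(phmm_p, hmm_p, weight=0.5):
    """Assumed smoothing: linear interpolation of a context-dependent output
    probability with a context-independent (standard HMM) one."""
    return weight * phmm_p + (1.0 - weight) * hmm_p


# Toy tables: 2 hidden states, 2 observation symbols (values are made up).
trans = [[[0.9, 0.1], [0.5, 0.5]], [[0.2, 0.8], [0.6, 0.4]]]
emit = [[[0.7, 0.3], [0.4, 0.6]], [[0.1, 0.9], [0.5, 0.5]]]
states = [0, 0, 1, 1]
obs = [0, 1, 1, 0]

same_context = path_log_prob(states, obs, trans, emit, trans_lag=1, emit_lag=1)
split_context = path_log_prob(states, obs, trans, emit, trans_lag=1, emit_lag=2)
```

In this sketch, using equal lags corresponds to conditioning observations and transitions on the same observable context, while unequal lags give each term its own context. How the published model actually chooses these contexts and combines the PHMM with the HMM is specified in the article itself, not here.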

Original language: English
Pages (from-to): 31-39
Number of pages: 9
Journal: Systems and Computers in Japan
Volume: 36
Issue number: 8
Publication status: Published - 2005 Jul 1

Keywords

  • Acoustic models
  • Continuous speech recognition
  • HMM
  • PHMM
  • SPHMM

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Information Systems
  • Hardware and Architecture
  • Computational Theory and Mathematics
