TY - JOUR
T1 - Structured discriminative models for speech recognition
T2 - An overview
AU - Gales, Mark John Francis
AU - Watanabe, Shinji
AU - Fosler-Lussier, Eric
N1 - Funding Information: The authors would like to thank Rogier van Dalen, Anton Ragni, Austin Zhang, and Yotaro Kubo for fruitful discussions. Eric Fosler-Lussier gratefully acknowledges support by National Science Foundation grant IIS-0643901 (CAREER).
PY - 2012
Y1 - 2012
N2 - Automatic speech recognition (ASR) systems classify structured sequence data, where the label sequences (sentences) must be inferred from the observation sequences (the acoustic waveform). The sequential nature of the task is one of the reasons why generative classifiers, based on combining hidden Markov model (HMM) acoustic models and N-gram language models using Bayes rule, have become the dominant technology used in ASR. Conversely, machine learning and natural language processing (NLP) research areas are increasingly dominated by discriminative approaches, where the class posteriors are directly modeled. This article describes recent work in the area of structured discriminative models for ASR. To handle continuous, variable length observation sequences, the approaches applied to NLP tasks must be modified. This article discusses a variety of approaches for applying structured discriminative models to ASR, both from the current literature and possible future approaches. We concentrate on structured models themselves, the descriptive features of observations commonly used within the models, and various options for optimizing the parameters of the model.
UR - http://www.scopus.com/inward/record.url?scp=85032751545&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85032751545&partnerID=8YFLogxK
U2 - 10.1109/MSP.2012.2207140
DO - 10.1109/MSP.2012.2207140
M3 - Review article
AN - SCOPUS:85032751545
SN - 1053-5888
VL - 29
SP - 70
EP - 81
JO - IEEE Signal Processing Magazine
JF - IEEE Signal Processing Magazine
IS - 6
M1 - 6296527
ER -