Abstract
This paper focuses on applications of Bayesian approaches to acoustic modeling for speech recognition and related speech-processing applications. Bayesian approaches have been widely studied in the fields of statistics and machine learning, and one of their advantages is that their generalization capability is better than that of conventional approaches (e.g., maximum likelihood). On the other hand, since inference in Bayesian approaches involves integrals and expectations that are mathematically intractable in most cases and require heavy numerical computations, it is generally difficult to apply them to practical speech recognition problems. However, there have been many such attempts, and this paper aims to summarize them to encourage further progress on Bayesian approaches in the speech-processing field. This paper describes various applications of Bayesian approaches to speech processing in terms of the four typical ways of approximating Bayesian inference, i.e., maximum a posteriori approximation, model complexity control using a Bayesian information criterion based on asymptotic approximation, variational approximation, and Markov chain Monte Carlo-based sampling techniques.
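As a minimal illustration of two of the approximations named above, the sketch below shows (a) a maximum a posteriori (MAP) estimate of the mean of a single one-dimensional Gaussian with known variance, and (b) a BIC score for comparing candidate models of different complexity. It assumes the textbook conjugate setting: a Gaussian prior on the mean, N(`mu0`, sigma^2 / `tau`), where `tau` acts as a pseudo-count. The function names, the prior parameters, and the example numbers are hypothetical and chosen for illustration; this is not the paper's HMM/GMM formulation, only a small instance of the same ideas.

```python
import math
from typing import Sequence


def ml_mean(x: Sequence[float]) -> float:
    """Maximum-likelihood estimate of a Gaussian mean: the sample mean."""
    return sum(x) / len(x)


def map_mean(x: Sequence[float], mu0: float, tau: float) -> float:
    """MAP estimate of a Gaussian mean under a conjugate Gaussian prior.

    Assumes (hypothetically) a prior N(mu0, sigma^2 / tau) on the mean, with
    the observation variance sigma^2 known. The prior then behaves like `tau`
    extra observations at mu0, and the posterior mode (= posterior mean) is
    (tau * mu0 + sum(x)) / (tau + n). With tau = 0 it reduces to the ML mean.
    """
    if tau < 0:
        raise ValueError("tau must be non-negative")
    if tau == 0 and len(x) == 0:
        raise ValueError("need data or a proper prior (tau > 0)")
    return (tau * mu0 + sum(x)) / (tau + len(x))


def bic(log_likelihood: float, num_params: int, num_samples: int) -> float:
    """BIC score in the 'higher is better' convention: log L - (k / 2) log N.

    The penalty term comes from the asymptotic (large-N) approximation of the
    marginal likelihood; it grows with the number of free parameters k.
    """
    return log_likelihood - 0.5 * num_params * math.log(num_samples)


if __name__ == "__main__":
    data = [2.0, 2.5, 3.0]  # hypothetical adaptation data
    print(ml_mean(data))                     # 2.5
    print(map_mean(data, mu0=0.0, tau=3.0))  # 1.25: pulled toward the prior
    # With much more data, the MAP estimate approaches the ML estimate.
    print(map_mean(data * 100, mu0=0.0, tau=3.0))
    # Hypothetical comparison: a slightly better fit with many more
    # parameters can still lose under the BIC penalty.
    print(bic(-1000.0, 10, 1000) > bic(-990.0, 30, 1000))  # True
```

The pseudo-count form makes the data/prior trade-off explicit: when little data is available the estimate stays near `mu0`, which is the behavior that makes MAP useful for adaptation from small amounts of speech. The BIC convention used here (log-likelihood minus penalty) is one common choice; some texts use the negated, doubled form, which ranks models identically.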
Original language | English |
---|---|
Article number | e5 |
Journal | APSIPA Transactions on Signal and Information Processing |
Volume | 1 |
DOI | 10.1017/ATSIP.2012.6 |
Publication status | Published - Dec 2012 |
Externally published | Yes |
ASJC Scopus subject areas
- Signal Processing
- Information Systems
In: APSIPA Transactions on Signal and Information Processing, Vol. 1, e5, 12.2012.
Research output: Review article › Peer-reviewed
TY - JOUR
T1 - Bayesian approaches to acoustic modeling
T2 - A review
AU - Watanabe, Shinji
AU - Nakamura, Atsushi
N1 - Shinji Watanabe received his B.S., M.S., and Dr. Eng. degrees from Waseda University, Tokyo, Japan, in 1999, 2001, and 2006, respectively. From 2001 to 2011, he was a research scientist at NTT Communication Science Laboratories, Kyoto, Japan. From January to March 2009, he was a visiting scholar at Georgia Institute of Technology, Atlanta, GA. Since 2011, he has been working at Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA. His research interests include Bayesian learning, pattern recognition, and speech and spoken language processing. He is a member of the Acoustical Society of Japan (ASJ) and the Institute of Electronics, Information and Communication Engineers (IEICE), and a senior member of the Institute of Electrical and Electronics Engineers (IEEE). He received the Awaya Award from the ASJ in 2003, the Paper Award from the IEICE in 2004, the Itakura Award from the ASJ in 2006, and the TELECOM System Technology Award from the Telecommunications Advancement Foundation in 2006. He is currently an Associate Editor of IEEE Transactions on Audio, Speech, and Language Processing. Atsushi Nakamura received the B.E., M.E., and Dr. Eng. degrees from Kyushu University, Fukuoka, Japan, in 1985, 1987, and 2001, respectively. In 1987, he joined Nippon Telegraph and Telephone Corporation (NTT), where he engaged in the research and development of network service platforms, including studies on the application of speech processing technologies to network services, at Musashino Electrical Communication Laboratories, Tokyo, Japan. From 1994 to 2000, he was with the Advanced Telecommunications Research (ATR) Institute, Kyoto, Japan, as a Senior Researcher, undertaking research on spontaneous speech recognition, the construction of spoken language databases, and the development of speech translation systems. Since April 2000, he has been with NTT Communication Science Laboratories, Kyoto, Japan. His research interests include the acoustic modeling of speech, speech recognition and synthesis, spoken language processing systems, speech production and perception, computational phonetics and phonology, and the application of learning theories to signal analysis and modeling. Dr. Nakamura is a senior member of the IEEE, serves as a member of the IEEE Machine Learning for Signal Processing (MLSP) Technical Committee, and has served as a Vice Chair of the IEEE Signal Processing Society Kansai Chapter. He is also a member of the IEICE and the ASJ. He received the IEICE Paper Award in 2004 and twice received the TELECOM System Technology Award of the Telecommunications Advancement Foundation, in 2006 and 2009.
PY - 2012/12
Y1 - 2012/12
KW - Approximate Bayesian inference
KW - Bayesian approach
KW - Machine learning
KW - Speech processing
UR - http://www.scopus.com/inward/record.url?scp=84887091716&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84887091716&partnerID=8YFLogxK
U2 - 10.1017/ATSIP.2012.6
DO - 10.1017/ATSIP.2012.6
M3 - Review article
AN - SCOPUS:84887091716
SN - 2048-7703
VL - 1
JO - APSIPA Transactions on Signal and Information Processing
JF - APSIPA Transactions on Signal and Information Processing
M1 - e5
ER -