Speaker normalized acoustic modeling based on 3-D viterbi decoding

T. Fukada, Y. Sagisaka

Research output: Chapter in Book/Report/Conference proceedingConference contribution

3 Citations (Scopus)

Abstract

This paper describes a novel method for speaker normalization based on a frequency warping approach to reduce variations due to speaker-induced factors such as the vocal tract length. In our approach, a speaker normalized acoustic model is trained using time-varying (i.e., state, phoneme or word dependent) warping factors, while in the conventional approaches, the frequency warping factor is fixed for each speaker. These time-varying frequency warping factors are determined by a 3-dimensional (i.e., input frames, HMM states and warping factors) Viterbi decoding procedure. Experimental results on Japanese spontaneous speech recognition show that the proposed method yields a 9.7% improvement in speech recognition accuracy compared to the conventional speaker-independent model.

Original languageEnglish
Title of host publicationProceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1998
Pages437-440
Number of pages4
DOIs
Publication statusPublished - 1998
Externally publishedYes
Event1998 23rd IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1998 - Seattle, WA, United States
Duration: 1998 May 121998 May 15

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume1
ISSN (Print)1520-6149

Conference

Conference1998 23rd IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1998
Country/TerritoryUnited States
CitySeattle, WA
Period98/5/1298/5/15

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'Speaker normalized acoustic modeling based on 3-D viterbi decoding'. Together they form a unique fingerprint.

Cite this