Speaker adaptation method for acoustic-to-articulatory inversion using an HMM-based speech production model

Sadao Hiroya*, Masaaki Honda

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

16 Citations (Scopus)

Abstract

We present a speaker adaptation method that makes it possible to determine articulatory parameters from an unknown speaker's speech spectrum using an HMM (Hidden Markov Model)-based speech production model. The model consists of HMMs of articulatory parameters for each phoneme and an articulatory-to-acoustic mapping that transforms the articulatory parameters into a speech spectrum for each HMM state. The model is statistically constructed from actual articulatory-acoustic data. In the adaptation method, geometrical differences in the vocal tract, as well as the articulatory behavior in the reference model, are statistically adjusted to an unknown speaker. First, the articulatory parameters are estimated from the unknown speaker's speech spectrum using the reference model. Second, the articulatory-to-acoustic mapping is adjusted by maximizing the output probability of the acoustic parameters given the estimated articulatory parameters of the unknown speaker. With the adaptation method, the RMS error between the estimated and observed articulatory parameters is 1.65 mm, an improvement of 56.1% over the speaker-independent model.
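The two-step adaptation described above can be illustrated with a minimal numerical sketch. This is not the paper's exact formulation: it assumes a single HMM state whose articulatory-to-acoustic mapping is linear-Gaussian (y ≈ Ax + b), so that maximizing the output probability of the acoustics given the estimated articulatory parameters reduces to a least-squares re-estimation of the mapping. All matrices and data here are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reference-speaker mapping for one HMM state: y ~= A x + b.
A_ref = np.array([[1.0, 0.5], [0.2, 1.0], [0.0, 0.8]])
b_ref = np.array([0.1, -0.2, 0.3])

# Step 1 (assumed done): articulatory parameters estimated for the
# unknown speaker from the reference model.
X = rng.normal(size=(50, 2))

# The unknown speaker's acoustics follow a shifted/scaled mapping
# (standing in for vocal-tract geometry differences) plus small noise.
A_true = 1.2 * A_ref
b_true = b_ref + 0.5
Y = X @ A_true.T + b_true + 0.01 * rng.normal(size=(50, 3))

# Step 2: adapt the mapping by maximizing the Gaussian output probability
# of the acoustics, i.e. least squares on (articulatory, acoustic) pairs.
X1 = np.hstack([X, np.ones((50, 1))])        # append a bias column
W, *_ = np.linalg.lstsq(X1, Y, rcond=None)   # rows of W: [A^T; b]
A_adapt, b_adapt = W[:-1].T, W[-1]

# The adapted mapping predicts the new speaker's acoustics far better
# than the unadapted reference mapping.
err_ref = np.mean((X @ A_ref.T + b_ref - Y) ** 2)
err_adapt = np.mean((X @ A_adapt.T + b_adapt - Y) ** 2)
assert err_adapt < err_ref
```

In the paper the mapping is adjusted per HMM state and the criterion is the state-conditional output probability; the sketch keeps only the core idea that adaptation refits the articulatory-to-acoustic mapping to the new speaker's acoustics.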

Original language: English
Pages (from-to): 1071-1078
Number of pages: 8
Journal: IEICE Transactions on Information and Systems
Volume: E87-D
Issue number: 5
Publication status: Published - May 2004
Externally published: Yes

Keywords

  • Articulatory-to-acoustic mapping
  • HMM-based speech production model
  • Speaker adaptation
  • Speech inversion

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
  • Artificial Intelligence
