Automatic Speech Recognition for Mixed Dialect Utterances by Mixing Dialect Language Models

Naoki Hirayama, Koichiro Yoshino, Katsutoshi Itoyama, Shinsuke Mori, Hiroshi G. Okuno

Research output: Contribution to journal › Article › peer-review

12 Citations (Scopus)


This paper presents an automatic speech recognition (ASR) system that accepts a mixture of various kinds of dialects. The system recognizes dialect utterances on the basis of the statistical simulation of vocabulary transformation and combinations of several dialect models. Previous dialect ASR systems were based on handcrafted dictionaries for several dialects, which were costly to build. The proposed system statistically trains transformation rules between a common language and dialects, and simulates a dialect corpus for ASR on the basis of a machine translation technique. The rules are trained with small sets of parallel corpora to make up for the lack of linguistic resources on dialects. The proposed system also accepts mixed dialect utterances that contain a variety of vocabularies. In practice, spoken language is not a single dialect but a mixed dialect shaped by the speaker's background (e.g., the native dialects of their parents or where they live). We investigated two methods of combining several dialects appropriately for each speaker. The first was recognition with language models of mixed dialects, using automatically estimated weights that maximized the recognition likelihood. This method performed best, but its computation was very expensive because it ran grid searches over combinations of dialect mixing proportions to find those that maximized the recognition likelihood. The second was integration of the recognition results from each single dialect language model. The improvements with this method were slightly smaller than those with the first method. Its computational cost was low, however, and it ran in real time on general workstations. Both methods achieved higher recognition accuracies for all speakers than the single dialect models and the common language model, and a suitable method can be chosen for use in ASR by weighing computational cost against recognition accuracy.
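The first method in the abstract, a grid search over dialect mixing proportions, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper maximizes the recognition likelihood of an ASR decoder, whereas this sketch substitutes the text log-likelihood of a token sequence under linearly interpolated unigram models. The function names (`simplex_grid`, `mixed_log_likelihood`, `best_mixture`), the probability floor, and the toy vocabularies are all assumptions made for illustration.

```python
import itertools
import math


def simplex_grid(n_models, steps):
    """Yield weight tuples of length n_models whose entries are
    multiples of 1/steps and sum to 1 (a grid on the simplex)."""
    for counts in itertools.product(range(steps + 1), repeat=n_models):
        if sum(counts) == steps:
            yield tuple(c / steps for c in counts)


def mixed_log_likelihood(tokens, models, weights, floor=1e-9):
    """Log-likelihood of tokens under a linear interpolation of
    unigram models. `floor` (an assumption) avoids log(0)."""
    total = 0.0
    for tok in tokens:
        p = sum(w * m.get(tok, 0.0) for w, m in zip(weights, models))
        total += math.log(max(p, floor))
    return total


def best_mixture(tokens, models, steps=10):
    """Exhaustively search the grid for the weights that maximize
    the stand-in likelihood. Cost grows quickly with n_models and steps."""
    return max(
        simplex_grid(len(models), steps),
        key=lambda w: mixed_log_likelihood(tokens, models, w),
    )


# Toy, hypothetical unigram models: a "common language" and one "dialect".
common = {"a": 0.5, "b": 0.5}
dialect = {"a": 0.5, "c": 0.5}

# The utterance uses one word unique to each model, so an even mix wins.
weights = best_mixture(["b", "c"], [common, dialect], steps=10)
print(weights)  # (0.5, 0.5)
```

The number of grid points grows combinatorially with the number of dialects and the grid resolution, which is consistent with the abstract's remark that this method is computationally expensive.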

Original language: English
Article number: 7001195
Pages (from-to): 373-382
Number of pages: 10
Journal: IEEE/ACM Transactions on Speech and Language Processing
Issue number: 2
Publication status: Published - 2015 Feb 1
Externally published: Yes


Keywords

  • Corpus simulation
  • Mixture of dialects
  • Speech recognition

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Media Technology
  • Acoustics and Ultrasonics
  • Instrumentation
  • Linguistics and Language
  • Speech and Hearing

