Language modeling of Chinese personal names based on character units for continuous Chinese speech recognition

Xinhui Hu*, Hirofumi Yamamoto, Genichiro Kikui, Yoshinori Sagisaka

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Citation (Scopus)

Abstract

In this paper, we analyze Chinese personal names to model their statistical phonotactic characteristics for continuous Chinese speech recognition. The analysis showed languagespecific characteristics of Chinese personal names and strongly suggested the advantage of character-unit oriented modeling. A hierarchical language model was composed by reflecting statistical phonotactic characteristics of Chinese personal names as a lower intra-word model, and ordinary inter-word neighboring characteristics as an upper multi-class composite N-gram model. These two layers of models were trained independently using different language corpora. For the modeling of given names, the syllable without tone information was selected as the unit for training the bi-gram. The properties of either one or two characters of a given name were introduced to simplify the length constraint of the modeling process. For Chinese family names, we simply added them directly in the recognition lexicon, since their numbers are very restricted. The results from Chinese speech recognition experiments revealed that the proposed hierarchical language model greatly improved the identification accuracy of the Chinese given names compared with the conventional wordclass N-gram model.

Original languageEnglish
Title of host publicationINTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
PublisherInternational Speech Communication Association
Pages1874-1877
Number of pages4
ISBN (Print)9781604234497
Publication statusPublished - 2006 Jan 1
EventINTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP - Pittsburgh, PA, United States
Duration: 2006 Sept 172006 Sept 21

Publication series

NameINTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Volume4

Conference

ConferenceINTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Country/TerritoryUnited States
CityPittsburgh, PA
Period06/9/1706/9/21

Keywords

  • Chinese speech recognition
  • Hierarchical language model
  • Personal name identification

ASJC Scopus subject areas

  • Computer Science(all)

Fingerprint

Dive into the research topics of 'Language modeling of Chinese personal names based on character units for continuous Chinese speech recognition'. Together they form a unique fingerprint.

Cite this