DEMIPHONEME NETWORK REPRESENTATION OF SPEECH AND AUTOMATIC LABELING TECHNIQUES FOR SPEECH DATA BASE CONSTRUCTION.

Kazuyo Tanaka*, Satoru Hayamizu, Kozo Ohta

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

10 Citations (Scopus)

Abstract

An automatic labeling technique for known speech samples is proposed to construct a fine speech database for investigating the acoustic-phonetic characteristics of speech. An acoustically compact descriptive unit called a demiphoneme (DPH) is introduced, and a word (or sentence) is represented by a network of DPHs that covers the acoustic variation contained in the utterances of the word (or sentence). An input speech sample is segmented and labeled to produce the optimal DPH sequence by the following algorithm: (a) Generate possible DPH sequences from an input phoneme sequence by rules. (b) Segment the sample parameter sequence. The resultant segments (called SEGs) are the candidates for DPH boundaries. (c) Determine the optimal correspondence between the SEG sequence and each of the DPH sequences generated in (a). (d) Decide the minimum-error DPH sequence and the corresponding SEG boundaries. The feasibility of the method is confirmed by applying it to a word set containing 53 city names.
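
The four steps (a) to (d) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the expansion rules, the acoustic parameters, the segmentation criterion, or the error measure. Everything concrete here is an assumption made for illustration, including the rule table `rules`, the scalar per-frame values, the jump threshold, the per-DPH reference values in `templates`, the duration-weighted absolute-difference cost, and all function names.

```python
from itertools import product

def generate_dph_sequences(phonemes, rules):
    """(a) Expand each phoneme into its alternative DPH strings (hypothetical rules)."""
    options = [rules[p] for p in phonemes]
    return [sum(choice, ()) for choice in product(*options)]

def segment(frames, threshold):
    """(b) Cut wherever adjacent frame values jump by more than `threshold`.
    Returns SEGs as (start, end) frame ranges; this criterion is an assumption."""
    bounds = [0]
    for t in range(1, len(frames)):
        if abs(frames[t] - frames[t - 1]) > threshold:
            bounds.append(t)
    bounds.append(len(frames))
    return list(zip(bounds[:-1], bounds[1:]))

def align(segs, frames, dphs, templates):
    """(c) Dynamic programming: each DPH covers one or more consecutive SEGs.
    Returns (total error, DPH index assigned to each SEG)."""
    inf = float("inf")
    means = [sum(frames[s:e]) / (e - s) for s, e in segs]
    n, m = len(segs), len(dphs)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[0] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, min(i, m) + 1):
            length = segs[i - 1][1] - segs[i - 1][0]
            local = abs(means[i - 1] - templates[dphs[j - 1]]) * length
            stay, advance = cost[i - 1][j], cost[i - 1][j - 1]
            if stay <= advance:      # SEG i continues DPH j
                cost[i][j], back[i][j] = stay + local, j
            else:                    # SEG i starts DPH j
                cost[i][j], back[i][j] = advance + local, j - 1
    if cost[n][m] == inf:            # more DPHs than SEGs: no valid alignment
        return inf, []
    assign, i, j = [], n, m
    while i > 0:                     # trace the best path backwards
        assign.append(j - 1)
        j = back[i][j]
        i -= 1
    assign.reverse()
    return cost[n][m], assign

def label(frames, phonemes, rules, templates, threshold=1.0):
    """(d) Keep the DPH sequence with the smallest alignment error."""
    segs = segment(frames, threshold)
    best = None
    for dphs in generate_dph_sequences(phonemes, rules):
        err, assign = align(segs, frames, dphs, templates)
        if best is None or err < best[0]:
            labels = [(dphs[k], segs[i]) for i, k in enumerate(assign)]
            best = (err, dphs, labels)
    return best

# Toy data: all values and DPH names below are invented for illustration.
rules = {"k": [("k_cl", "k_b"), ("k_b",)], "a": [("a",)]}
templates = {"k_cl": 0.0, "k_b": 5.0, "a": 9.0}
frames = [0, 0, 0, 5, 5, 9, 9, 9]
error, best_dphs, labels = label(frames, ["k", "a"], rules, templates)
```

In this toy case the two candidate sequences are `("k_cl", "k_b", "a")` and `("k_b", "a")`; the first matches the three SEGs one to one and is selected. Enumerating every candidate sequence is adequate only for small rule sets; searching a shared network, as the DPH network representation suggests, would avoid repeating work across candidates.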

Original language: English
Pages (from-to): 309-312
Number of pages: 4
Journal: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publication status: Published - 1986
Externally published: Yes

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
