An automatic labeling technique for known speech samples is proposed for constructing a finely labeled speech database with which to investigate the acoustic-phonetic characteristics of speech. An acoustically compact descriptive unit called a demiphoneme (DPH) is introduced, and a word (or sentence) is represented by a network of DPHs that covers the acoustic variation found in utterances of that word (or sentence). An input speech sample is segmented and labeled to produce the optimal DPH sequence by the following algorithm: (a) Generate the possible DPH sequences from the input phoneme sequence by rule. (b) Segment the parameter sequence of the sample. The boundaries of the resulting segments (called SEGs) are the candidates for DPH boundaries. (c) Determine the optimal correspondence between the SEG sequence and each of the DPH sequences generated in (a). (d) Select the minimum-error DPH sequence and the corresponding SEG boundaries. The feasibility of the method is confirmed by applying it to a word set containing 53 city names.
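Steps (a) to (d) can be illustrated with a minimal sketch. It is not the authors' implementation: the scalar frame features, the squared-error distance, the threshold-based segmenter, the per-DPH `templates` dictionary, and the `rules` table are all assumptions made for illustration, since the abstract does not specify them. The alignment in step (c) is done here with a simple dynamic program in which each DPH absorbs one or more consecutive SEGs.

```python
from itertools import product

INF = float("inf")

def expand_phonemes(phonemes, rules):
    """(a) Enumerate DPH sequences; rules maps a phoneme to alternative DPH tuples (hypothetical)."""
    alternatives = [rules[p] for p in phonemes]
    return [sum(choice, ()) for choice in product(*alternatives)]

def segment(frames, threshold):
    """(b) Cut where consecutive frames differ by more than threshold; returns (start, end) SEGs."""
    cuts = [0] + [i for i in range(1, len(frames))
                  if abs(frames[i] - frames[i - 1]) > threshold] + [len(frames)]
    return list(zip(cuts[:-1], cuts[1:]))

def align(frames, segs, dphs, templates):
    """(c) Optimal in-order assignment of SEGs to DPHs, each DPH taking at least one SEG."""
    n, m = len(segs), len(dphs)
    if n < m:
        return INF, []
    # Assumed error measure: squared distance of each frame to the DPH template value.
    cost = [[sum((frames[t] - templates[d]) ** 2 for t in range(s, e)) for d in dphs]
            for s, e in segs]
    best = [[INF] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]
    best[0][0] = cost[0][0]
    for i in range(1, n):
        for j in range(m):
            stay = best[i - 1][j]                       # SEG i continues DPH j
            move = best[i - 1][j - 1] if j > 0 else INF  # SEG i starts DPH j
            if min(stay, move) == INF:
                continue
            back[i][j] = j if stay <= move else j - 1
            best[i][j] = min(stay, move) + cost[i][j]
    if best[n - 1][m - 1] == INF:
        return INF, []
    path = [m - 1]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    spans = []  # merge consecutive SEGs assigned to the same DPH
    for (s, e), j in zip(segs, path):
        if spans and spans[-1][0] == j:
            spans[-1] = (j, spans[-1][1], e)
        else:
            spans.append((j, s, e))
    return best[n - 1][m - 1], [(dphs[j], s, e) for j, s, e in spans]

def label(frames, phonemes, rules, templates, threshold):
    """(d) Keep the DPH sequence, and its boundaries, with the minimum alignment error."""
    segs = segment(frames, threshold)
    results = [align(frames, segs, dphs, templates)
               for dphs in expand_phonemes(phonemes, rules)]
    return min(results, key=lambda r: r[0])
```

Calling `label(frames, phonemes, rules, templates, threshold)` returns a pair of the total error and a list of `(dph, start_frame, end_frame)` labels. Enumerating every DPH sequence explicitly keeps the sketch short; searching the DPH network directly would avoid the combinatorial expansion.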
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Published - 1986