TY - JOUR
T1 - Automatic extraction of fundamental frequency control rules by statistical analysis
AU - Hirai, Toshio
AU - Iwahashi, Naoto
AU - Higuchi, Norio
AU - Sagisaka, Yoshinori
PY - 1997/3
Y1 - 1997/3
N2 - This paper aims at the improvement of the naturalness of Japanese synthetic speech and proposes a method of extracting automatically the rules for controlling the voice fundamental frequency (written as F0). The proposed method is composed of two steps. (1) The F0 time-series pattern of a sufficient amount of speech data is represented by parameters under Fujisaki's model. (2) The F0 control rule that estimates the parameter values from the language information is extracted by statistical analysis. The proposed method is applied to 200 Japanese sentences read by a speaker, and the relation between the language information and the parameter values is derived by analyzing the obtained F0 control rules. The following properties are identified. (1) The phrase command diminishes when the number of morae in the preceding phrase decreases. (2) The accent component is reduced when the number of morae in the higher-pitched part of the accented phrase is larger. Relation (2) is a refinement of knowledge already obtained by the analysis of a small number of samples. Thus, it is shown that adequate F0 control rules can be extracted automatically by the proposed method.
AB - This paper aims at the improvement of the naturalness of Japanese synthetic speech and proposes a method of extracting automatically the rules for controlling the voice fundamental frequency (written as F0). The proposed method is composed of two steps. (1) The F0 time-series pattern of a sufficient amount of speech data is represented by parameters under Fujisaki's model. (2) The F0 control rule that estimates the parameter values from the language information is extracted by statistical analysis. The proposed method is applied to 200 Japanese sentences read by a speaker, and the relation between the language information and the parameter values is derived by analyzing the obtained F0 control rules. The following properties are identified. (1) The phrase command diminishes when the number of morae in the preceding phrase decreases. (2) The accent component is reduced when the number of morae in the higher-pitched part of the accented phrase is larger. Relation (2) is a refinement of knowledge already obtained by the analysis of a small number of samples. Thus, it is shown that adequate F0 control rules can be extracted automatically by the proposed method.
KW - Fujisaki model
KW - Speech synthesis
KW - Statistical analysis
KW - Superpositional fundamental frequency control model
KW - Voice fundamental frequency control
UR - http://www.scopus.com/inward/record.url?scp=0031084194&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0031084194&partnerID=8YFLogxK
U2 - 10.1002/(SICI)1520-684X(199703)28:3<91::AID-SCJ10>3.0.CO;2-P
DO - 10.1002/(SICI)1520-684X(199703)28:3<91::AID-SCJ10>3.0.CO;2-P
M3 - Article
AN - SCOPUS:0031084194
SN - 0882-1666
VL - 28
SP - 91
EP - 100
JO - Systems and Computers in Japan
JF - Systems and Computers in Japan
IS - 3
ER -