The realization of an anthropomorphic agent which looks like a real human is an important research topic for the broadening of the range of human-to-human communications through the use of a computer. We have proposed a technique for synthesizing natural talking-face animation that permits such communications. How to evaluate the performance of talking-face animation, however, has remained an outstanding issue. The performance of talking-face animation is determined in three parameters: (1) Does it reproduce human talking to an extent that permits lipreading? (2) Does it appear visually natural? (3) Is it accurately synchronized with voice? In this paper, we first presented talking-face animation along with the voice to subjects and conducted experiments on how well the subjects heard the contents of the spoken words to examine Parameter (1). In the next step, with regard to Parameter (2), the visual naturalness of the talking-face animation and the smoothness of the motion of the talking mouth were evaluated on a scale of 5 points. Lastly, with regard to Parameter (3), talking-face animation in which the synchronization of the animation with sound was off by a fixed interval was shown to subjects to investigate the subjective perception of the synchronization gap, and the extent of the resulting strange feeling was evaluated on a scale of 5 points. In addition, the effect of the synchronization gap between voice and talking-face animation on the manner in which the spoken words are understood was also evaluated. Through these evaluation experiments, the quality of synthetic talking-face animation proposed by the authors was evaluated, and we studied naturally-appearing synchronization between synthetic talking-face animation and voice.
|Electronics and Communications in Japan, Part III: Fundamental Electronic Science (English translation of Denshi Tsushin Gakkai Ronbunshi)
|Published - 2006 5月
ASJC Scopus subject areas