Real-time facial action image synthesis system driven by speech and text

Shigeo Morishima*, Kiyoharu Aizawa, Hiroshi Harashima

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

14 Citations (Scopus)


Automatic facial motion image synthesis schemes and a real-time system design are presented. The aim is to realize an intelligent human-machine interface or an intelligent communication system that uses talking-head images. A human face is reconstructed on the terminal display using a 3D surface model and texture mapping. Natural facial motion images are synthesized by transforming the lattice points of the wireframe. Two methods of driving the motion are proposed: text-to-image conversion and speech-to-image conversion. With the former, the synthesized head can speak given text naturally; with the latter, mouth and jaw motions are synthesized in time with the speech signal of the speaker behind the image. The schemes were implemented on a parallel image computer, and the real-time image synthesizer could output facial motion images to the display at video rate.
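As an illustration only, the idea of synthesizing motion by moving wireframe lattice points can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the four-point mouth lattice, the mouth-shape names, the displacement values, and the functions `deform` and `blend_frames` are all hypothetical and chosen only to show the principle of displacing lattice points and interpolating between mouth shapes.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]

# Hypothetical neutral wireframe: four lattice points around the mouth.
NEUTRAL: List[Point] = [
    (-1.0, 0.0, 0.0),   # left mouth corner
    (1.0, 0.0, 0.0),    # right mouth corner
    (0.0, 0.5, 0.0),    # upper lip centre
    (0.0, -0.5, 0.0),   # lower lip centre
]

# Hypothetical mouth shapes, stored as per-point displacements
# from the neutral pose (values are illustrative, not measured).
MOUTH_SHAPES: Dict[str, List[Point]] = {
    "closed": [(0.0, 0.0, 0.0)] * 4,
    "a": [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.25, 0.0), (0.0, -1.0, 0.0)],
    "u": [(0.5, 0.0, 0.0), (-0.5, 0.0, 0.0), (0.0, 0.0, 0.25), (0.0, 0.0, 0.25)],
}


def deform(neutral: List[Point], offsets: List[Point], weight: float) -> List[Point]:
    """Move each lattice point by `weight` times its displacement."""
    return [
        (x + weight * dx, y + weight * dy, z + weight * dz)
        for (x, y, z), (dx, dy, dz) in zip(neutral, offsets)
    ]


def blend_frames(shape_from: str, shape_to: str, steps: int) -> List[List[Point]]:
    """Linearly interpolated in-between frames from one mouth shape to the next."""
    start, end = MOUTH_SHAPES[shape_from], MOUTH_SHAPES[shape_to]
    frames = []
    for i in range(1, steps + 1):
        t = i / steps
        # Interpolate the displacement of every lattice point, then apply it.
        offsets = [
            tuple((1.0 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(start, end)
        ]
        frames.append(deform(NEUTRAL, offsets, 1.0))
    return frames


if __name__ == "__main__":
    # Open the mouth from "closed" to "a" over four frames.
    for frame in blend_frames("closed", "a", 4):
        print(frame[3])  # lower lip centre moves downward frame by frame
```

In a text-driven setting, a sequence of such mouth shapes could be derived from the phonemes of the input text; in a speech-driven setting, the shape sequence would instead be estimated from the audio signal. Both mappings are outside the scope of this sketch.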

Original language: English
Title of host publication: Proceedings of SPIE - The International Society for Optical Engineering
Editors: Murat Kunt
Publisher: Publ by Int Soc for Optical Engineering
Number of pages: 8
ISBN (Print): 0819404217
Publication status: Published - 1990
Externally published: Yes
Event: Visual Communications and Image Processing '90 - Lausanne, Switz
Duration: 1990 Oct 1 to 1990 Oct 4

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 1360 pt 2
ISSN (Print): 0277-786X


Other: Visual Communications and Image Processing '90
City: Lausanne, Switz

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Condensed Matter Physics
  • Computer Science Applications
  • Applied Mathematics
  • Electrical and Electronic Engineering

