TY - JOUR
T1 - Man-machine interaction using a vision system with dual viewing angles
AU - Huang, Ying Jieh
AU - Dohi, Hiroshi
AU - Ishizuka, Mitsuru
PY - 1997
Y1 - 1997
N2 - This paper describes a vision system with dual viewing angles, i.e., wide and narrow viewing angles, and a scheme of user-friendly speech dialogue environment based on the vision system. The wide viewing angle provides a wide viewing field for wide range motion tracking, and the narrow viewing angle is capable of following a target in wide viewing field to take the image of the target with sufficient resolution. For a fast and robust motion tracking, modified motion energy (MME) and existence energy (EE) are defined to detect the motion of the target and extract the motion region at the same time. Instead of using a physical device such as a foot switch commonly used in speech dialogue systems, the begin/end of an utterance is detected from the movement of user's mouth in our system. Without recognizing the movement of lips directly, the shape variation of the region between lips is tracked for more stable recognition of the span of a dialogue. The tracking speed is about 10 frames/sec when no recognition is performed and about 5 frames/sec when both tracking and recognition are performed without using any special hardware.
AB - This paper describes a vision system with dual viewing angles, i.e., wide and narrow viewing angles, and a scheme of user-friendly speech dialogue environment based on the vision system. The wide viewing angle provides a wide viewing field for wide range motion tracking, and the narrow viewing angle is capable of following a target in wide viewing field to take the image of the target with sufficient resolution. For a fast and robust motion tracking, modified motion energy (MME) and existence energy (EE) are defined to detect the motion of the target and extract the motion region at the same time. Instead of using a physical device such as a foot switch commonly used in speech dialogue systems, the begin/end of an utterance is detected from the movement of user's mouth in our system. Without recognizing the movement of lips directly, the shape variation of the region between lips is tracked for more stable recognition of the span of a dialogue. The tracking speed is about 10 frames/sec when no recognition is performed and about 5 frames/sec when both tracking and recognition are performed without using any special hardware.
KW - Dual viewing angles
KW - Mouth pattern recognition
KW - Motion tracking
KW - Speech dialogue system
KW - Vision system
UR - http://www.scopus.com/inward/record.url?scp=0031270867&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0031270867&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:0031270867
SN - 0916-8532
VL - E80-D
SP - 1074
EP - 1083
JO - IEICE Transactions on Information and Systems
JF - IEICE Transactions on Information and Systems
IS - 11
ER -