TY - GEN
T1 - Real-time meeting recognition and understanding using distant microphones and omni-directional camera
AU - Hori, Takaaki
AU - Araki, Shoko
AU - Yoshioka, Takuya
AU - Fujimoto, Masakiyo
AU - Watanabe, Shinji
AU - Oba, Takanobu
AU - Ogawa, Atsunori
AU - Otsuka, Kazuhiro
AU - Mikami, Dan
AU - Kinoshita, Keisuke
AU - Nakatani, Tomohiro
AU - Nakamura, Atsushi
AU - Yamato, Junji
PY - 2010
Y1 - 2010
AB - This paper presents our newly developed real-time meeting analyzer for monitoring conversations in an ongoing group meeting. The goal of the system is to automatically recognize "who is speaking what" in an online manner for meeting assistance. Our system continuously captures the utterances and face pose of each speaker using a distant microphone array and an omni-directional camera at the center of the meeting table. Through a series of advanced audio processing operations, overlapping speech signals are enhanced and separated into individual speakers' channels. The utterances are then sequentially transcribed by our speech recognizer with low latency. In parallel with speech recognition, the activity of each participant (e.g. speaking, laughing, watching someone) and the situation of the meeting (e.g. topic, activeness, casualness) are detected and displayed in a browser together with the transcripts. In this paper, we describe our techniques for achieving low-latency monitoring of meetings and present experimental results for real-time meeting transcription.
KW - Distant microphones
KW - Meeting analysis
KW - Speaker diarization
KW - Speech enhancement
KW - Speech recognition
KW - Topic tracking
UR - http://www.scopus.com/inward/record.url?scp=79951797950&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79951797950&partnerID=8YFLogxK
U2 - 10.1109/SLT.2010.5700890
DO - 10.1109/SLT.2010.5700890
M3 - Conference contribution
AN - SCOPUS:79951797950
SN - 9781424479030
T3 - 2010 IEEE Workshop on Spoken Language Technology, SLT 2010 - Proceedings
SP - 424
EP - 429
BT - 2010 IEEE Workshop on Spoken Language Technology, SLT 2010 - Proceedings
T2 - 2010 IEEE Workshop on Spoken Language Technology, SLT 2010
Y2 - 12 December 2010 through 15 December 2010
ER -