TY - GEN
T1 - Automatic indexing of multimedia content by integration of audio, spoken language, and visual information
AU - Ohtsuki, Katsutoshi
AU - Bessho, Katsuji
AU - Matsuo, Yoshihiro
AU - Matsunaga, Shoichi
AU - Hayashi, Yoshihiko
N1 - Publisher Copyright: © 2003 IEEE.
PY - 2003
Y1 - 2003
AB - This paper describes an automatic multimedia content indexing system that includes acoustic segmentation, automatic speech recognition, topic segmentation, and video indexing features. The system is intended for indexing of multimedia news programs. Speech segments extracted from news content are delivered to the speech recognition module. The speech recognition result is segmented into topics using a segmentation algorithm based on word conceptual vectors. The indexing results derived from audio and speech information are integrated with video indexing results to extract the story structure. Experimental results show that topic segmentation using word conceptual vectors is superior to the conventional method using local word co-occurrence frequencies, and that the integrated segmentation provides better news story structures than would be possible with any single type of information.
UR - http://www.scopus.com/inward/record.url?scp=33646820226&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33646820226&partnerID=8YFLogxK
U2 - 10.1109/ASRU.2003.1318508
DO - 10.1109/ASRU.2003.1318508
M3 - Conference contribution
AN - SCOPUS:33646820226
T3 - 2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003
SP - 601
EP - 606
BT - 2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003
Y2 - 30 November 2003 through 4 December 2003
ER -