Automatic metadata generation and video editing based on speech and image recognition for medical education contents

Satoshi Tamura*, Koji Hashimoto, Zhu Jiong, Satoru Hayamizu, Hirotsugu Asai, Hideki Tanahashi, Makoto Kanagawa

*Corresponding author for this work

Research output: Conference contribution

1 citation (Scopus)

Abstract

This paper reports a metadata generation system and an automatic video editing system. Metadata are data that describe other data. In the audio metadata generation system, speech recognition with a general language model (LM) and with a specialized LM is applied to the input speech to obtain segments (event groups) and audio metadata (event information), respectively. In the video editing system, visual metadata obtained by image recognition are combined with the audio metadata into audio-visual metadata. Multiple videos are then edited into a single video using the audio-visual metadata. Experiments were conducted to evaluate the event detection of the systems on medical education content, namely ACLS and BLS. The audio metadata system achieved an event detection correctness of about 78%. In the editing system, an event correctness of 87% was obtained with audio-visual metadata, and a survey showed that the edited video is appropriate and useful.

Original language: English
Title of host publication: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Publisher: International Speech Communication Association
Pages: 2466-2469
Number of pages: 4
ISBN (Print): 9781604234497
Publication status: Published - 2006
Externally published: Yes
Event: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP - Pittsburgh, PA, United States
Duration: 17 Sep 2006 to 21 Sep 2006

Publication series

Name: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
5

Conference

Conference: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Country/Territory: United States
City: Pittsburgh, PA
Period: 17 Sep 2006 to 21 Sep 2006

ASJC Scopus subject areas

  • General Computer Science
