Fast title extraction method for business documents

Yutaka Katsuyama*, Satoshi Naoi

*Corresponding author for this work

Research output: Conference contribution

1 citation (Scopus)

Abstract

Conventional electronic document filing systems are inconvenient because the user must specify keywords for each document to enable later searches. To solve this problem, automatic keyword extraction methods using natural language processing and character recognition have been developed. However, these methods are slow, especially for Japanese documents. To develop a practical electronic document filing system, we focused on extracting keyword areas from a document by image processing. Our fast title extraction method automatically extracts titles as keywords from business documents. Every character string is scored with rating points that reflect how title-like it is. We classified these points into four items: character string size, position of character strings, relative position among character strings, and string attribution. The character string with the highest rating is selected as the title area, and character recognition is then carried out on the selected area only. The method is fast because recognition has to handle only the small number of patterns in this restricted area rather than the entire document. In a test on 100 Japanese business documents, the method achieved a mean accuracy of about 91 percent and a mean processing time of 1.8 seconds.
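
The rating scheme described in the abstract can be illustrated with a minimal sketch. Only the four rating items and the "highest rating wins" rule come from the abstract; the `TextLine` type, the individual scoring formulas, and all weights below are hypothetical choices made for illustration and are not the authors' actual rules.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TextLine:
    """Bounding box of one character string, in page fractions (0..1).

    Hypothetical type for this sketch; not taken from the paper.
    """
    x: float          # left edge
    y: float          # top edge
    width: float
    height: float
    underlined: bool = False


def size_points(line, typical_height):
    # Item 1 (size): strings taller than the typical line earn points.
    ratio = line.height / typical_height
    return 10.0 * min(max(ratio - 1.0, 0.0), 2.0)


def position_points(line):
    # Item 2 (position): assumed preference for centered strings near the top.
    center = line.x + line.width / 2.0
    centered = 1.0 - min(abs(center - 0.5) / 0.5, 1.0)
    near_top = 1.0 - min(max(line.y, 0.0), 1.0)
    return 10.0 * centered + 10.0 * near_top


def relative_points(line, others):
    # Item 3 (relative position): assumed preference for vertically isolated strings.
    if not others:
        return 0.0
    nearest = min(abs(o.y - line.y) for o in others)
    return 10.0 * min(nearest / 0.05, 1.0)


def attribute_points(line):
    # Item 4 (attribution): an underline is used here as one example attribute.
    return 5.0 if line.underlined else 0.0


def rate(line, others, typical_height):
    # Total rating is the sum of the four items.
    return (size_points(line, typical_height)
            + position_points(line)
            + relative_points(line, others)
            + attribute_points(line))


def select_title(lines):
    """Return the index of the highest-rated character string."""
    if not lines:
        raise ValueError("no character strings to rate")
    typical_height = median(l.height for l in lines)
    ratings = []
    for i, line in enumerate(lines):
        others = lines[:i] + lines[i + 1:]
        ratings.append(rate(line, others, typical_height))
    return max(range(len(lines)), key=ratings.__getitem__)


if __name__ == "__main__":
    page = [
        TextLine(0.70, 0.05, 0.20, 0.015),                    # date, top right
        TextLine(0.30, 0.15, 0.40, 0.035, underlined=True),   # large centered line
        TextLine(0.10, 0.30, 0.80, 0.015),                    # body text
        TextLine(0.10, 0.33, 0.80, 0.015),
        TextLine(0.10, 0.36, 0.80, 0.015),
    ]
    print(select_title(page))  # index of the string chosen as the title area
```

In a pipeline of the kind the abstract describes, character recognition would then run only on the bounding box returned by `select_title`, which is where the speed advantage over whole-page recognition comes from.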

Original language: English
Title of host publication: Proceedings of SPIE - The International Society for Optical Engineering
Publisher: Society of Photo-Optical Instrumentation Engineers
Pages: 192-201
Number of pages: 10
ISBN (Print): 0819424382
Publication status: Published - 1997
Externally published: Yes
Event: Document Recognition IV - San Jose, CA, USA
Duration: 12 Feb 1997 to 13 Feb 1997

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 3027
ISSN (Print): 0277-786X

Conference

Conference: Document Recognition IV
City: San Jose, CA, USA
Period: 97/2/12 to 97/2/13

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Condensed Matter Physics
  • Computer Science Applications
  • Applied Mathematics
  • Electrical and Electronic Engineering
