Document layout analysis and reading order determination for a reading robot

Yucun Pan*, Qunfei Zhao, Seiichiro Kamata

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Citations (Scopus)


In this paper, an efficient approach to document layout analysis and reading order determination is proposed for a reading robot. First, the input document images are preprocessed to remove noise, connect lines and domains, and reduce computation time. Second, a bottom-up, parameter-independent, two-step layout analysis algorithm based on morphology outlines the geometry of the maximum homogeneous regions and classifies them into text, tables, and pictures. Finally, the reading order is determined by a top-down recursive hierarchy algorithm derived from XY-cut, using a set of rules that depend on layout information. Important parameters are obtained from statistical information about the given images, so that the method adapts to different types of documents. The proposed algorithm is applied to a large number of document images, and the experimental results show that it enables the reading robot to read paper documents in different languages, even those with complex layout structures.
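The abstract does not give the rule set or the parameters of the reading-order step, so the following is only a minimal sketch of the plain recursive XY-cut idea that the authors' algorithm is said to be derived from, not their rule-based variant. It assumes that layout analysis has already produced region bounding boxes; the names `Box`, `_split_by_gaps`, and `xy_cut_order`, as well as the sample page, are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical region type: (x0, y0, x1, y1) in pixels, origin at top-left,
# with x1 and y1 exclusive.
Box = Tuple[int, int, int, int]


def _split_by_gaps(boxes: List[Box], axis: str) -> List[List[Box]]:
    """Group boxes separated by whitespace gaps in their projection on one axis."""
    lo, hi = (1, 3) if axis == "y" else (0, 2)
    ordered = sorted(boxes, key=lambda b: b[lo])
    groups = [[ordered[0]]]
    reach = ordered[0][hi]  # farthest extent covered by the current group
    for box in ordered[1:]:
        if box[lo] >= reach:
            groups.append([box])  # empty band found: cut here
        else:
            groups[-1].append(box)
        reach = max(reach, box[hi])
    return groups


def xy_cut_order(boxes: List[Box]) -> List[Box]:
    """Return boxes in reading order using a plain recursive XY-cut.

    Horizontal cuts (top to bottom) are tried first, then vertical cuts
    (left to right). This fixed preference is an assumption of the sketch.
    """
    if len(boxes) <= 1:
        return list(boxes)
    for axis in ("y", "x"):
        groups = _split_by_gaps(boxes, axis)
        if len(groups) > 1:
            ordered: List[Box] = []
            for group in groups:
                ordered.extend(xy_cut_order(group))
            return ordered
    # No cut separates the boxes: fall back to a simple positional sort.
    return sorted(boxes, key=lambda b: (b[1], b[0]))


# Illustrative page: a full-width title above two columns of two blocks each.
title = (0, 0, 100, 10)
left_top, left_bottom = (0, 20, 45, 50), (0, 60, 45, 90)
right_top, right_bottom = (55, 20, 100, 35), (55, 45, 100, 90)

page = [right_bottom, left_top, title, left_bottom, right_top]
reading_order = xy_cut_order(page)
```

For this sample page the title is separated first by a horizontal cut, then the two columns by a vertical cut, so the left column is read in full before the right one. A rule-based method such as the one the abstract describes would presumably replace the fixed cut preference and the fallback sort with decisions based on layout information.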

Original language: English
Title of host publication: TENCON 2010 - 2010 IEEE Region 10 Conference
Number of pages: 6
Publication status: Published - 2010 Dec 1
Event: 2010 IEEE Region 10 Conference, TENCON 2010 - Fukuoka, Japan
Duration: 2010 Nov 21 → 2010 Nov 24

Publication series

Name: IEEE Region 10 Annual International Conference, Proceedings/TENCON


Other: 2010 IEEE Region 10 Conference, TENCON 2010


Keywords

  • A reading robot
  • Adaptive
  • Hierarchy
  • Layout analysis
  • Morphology based
  • Reading order determination

ASJC Scopus subject areas

  • Computer Science Applications
  • Electrical and Electronic Engineering

