TY - GEN
T1 - Document layout analysis and reading order determination for a reading robot
AU - Pan, Yucun
AU - Zhao, Qunfei
AU - Kamata, Seiichiro
PY - 2010/12/1
Y1 - 2010/12/1
N2 - In this paper, an efficient approach to document layout analysis and reading order determination is proposed for a reading robot. First, the input document images are preprocessed to remove noise, connect lines and domains, and reduce computation time. Second, a bottom-up, parameter-independent, two-step layout analysis algorithm based on morphology is used, which outlines the geometry of the maximum homogeneous regions and classifies them into text, tables, and pictures. Finally, the reading order is determined by a top-down recursive hierarchy algorithm derived from XY-cut, using a set of rules that depend on layout information. Important parameters are derived from statistical information about the given images, so that the method adapts to different types of documents. The proposed algorithm is applied to a large number of document images, and the experimental results show that it enables the reading robot to read paper documents in different languages, even those with complex layout structures.
AB - In this paper, an efficient approach to document layout analysis and reading order determination is proposed for a reading robot. First, the input document images are preprocessed to remove noise, connect lines and domains, and reduce computation time. Second, a bottom-up, parameter-independent, two-step layout analysis algorithm based on morphology is used, which outlines the geometry of the maximum homogeneous regions and classifies them into text, tables, and pictures. Finally, the reading order is determined by a top-down recursive hierarchy algorithm derived from XY-cut, using a set of rules that depend on layout information. Important parameters are derived from statistical information about the given images, so that the method adapts to different types of documents. The proposed algorithm is applied to a large number of document images, and the experimental results show that it enables the reading robot to read paper documents in different languages, even those with complex layout structures.
KW - A reading robot
KW - Adaptive
KW - Hierarchy
KW - Layout analysis
KW - Morphology based
KW - Reading order determination
UR - http://www.scopus.com/inward/record.url?scp=79951623521&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79951623521&partnerID=8YFLogxK
U2 - 10.1109/TENCON.2010.5686038
DO - 10.1109/TENCON.2010.5686038
M3 - Conference contribution
AN - SCOPUS:79951623521
SN - 9781424468904
T3 - IEEE Region 10 Annual International Conference, Proceedings/TENCON
SP - 1607
EP - 1612
BT - TENCON 2010 - 2010 IEEE Region 10 Conference
T2 - 2010 IEEE Region 10 Conference, TENCON 2010
Y2 - 21 November 2010 through 24 November 2010
ER -