Title: | Document layout analysis: A maximum homogeneous region approach |
Author(s): | Tran T.A. |
Keywords: | Document layout analysis; Homogeneous region; OCR; Page segmentation; Whitespace analysis |
Abstract: | This paper presents a method for document layout analysis that applies whitespace analysis to maximum homogeneous regions, with a focus on balancing processing time and performance. It consists of two main stages: classification and segmentation. First, text and non-text elements are classified by analyzing whitespace in maximum multi-layer horizontal homogeneous regions. Then, text regions are extracted using mathematical morphology, and non-text elements are further classified into separators, tables, and images via a machine learning approach. The method's effectiveness is demonstrated by tests on the UW-III (A1) dataset. |
Issue Date: | 2018 |
Publisher: | Institute of Electrical and Electronics Engineers Inc. |
URI: | http://digital.lib.ueh.edu.vn/handle/UEH/62274 |
DOI: | https://doi.org/10.1109/MAPR.2018.8337515 |
ISBN: | 9781538641804 |
Appears in Collections: | Conference Papers |