Automatic removal of handwritten annotations from between-text-lines and inside-text-line regions of a printed text document

Nagabhushan, P. and Rachida, H. and Elboushaki, Abdessamad and Javed, Mohammed (2015) Automatic removal of handwritten annotations from between-text-lines and inside-text-line regions of a printed text document. Procedia Computer Science, 45. pp. 205-214. ISSN 1877-0509

Full text not available from this repository.


Recovering the original printed text from a document that carries handwritten annotations, and making it machine readable, remains a challenging problem in document image analysis, especially when the original document is unavailable. The overall aim of this research is therefore to detect and remove handwritten annotations that may appear in any part of the document without losing any of the original printed information. In this paper, we propose two novel methods to remove handwritten annotations located specifically in between-text-line and inside-text-line regions. To remove between-text-line annotations, we propose a two-stage algorithm that detects the baselines of the printed text lines through connected component analysis and then removes the annotations using a statistically computed distance between the text line regions. To remove inside-text-line annotations, we propose a novel way of distinguishing handwritten annotations from machine-printed text, based on three features extracted from the connected components merged at word level in every detected printed text line. The first feature is the density distribution, computed from the vertical projection profile; the second and third features are the number of large vertical edges and the major vertical edge, both computed using Prewitt edge detection. The proposed method was tested on a dataset of 170 documents with complex handwritten annotations, achieving an overall accuracy of 93.49% in removing handwritten annotations and 96.22% in recovering the original printed text document.
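The three inside-text-line features named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not define "large vertical edge" or "major vertical edge", so the sketch assumes they are, respectively, vertical runs of edge pixels covering at least a given fraction of the word height, and the longest such run. The function names, the binary-image representation (lists of 0/1 rows, 1 = ink), and both thresholds are hypothetical.

```python
# Hypothetical sketch: names, thresholds, and feature definitions are
# assumptions for illustration, not taken from the paper.

PREWITT_X = ((-1, 0, 1), (-1, 0, 1), (-1, 0, 1))  # responds to vertical edges


def vertical_projection_profile(img):
    """Fraction of ink pixels (value 1) in each column of a binary word image."""
    height = len(img)
    return [sum(row[x] for row in img) / height for x in range(len(img[0]))]


def prewitt_vertical_edge_mask(img, threshold=3):
    """Mark pixels whose absolute horizontal Prewitt response reaches threshold."""
    h, w = len(img), len(img[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:  # zero padding outside the image
                        gx += PREWITT_X[dy + 1][dx + 1] * img[yy][xx]
            mask[y][x] = 1 if abs(gx) >= threshold else 0
    return mask


def vertical_edge_features(mask, large_fraction=0.5):
    """Return (count of large vertical edge runs, length of the longest run)."""
    h, w = len(mask), len(mask[0])
    min_len = max(1, int(large_fraction * h))  # assumed meaning of "large"
    large, longest = 0, 0
    for x in range(w):
        run = 0
        for y in range(h + 1):  # one extra step flushes a run touching the bottom
            if y < h and mask[y][x]:
                run += 1
            else:
                if run >= min_len:
                    large += 1
                longest = max(longest, run)
                run = 0
    return large, longest


def word_features(img):
    """Collect the three illustrative features for one word-level component."""
    mask = prewitt_vertical_edge_mask(img)
    large, longest = vertical_edge_features(mask)
    return {
        "profile": vertical_projection_profile(img),
        "large_vertical_edges": large,
        "major_vertical_edge": longest,
    }
```

Calling `word_features` on a cropped, binarized word image returns the per-column ink density together with the two edge-based values. A classifier or a set of thresholds, which the abstract does not specify, would then be needed to separate printed words from handwritten ones.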

Item Type: Article
Uncontrolled Keywords: Handwritten Annotation Removal; Marginal Annotation Removal; Between-Text-Line Annotations; Inside-Text-Line Annotation
Subjects: D Physical Science > Computer Science
Divisions: Department of > Computer Science
Date Deposited: 20 Jul 2019 09:50
Last Modified: 20 Jul 2019 09:50
