Spotting of keyword directly in run-length compressed documents

Javed, Mohammed and Nagabhushan, P. and Chaudhuri, Bidyut Baran (2017) Spotting of keyword directly in run-length compressed documents. In: Proceedings of International Conference on Computer Vision and Image Processing.

Full text not available from this repository.

Abstract

With the rapid growth of digital libraries, e-governance and Internet applications, a huge volume of documents is being generated, communicated and archived in compressed form to provide better storage and transfer efficiency. In such a large repository of compressed documents, frequently used operations such as keyword searching and document retrieval have to be carried out after decompression, and subsequently with the help of OCR. Developing a keyword spotting technique that works directly on compressed documents is therefore a promising and challenging research problem. Against this backdrop, the paper presents a novel approach for searching keywords directly in run-length compressed documents without going through the stages of decompression and OCR. The proposed method extracts simple, font-size-invariant features, such as the number of run transitions and the correlation of runs over selected regions of the test words, and matches them against those of the user's query word. In the subsequent step, keywords are spotted in the compressed document based on the matching score. The idea of decompression-free and OCR-free word spotting directly in compressed documents is the major contribution of this paper. The method is evaluated on a data set of compressed documents, and the preliminary results validate the proposed idea.
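The two features named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the full text is not available here, so the function names, the convention that each row starts with a white run, the normalisation by row width, and the use of Pearson correlation are all assumptions made for illustration. The sketch only shows why a count of run transitions and a correlation of width-normalised runs stay the same when a scan line is scaled horizontally.

```python
from math import sqrt


def run_length_encode(row):
    """Encode a binary pixel row (0 = white, 1 = black) as alternating run
    lengths, starting with a white run (assumed convention; may be 0 long)."""
    runs = []
    current, count = 0, 0
    for pixel in row:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current, count = pixel, 1
    runs.append(count)
    return runs


def transition_count(runs):
    """Number of white/black transitions along a row, read from its runs."""
    nonzero = [r for r in runs if r > 0]
    return max(len(nonzero) - 1, 0)


def normalized_runs(runs):
    """Scale runs by the row width so horizontal scaling cancels out."""
    total = sum(runs)
    return [r / total for r in runs] if total else []


def run_correlation(runs_a, runs_b):
    """Pearson correlation of two normalised run profiles (hypothetical
    similarity measure); returns 0.0 when the rows are not comparable."""
    a, b = normalized_runs(runs_a), normalized_runs(runs_b)
    if len(a) != len(b) or len(a) < 2:
        return 0.0
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


# One scan line through a word, and the same line at twice the width.
small = [0, 0, 1, 1, 1, 0, 1, 0, 0]
large = [p for p in small for _ in range(2)]

runs_small = run_length_encode(small)
runs_large = run_length_encode(large)
```

Here `runs_small` and `runs_large` have the same transition count, and their normalised run profiles are identical, so the correlation is 1 up to floating-point error. A full word spotter would also need to choose which rows of a word to compare and how to combine the row scores into a matching score; the abstract does not specify either, so both are left out.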

Item Type:	Conference or Workshop Item (Paper)
Subjects:	D Physical Science > Computer Science
Divisions:	Department of > Computer Science
Depositing User:	C Swapna Library Assistant
Date Deposited:	07 Mar 2020 06:07
Last Modified:	11 Mar 2020 05:57
URI:	http://eprints.uni-mysore.ac.in/id/eprint/11498
