A novel termclass relevance measure for text categorization

Guru, D. S. and Suhil, Mahamad (2015) A novel termclass relevance measure for text categorization. Procedia Computer Science, 45. pp. 13-22. ISSN 1877-0509

Full text not available from this repository.

Abstract

In this paper, we introduce a new measure called TermClass relevance to compute the relevance of a term in classifying a document into a particular class. The proposed measure estimates the degree to which a given term supports placing an unlabeled document in a known class, as the product of ClassTerm weight and ClassTerm density. The ClassTerm weight is the ratio of the number of documents of the class containing the term to the total number of documents containing the term, and the ClassTerm density is the density of occurrence of the term in the class relative to its total occurrence in the entire population. Unlike existing term weighting schemes such as TF-IDF and its variants, the proposed relevance measure takes into account the relative participation of the term across all documents of the class with respect to the entire population. To demonstrate the significance of the proposed measure, experiments were conducted on the 20 Newsgroups dataset. Further, the superiority of the proposed measure is brought out through a comparative analysis.
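
The definition in the abstract can be illustrated with a minimal sketch. It assumes that ClassTerm density is simply the count of the term's occurrences in the class divided by its count in the whole collection; the published formula may include further normalization that the abstract does not state. The function name `term_class_relevance` and the toy documents are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def term_class_relevance(docs, labels):
    """Return {(term, cls): relevance} for tokenized docs and their class labels.

    Assumed reading of the abstract (not the paper's exact formula):
      weight  = docs of cls containing term / all docs containing term
      density = occurrences of term in cls  / all occurrences of term
      relevance = weight * density
    """
    df_total = Counter()             # documents containing each term
    tf_total = Counter()             # total occurrences of each term
    df_class = defaultdict(Counter)  # per-class document counts
    tf_class = defaultdict(Counter)  # per-class occurrence counts

    for tokens, cls in zip(docs, labels):
        for term, n in Counter(tokens).items():
            df_total[term] += 1
            tf_total[term] += n
            df_class[cls][term] += 1
            tf_class[cls][term] += n

    relevance = {}
    for cls, term_counts in df_class.items():
        for term, df in term_counts.items():
            weight = df / df_total[term]
            density = tf_class[cls][term] / tf_total[term]
            relevance[(term, cls)] = weight * density
    return relevance


# Toy collection: two "sport" documents and one "tech" document.
docs = [["goal", "match", "goal"], ["match", "team"], ["cpu", "match"]]
labels = ["sport", "sport", "tech"]
scores = term_class_relevance(docs, labels)
```

In this toy collection, a term that appears only in one class ("goal" in "sport") gets relevance 1.0 for that class, while a term spread across classes ("match") gets a lower value for each class. Pairs in which the term never occurs in the class are omitted rather than stored as zero.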

Item Type: Article
Uncontrolled Keywords: Text Categorization, Term Weight, Term-Document Relevance, TermClass Relevance, Supervised Term Weighting, Unsupervised Term Weighting
Subjects: D Physical Science > Computer Science
Divisions: Department of > Computer Science
Depositing User: Shrirekha N
Date Deposited: 20 Jul 2019 06:09
Last Modified: 20 Jul 2019 06:09
URI: http://eprints.uni-mysore.ac.in/id/eprint/5398
