Guru, D. S. and Suhil, Mahamad (2015) A novel termclass relevance measure for text categorization. Procedia Computer Science, 45. pp. 13-22. ISSN 1877-0509
Full text not available from this repository.
Abstract
In this paper, we introduce a new measure called TermClass relevance, which computes the relevance of a term in classifying a document into a particular class. The proposed measure estimates the degree of relevance of a given term in assigning an unlabeled document to a known class as the product of ClassTerm weight and ClassTerm density. The ClassTerm weight is the ratio of the number of documents of the class containing the term to the total number of documents containing the term, and the ClassTerm density is the ratio of the number of occurrences of the term in the class to its total number of occurrences in the entire population. Unlike existing term weighting schemes such as TF-IDF and its variants, the proposed relevance measure takes into account the degree of participation of the term across all documents of the class relative to the entire population. To demonstrate the significance of the proposed measure, experiments were conducted on the 20 Newsgroups dataset. Further, the superiority of the proposed measure is brought out through a comparative analysis.
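The definition in the abstract can be sketched in a few lines of Python. This is only an illustration built from the wording of the abstract: the function name `term_class_relevance`, the tokenized-document input format, and the toy corpus are assumptions, and the full paper may include normalization or additional factors that the abstract does not mention.

```python
def term_class_relevance(docs, labels, term, cls):
    """Sketch of TermClass relevance as described in the abstract.

    docs   : list of documents, each a list of tokens (assumed format)
    labels : list of class labels, parallel to docs
    Returns ClassTerm weight * ClassTerm density for (term, cls).
    """
    docs_with_term = 0        # documents containing the term, any class
    class_docs_with_term = 0  # documents of cls containing the term
    total_occurrences = 0     # occurrences of the term in the whole corpus
    class_occurrences = 0     # occurrences of the term within cls

    for tokens, label in zip(docs, labels):
        count = tokens.count(term)
        if count == 0:
            continue
        docs_with_term += 1
        total_occurrences += count
        if label == cls:
            class_docs_with_term += 1
            class_occurrences += count

    if docs_with_term == 0:
        # Term never occurs; treating the relevance as zero is an assumption.
        return 0.0

    class_term_weight = class_docs_with_term / docs_with_term
    class_term_density = class_occurrences / total_occurrences
    return class_term_weight * class_term_density


# Toy corpus, invented purely for illustration.
docs = [
    ["ball", "game", "ball"],
    ["ball", "vote"],
    ["vote", "law"],
]
labels = ["sport", "politics", "politics"]

score = term_class_relevance(docs, labels, "ball", "sport")
```

In this toy corpus, "ball" appears in two documents, one of which is in the "sport" class, so the ClassTerm weight is 1/2. Two of its three occurrences fall in "sport", so the ClassTerm density is 2/3, and the product is 1/3.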
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Text Categorization; Term Weight; Term-Document Relevance; TermClass Relevance; Supervised Term Weighting; Unsupervised Term Weighting |
Subjects: | D Physical Science > Computer Science |
Divisions: | Department of > Computer Science |
Date Deposited: | 20 Jul 2019 06:09 |
Last Modified: | 20 Jul 2019 06:09 |
URI: | http://eprints.uni-mysore.ac.in/id/eprint/5398 |