A Similarity Function for Feature Pattern Clustering and High Dimensional Text Document Classification

Foundations of Science 25 (4):1077-1094 (2020)

Abstract

Text document classification and clustering are important learning tasks that belong to both data mining and machine learning. These tasks pose several challenges when high-dimensional text documents must be processed. Word distribution in text documents plays a key role in the learning process. Research on high-dimensional text document classification and clustering is usually limited to applying traditional distance functions, and most contributions in the existing literature do not consider the word distribution in documents. In this research, we propose a novel similarity function for feature pattern clustering and high-dimensional text classification. The proposed similarity function is used to carry out supervised-learning-based dimensionality reduction. An important feature of this work is that the word distribution before and after dimensionality reduction is the same. Experimental results show that the proposed approach achieves dimensionality reduction, retains the word distribution, and obtains better classification accuracy than other measures.

Similar books and articles

Event Mining Through Clustering. T. V. Geetha & E. Umamaheswari - 2014 - Journal of Intelligent Systems 23 (1):59-73.
Innovative techniques for legal text retrieval. Marie-Francine Moens - 2001 - Artificial Intelligence and Law 9 (1):29-57.

Analytics

Added to PP
2019-03-09
