Deep Text Mining for Automatic Keyphrase Extraction from Text Documents

Journal of Intelligent Systems 20 (4):327-351 (2011)

Abstract

Due to the existence of a huge amount of textual data, whether on the World Wide Web or in textual databases such as PubMed, the development of novel automatic keyphrase extraction methods has emerged as one of the key research problems in the recent past. Consequently, a number of machine learning techniques, mostly supervised, have been proposed to extract keyphrases from text documents. However, one of the main bottlenecks that hinders the success of such systems is their requirement for annotated corpora for training. In this paper, we propose the design of a deep text mining system to identify keyphrases in text documents that are either unstructured or semi-structured in nature. The novelty of our system lies in its applicability to a single document: instead of demanding a collection of annotated texts for training, it identifies the keyphrases embedded within that document alone. The proposed system applies parsing techniques to identify candidate phrases. After the original set of candidate phrases is mapped into a low-dimensional space using Singular Value Decomposition, the Markov Clustering technique is applied to group related sentences together. Finally, treating each cluster as a document, Latent Dirichlet Allocation is applied to identify feasible keyphrases, which are presented to users in non-increasing order of their relevance scores. The efficacy of the proposed system is established through experiments on datasets from two different domains. In a comparative evaluation, we found that the proposed system outperforms KEA and KEA, which apply the supervised machine learning approach for automatic keyphrase extraction from text documents.
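The sentence-clustering stage described in the abstract (projection with Singular Value Decomposition followed by Markov Clustering) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes NumPy is available, treats individual lowercase words as the "candidate terms" instead of the parsed phrases the paper uses, builds a sentence-by-term count matrix, and compares sentences by cosine similarity in the reduced space. The function names (`sentence_term_matrix`, `reduce_with_svd`, `markov_cluster`) and the parameter values (`k=2`, `expansion=2`, `inflation=2.0`) are hypothetical choices made for illustration. The final Latent Dirichlet Allocation and ranking stage is omitted.

```python
import numpy as np

def sentence_term_matrix(sentences, vocab):
    """Rows = sentences, columns = candidate terms (assumption: single words), entries = counts."""
    index = {t: j for j, t in enumerate(vocab)}
    m = np.zeros((len(sentences), len(vocab)))
    for i, sent in enumerate(sentences):
        for tok in sent.lower().split():
            if tok in index:
                m[i, index[tok]] += 1
    return m

def reduce_with_svd(m, k):
    """Project each sentence into a k-dimensional latent space via truncated SVD."""
    u, s, _ = np.linalg.svd(m, full_matrices=False)
    return u[:, :k] * s[:k]

def cosine_similarity(x):
    """Pairwise cosine similarity between rows; all-zero rows are left at zero."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    y = x / norms
    return y @ y.T

def markov_cluster(sim, expansion=2, inflation=2.0, max_iter=100, tol=1e-6):
    """Basic Markov Clustering: alternate expansion and inflation on a column-stochastic matrix."""
    a = np.clip(sim, 0.0, None)         # MCL needs non-negative edge weights
    np.fill_diagonal(a, 1.0)            # add self-loops
    m = a / a.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        prev = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: spread flow along paths
        m = m ** inflation                        # inflation: strengthen strong flows
        m = m / m.sum(axis=0, keepdims=True)      # re-normalize columns
        if np.abs(m - prev).max() < tol:
            break
    # Rows that keep non-zero mass are attractors; their non-zero columns form a cluster.
    clusters = []
    for row in m:
        members = tuple(int(j) for j in np.nonzero(row > 1e-6)[0])
        if members and members not in clusters:
            clusters.append(members)
    return clusters

# Usage on a toy "document" of four sentences covering two topics.
sentences = [
    "keyphrase extraction from text documents",
    "automatic keyphrase extraction for documents",
    "markov clustering of graph nodes",
    "clustering graph nodes with markov chains",
]
vocab = sorted({tok for s in sentences for tok in s.lower().split()})
latent = reduce_with_svd(sentence_term_matrix(sentences, vocab), k=2)
clusters = markov_cluster(cosine_similarity(latent))
print(clusters)  # [(0, 1), (2, 3)]
```

Each resulting cluster of sentence indices would then, following the abstract, be treated as a separate document for topic modelling. In practice the reduced dimension `k` and the inflation parameter control how coarse the clusters are; the values here are only suited to this toy input.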


Similar books and articles

Event Mining Through Clustering. T. V. Geetha & E. Umamaheswari - 2014 - Journal of Intelligent Systems 23 (1):59-73.
