Web News Data Extraction Technology Based on Text Keywords

Complexity 2021:1-11 (2021)

Abstract

To reduce the time users spend searching for news on the Internet, this paper designs a web news data extraction technique that captures the main information of a news item by extracting keywords from its text. First, the TF-IDF, TextRank, and LDA keyword extraction algorithms are analyzed to clarify the keyword extraction process, and the TF-IDF algorithm is optimized using Zipf’s law. Then, drawing on the idea of model fusion, five schemes based on waterfall fusion and parallel combination fusion are designed, and their effectiveness is evaluated experimentally. The experiments show that the proposed technique performs well for web news data extraction. News keyword extraction has broad application prospects and can provide a basis for research on news key phrases, news summarization, and related fields.
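The abstract does not give the paper's formulas, its Zipf's-law adjustment, its LDA setup, or the details of its five fusion schemes, so none of those are reproduced here. As a rough illustration only, the sketch below shows textbook TF-IDF and TextRank keyword scoring and one simple, hypothetical form of parallel combination: a weighted average of max-normalized scores. The function names, the smoothed IDF variant, the co-occurrence window, the 0.5 weight, and the toy corpus are all assumptions, not the authors' method.

```python
import math
from collections import Counter, defaultdict


def tfidf_scores(doc, corpus):
    """Score each distinct token of `doc` by TF-IDF against `corpus` (a list of token lists)."""
    doc_sets = [set(d) for d in corpus]
    n_docs = len(corpus)
    scores = {}
    for term, count in Counter(doc).items():
        tf = count / len(doc)
        df = sum(1 for d in doc_sets if term in d)
        # Smoothed IDF (an assumed variant): terms found in every document keep a small positive weight.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[term] = tf * idf
    return scores


def textrank_scores(doc, window=2, damping=0.85, iterations=50):
    """TextRank: PageRank over a graph linking tokens that co-occur within `window` positions."""
    neighbors = defaultdict(set)
    for i, term in enumerate(doc):
        for j in range(i + 1, min(i + window, len(doc))):
            if doc[j] != term:
                neighbors[term].add(doc[j])
                neighbors[doc[j]].add(term)
    nodes = sorted(set(doc))
    score = {t: 1.0 for t in nodes}
    for _ in range(iterations):
        # Each neighbour passes on its previous score, split evenly across its own edges.
        score = {
            t: (1 - damping) + damping * sum(score[u] / len(neighbors[u]) for u in neighbors[t])
            for t in nodes
        }
    return score


def normalize(scores):
    """Scale scores so the best term gets 1.0, making the two methods comparable."""
    top = max(scores.values())
    return {t: s / top for t, s in scores.items()}


def fused_keywords(doc, corpus, k=3, weight=0.5):
    """Hypothetical parallel fusion: weighted average of normalized TF-IDF and TextRank scores."""
    a = normalize(tfidf_scores(doc, corpus))
    b = normalize(textrank_scores(doc))
    fused = {t: weight * a[t] + (1 - weight) * b[t] for t in a}
    # Highest fused score first; ties broken alphabetically so the output is deterministic.
    return sorted(fused, key=lambda t: (-fused[t], t))[:k]


# Toy, pre-tokenized "news" corpus (hypothetical data for illustration).
corpus = [
    "flood warning issued as river levels rise".split(),
    "river flood damages homes residents evacuated".split(),
    "election results announced after vote count".split(),
]
print(fused_keywords(corpus[1], corpus))
```

A waterfall (cascaded) scheme would instead let one method filter candidates before the other ranks them; which arrangement works better, and with what weights, is exactly what the paper's experiments address, so the values above should be read as placeholders.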


Similar books and articles

Signal extraction: experimental evidence. Te Bao & John Duffy - 2020 - Theory and Decision 90 (2):219-232.
Reviewing measures of outcome: reliability of data extraction. K. L. Haywood, J. Hargreaves, R. White & S. E. Lamb - 2004 - Journal of Evaluation in Clinical Practice 10 (2):329-337.
What is fake news? M. R. X. Dentith - 2018 - University of Bucharest Review (2):24-34.

Analytics

Added to PhilPapers
2021-04-17


Author's Profile

Kun Zhang
Carnegie Mellon University
