Skip to content

NLP Pipeline to read a newspaper article and summarize the content using TF-IDF

Notifications You must be signed in to change notification settings

akhinageorge/Textsummarization-nlp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 

Repository files navigation

Textsummarization-nlp

This Project focuses on the creation of an NLP pipeline which summarises newspaper articles scarped from various sources. Any NLP Pipeline construction employs the steps - Data Acquisition, Text Preprocessing, Featured Engineering, Modelling and Model Evaluation.

Step 1: Data Acquisition

The Data Set is employed as a CSV file uploaded in a google drive. Imported data from the url and read as CSV using Pandas.

Step 2: Text Preprocessing

This step included processing the text - cleaning included conversion to smaller case, remove all special characters, punctuation.

Step 3: Featured Engineering

This step included the conversion of text data to numerical data, The method employed is TF-IDF algorithm. The sentences were tokenised into words and a frequency distribution made. Sentence scores were created which is later used in creating the summary. All of these were done as functions which could be later employed in modelling.

Step 4: Modelling

One paragraph from the specified row (indexed) is assigned to a variable "para". num_sent is preset to 5, this is a variable which can be varied depending on the number of sentences to beconsidered.. Text processing and TF-IDF calculations are made by passing the variable to the functions and summary is generated.

Step 5: Model Evaluation

The Original content and the no. of words in it is printed out initially. The summary and the no. of words in it is printed next. A comparison can be drawn as such. Model Evaluation Table : https://docs.google.com/spreadsheets/d/1FSRo0rsfbBtqZ2klduPULh2U6V7p6fRQDIYaKzF5VpQ/edit?usp=sharing

About

NLP Pipeline to read a newspaper article and summarize the content using TF-IDF

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages