Textsummarization-nlp

This Project focuses on the creation of an NLP pipeline which summarises newspaper articles scarped from various sources. Any NLP Pipeline construction employs the steps - Data Acquisition, Text Preprocessing, Featured Engineering, Modelling and Model Evaluation.

Step 1: Data Acquisition

The Data Set is employed as a CSV file uploaded in a google drive. Imported data from the url and read as CSV using Pandas.

Step 2: Text Preprocessing

This step included processing the text - cleaning included conversion to smaller case, remove all special characters, punctuation.

Step 3: Featured Engineering

This step included the conversion of text data to numerical data, The method employed is TF-IDF algorithm. The sentences were tokenised into words and a frequency distribution made. Sentence scores were created which is later used in creating the summary. All of these were done as functions which could be later employed in modelling.

Step 4: Modelling

One paragraph from the specified row (indexed) is assigned to a variable "para". num_sent is preset to 5, this is a variable which can be varied depending on the number of sentences to beconsidered.. Text processing and TF-IDF calculations are made by passing the variable to the functions and summary is generated.

Step 5: Model Evaluation

The Original content and the no. of words in it is printed out initially. The summary and the no. of words in it is printed next. A comparison can be drawn as such. Model Evaluation Table : https://docs.google.com/spreadsheets/d/1FSRo0rsfbBtqZ2klduPULh2U6V7p6fRQDIYaKzF5VpQ/edit?usp=sharing

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
Assignment CRUV.py		Assignment CRUV.py
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Textsummarization-nlp

About

Releases

Packages

Languages

akhinageorge/Textsummarization-nlp

Folders and files

Latest commit

History

Repository files navigation

Textsummarization-nlp

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages