hpar1/news_category_classifier

CS4650 Project - News Category Classifier

By Hamza Parekh, Carlos Olea, Matthew Li

Goal: Build a neural network model that predicts the category of a given article.

###################################################################

Main Files:

superConvolution.py - optimal model with CNN, LSTM, and embedding layers (~50 seconds/epoch)

FinalData - zipped file containing all the data. The dataset is split into training and testing folders for use by the model, and each folder is further split into the individual categories

oneArticlePredict.py - demo script: takes a file as input and gives a prediction

superConvolutionPredict.py - tests the accuracy of the model. The data must be set up beforehand in the same way as for the classifier, and the dataset used for prediction must not have been used in training (there is a similar script for the old model)

dataBalance.py - pads the data; used when preparing data for the model

encodeWords.py - maps each word to a value; used when preparing data for the model

Dictionary.txt - generated by encodeWords.py; a dictionary of words and their values

###################################################################
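The word-encoding and padding steps described for encodeWords.py and dataBalance.py can be sketched roughly as follows. This is a minimal illustration only, not the repository's code: the function names `build_vocab`, `encode`, and `pad`, the choice of 0 as the padding value, and the sample sentences are all assumptions made for the example.

```python
def build_vocab(texts):
    """Assign each distinct word an integer ID, starting at 1 (0 is kept for padding)."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def encode(text, vocab, unknown_id=0):
    """Map each word of a text to its ID; unseen words fall back to unknown_id.

    Note: in this sketch unknown words share ID 0 with padding (an assumption).
    """
    return [vocab.get(word, unknown_id) for word in text.lower().split()]


def pad(sequence, length, pad_id=0):
    """Cut or right-pad a sequence so that it has exactly `length` entries."""
    return sequence[:length] + [pad_id] * max(0, length - len(sequence))


# Hypothetical sample articles, for illustration only.
articles = ["Stocks rally as markets open", "Team wins the final"]
vocab = build_vocab(articles)
encoded = [pad(encode(article, vocab), 6) for article in articles]
```

With fixed-length integer sequences like these, every article has the same shape, which is what an embedding layer generally expects as input.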

Web Scraper Folder:

CutTo400words.py - python code to cut overly long articles down to 400 words

NYTimesCategoryScraper.py - python code to navigate NYT and grab categories. To use it, install the Selenium web driver (ChromeDriver) from http://chromedriver.chromium.org/downloads

NewYorkTimesScraper.py - python code to scrape the article from NYT

TimesNewsfirst400Words.py - python code to navigate and scrape Times
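Cutting an article to 400 words, as CutTo400words.py is described as doing, might look like the sketch below. The function name `truncate_words` and the whitespace-based definition of a "word" are assumptions for illustration; the actual script may split text differently.

```python
def truncate_words(text, max_words=400):
    """Keep at most the first `max_words` whitespace-separated words of a text.

    Side effect of this approach: runs of whitespace (including newlines)
    are collapsed into single spaces.
    """
    words = text.split()
    return " ".join(words[:max_words])


# Hypothetical usage: a 500-word article is cut to 400 words.
long_article = " ".join(["word"] * 500)
short_article = truncate_words(long_article)
```

Articles that are already at or under the limit pass through with their words unchanged.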

###################################################################

Additional info:

CS4650 Final Presentation.pptx - presentation

cs4650projecthistory.xlsx - log of accuracy and loss of the 3 models

###################################################################

Old Models:

spicyArticleClasifier.py - added CNN layers to the model (~15 seconds/epoch)

lameArticleClassifier.py - base model with only LSTM (~30 seconds/epoch)
