Machine Learning Algorithms applied to The Movies Dataset to determine success factors.
Link to the Kaggle dataset: https://www.kaggle.com/rounakbanik/the-movies-dataset
Usage:
Data_Cleaning_PreProcessing.ipynb - explores the data, cleans it, parses the JSON-encoded Genres column, handles the imbalanced-classes problem, and normalizes the numerical columns (min-max scaling).
DimensionalityReductionPCA.ipynb - applies PCA, SelectKBest and SelectPercentile.
Gridsearch #4 XXX - applies grid search to each of the classifiers to find the best parameters, both for the full pre-processed data and for the data after feature selection and/or dimensionality reduction.
The .npy files store the data after pre-processing so that it can be loaded again in subsequent steps.
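The pre-processing steps listed for Data_Cleaning_PreProcessing.ipynb can be sketched in plain Python as below. This is a minimal illustration, not the notebook's code: the sample rows are made up, the genres string is assumed to follow the movies_metadata.csv layout (a serialized list of {'id', 'name'} dicts, parsed here with ast.literal_eval), and random oversampling is only one possible way to handle imbalanced classes; the notebook may use a different technique.

```python
import ast
import random

# Hypothetical raw rows; the genres string mimics the assumed format of the
# movies_metadata.csv "genres" column (a list of {'id', 'name'} dicts).
rows = [
    {"budget": 30000000, "genres": "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]", "label": 1},
    {"budget": 65000000, "genres": "[{'id': 12, 'name': 'Adventure'}]", "label": 0},
    {"budget": 0, "genres": "[]", "label": 0},
]

def parse_genres(raw):
    """Turn the serialized genre list into a list of genre names."""
    try:
        return [g["name"] for g in ast.literal_eval(raw)]
    except (ValueError, SyntaxError):
        return []  # malformed or missing value

def one_hot_genres(rows):
    """Add one 0/1 column per genre seen in the data; return the genre names."""
    names = sorted({g for r in rows for g in parse_genres(r["genres"])})
    for r in rows:
        present = set(parse_genres(r["genres"]))
        for name in names:
            r["genre_" + name] = int(name in present)
    return names

def min_max_scale(values):
    """Rescale a numeric column to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def random_oversample(rows, label_key="label", seed=0):
    """Duplicate random minority-class rows until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for r in rows:
        by_class.setdefault(r[label_key], []).append(r)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

genre_names = one_hot_genres(rows)
scaled = min_max_scale([r["budget"] for r in rows])
for r, b in zip(rows, scaled):
    r["budget"] = b
balanced = random_oversample(rows)
```

Oversampling is normally applied to the training split only, so that duplicated rows do not end up in the evaluation data.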
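For DimensionalityReductionPCA.ipynb, the sketch below shows what PCA and a SelectKBest-style filter compute, written with NumPy only. It assumes a numeric feature matrix X and integer class labels y (random stand-ins here); pca, f_scores and select_k_best are hypothetical helpers, and the F score mirrors the one-way ANOVA statistic used by scikit-learn's f_classif, which is SelectKBest's default score function. In practice the scikit-learn classes PCA, SelectKBest and SelectPercentile would be the usual choice.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = S ** 2 / (X.shape[0] - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return X_centered @ Vt[:n_components].T, ratio

def f_scores(X, y):
    """One-way ANOVA F statistic of each feature against the class labels."""
    classes = np.unique(y)
    n, k = len(y), len(classes)
    overall_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        ss_between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        ss_within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def select_k_best(X, y, k):
    """Indices (ascending) of the k features with the highest F score."""
    return np.sort(np.argsort(f_scores(X, y))[::-1][:k])

rng = np.random.default_rng(0)
X = rng.random((100, 5))          # stand-in for the scaled feature matrix
y = (X[:, 3] > 0.5).astype(int)   # labels that depend only on feature 3

X_reduced, ratio = pca(X, n_components=2)
kept = select_k_best(X, y, k=2)
X_selected = X[:, kept]
```

SelectPercentile follows the same idea but keeps a percentage of the highest-scoring features instead of a fixed count k.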
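The grid search step amounts to scoring every combination in a parameter grid and keeping the best one. The sketch below is an illustration in plain Python under stated assumptions: grid_search and knn_loo_accuracy are hypothetical helpers, the 1-D data is made up, and leave-one-out k-NN accuracy stands in for a real classifier evaluated with cross-validation. If the notebooks use scikit-learn, GridSearchCV(estimator, param_grid, cv=...) performs the same exhaustive search with cross-validation built in.

```python
import itertools
import math

def grid_search(param_grid, evaluate):
    """Exhaustively score every parameter combination and return the best one."""
    names = sorted(param_grid)
    best_params, best_score = None, -math.inf
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:  # strict '>' keeps the first of any tied combinations
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical toy data: one feature, binary label.
X = [0.1, 0.2, 0.25, 0.3, 0.7, 0.75, 0.8, 0.9]
y = [0, 0, 0, 0, 1, 1, 1, 1]

def knn_loo_accuracy(n_neighbors, weights):
    """Leave-one-out accuracy of a 1-D k-NN classifier (stand-in for model + CV)."""
    correct = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        neighbours = sorted(
            (abs(xi - xj), yj) for j, (xj, yj) in enumerate(zip(X, y)) if j != i
        )[:n_neighbors]
        votes = {0: 0.0, 1: 0.0}
        for dist, label in neighbours:
            votes[label] += 1.0 if weights == "uniform" else 1.0 / (dist + 1e-9)
        correct += int(max(votes, key=votes.get) == yi)
    return correct / len(X)

best_params, best_score = grid_search(
    {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]},
    knn_loo_accuracy,
)
```

To compare the full data with the reduced or feature-selected data, the same grid is simply searched once per version of the feature matrix and the best scores are compared.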
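Saving and reloading pre-processed arrays with NumPy's np.save and np.load looks roughly like this. The array and file names are hypothetical, and a temporary directory is used only to keep the example self-contained.

```python
import os
import tempfile

import numpy as np

# Hypothetical pre-processed arrays; the real file and array names may differ.
X_train = np.random.default_rng(0).random((10, 4))
y_train = np.array([0, 1] * 5)

with tempfile.TemporaryDirectory() as out_dir:
    # Save once at the end of pre-processing...
    np.save(os.path.join(out_dir, "X_train.npy"), X_train)
    np.save(os.path.join(out_dir, "y_train.npy"), y_train)

    # ...then reload in a later step instead of re-running the pipeline.
    X_loaded = np.load(os.path.join(out_dir, "X_train.npy"))
    y_loaded = np.load(os.path.join(out_dir, "y_train.npy"))
```

np.savez can bundle several arrays into a single .npz file if keeping one file per array becomes unwieldy.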