Almost everyone has health insurance and usually has one health checkup a year with their primary healthcare provider. This data alone can tell us a lot about our disease risks. Obesity risk is one of the easiest to predict and one of the most useful, because it helps with the prevention of other obesity-related diseases and their symptoms.
This project showcases various machine learning models for the prediction of multiclass labels. The models compared are:
- Logistic Regression (one-vs-rest method)
- Decision Tree
- Random Forest
- Support Vector Machine
- Neural Network
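As a rough illustration of how such a comparison can be set up, here is a minimal sketch using scikit-learn. It is not the notebook's actual code: the data is a synthetic stand-in generated with `make_classification`, the hyperparameters are arbitrary, and the `model` and `accuracy` column names are assumptions. The neural network is left out to keep the sketch short.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic multiclass data (assumption: stands in for the real dataset).
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=8, n_classes=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One entry per model; scaling is added for the models that are sensitive to it.
models = {
    "Logistic Regression (one-vs-rest)": make_pipeline(
        StandardScaler(), OneVsRestClassifier(LogisticRegression(max_iter=1000))
    ),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
}

# Fit each model and record its accuracy on the held-out split.
rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    rows.append({"model": name, "accuracy": model.score(X_test, y_test)})

comparison_model = pd.DataFrame(rows)
print(comparison_model)
```

Wrapping `LogisticRegression` in `OneVsRestClassifier` makes the one-vs-rest strategy explicit: one binary classifier is trained per class, and the class with the highest score wins.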
This project uses the data set from the Kaggle playground competition series found here.
This project is a Jupyter notebook written in Python. Besides Python and Jupyter, it needs the following libraries, which can be installed with:
- pip install numpy
- pip install pandas
- pip install matplotlib
- pip install seaborn
- pip install scikit-learn
- pip install tensorflow
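To confirm that the libraries above are available before opening the notebook, a small check like the following can help. It is only a sketch and uses the standard library; note that the `scikit-learn` package is imported under the name `sklearn`.

```python
import importlib.util

# pip package name -> module name used in `import` statements
required = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "tensorflow": "tensorflow",
}

# find_spec returns None when a top-level module cannot be found.
missing = [
    pip_name
    for pip_name, module_name in required.items()
    if importlib.util.find_spec(module_name) is None
]

if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```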
To run the project:
- Download the train.csv from the dataset source
- Run through the cells in the notebook in their order
- Optionally, experiment with feature selection using the correlation matrix, and use the selected features to train the subsequent models
- Finally, plot the comparison_model data frame to visualize how the accuracy of the different models compares
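The feature selection and plotting steps above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: the data is synthetic (the real train.csv has different columns), the column names `feature_a`, `feature_b`, `feature_c` and `label` are hypothetical, the label is assumed to be encoded as integers, and the 0.3 correlation threshold is arbitrary. Only the `comparison_model` name comes from the notebook; its `model` and `accuracy` columns are assumed.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for train.csv (hypothetical columns).
rng = np.random.default_rng(0)
n = 300
label = rng.integers(0, 3, size=n)
df = pd.DataFrame({
    "feature_a": label + rng.normal(0, 0.3, n),        # strongly related to the label
    "feature_b": 0.5 * label + rng.normal(0, 1.0, n),  # weakly related
    "feature_c": rng.normal(0, 1.0, n),                # unrelated noise
    "label": label,
})

# Keep features whose absolute correlation with the label clears the threshold.
target_corr = df.corr()["label"].drop("label").abs()
selected_features = target_corr[target_corr >= 0.3].index.tolist()

X_train, X_test, y_train, y_test = train_test_split(
    df[selected_features], df["label"], test_size=0.25, random_state=0
)

# Train on the selected features and collect accuracies.
rows = []
for name, model in {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}.items():
    model.fit(X_train, y_train)
    rows.append({"model": name, "accuracy": model.score(X_test, y_test)})
comparison_model = pd.DataFrame(rows)

# Bar chart of accuracy per model (pandas uses matplotlib for plotting).
ax = comparison_model.plot.bar(x="model", y="accuracy", legend=False, rot=0)
ax.set_ylabel("Test accuracy")
ax.set_ylim(0, 1)
```

Correlation with an integer-encoded label is only a rough screening signal for a multiclass target, so the threshold is worth varying when experimenting.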
The basic machine learning foundations taught in the course 'Python for Data Science and Machine Learning Bootcamp' by Jose Portilla on Udemy gave me the concrete knowledge to implement ML models. Thanks also to Kaggle for providing an extensive dataset to use for multiclass prediction. I would also like to give credit to my brother Sanjay Prabhakar for providing the pointers I needed to successfully complete my first machine learning project.