Vaccine Prediction Model

This project was built for the DataHack hackathon organized by IIT Guwahati.

This repository contains the code for predicting the likelihood of individuals getting vaccinated against XYZ and seasonal influenza. The prediction is performed using machine learning models, primarily logistic regression, after extensive data preprocessing and feature engineering.

Dataset

The dataset consists of two main files:

training_set_features.csv: Contains features (independent variables) used for training.
training_set_labels.csv: Contains labels (dependent variables - vaccination status) for training.

Project Overview

1. Exploratory Data Analysis (EDA)

Loading and Merging Data: The features and labels are loaded and merged into a single DataFrame for easier manipulation.
Initial Examination: The data is examined to understand its structure, including the number of rows, columns, data types, and summary statistics.
Correlation Analysis: Correlation between features and target variables (xyz_vaccine and seasonal_vaccine) is calculated and visualized to understand relationships.
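
The EDA steps above could look roughly like the sketch below. It uses small synthetic frames in place of training_set_features.csv and training_set_labels.csv so it runs on its own; the join key respondent_id and the feature names doctor_recc and health_worker are assumptions for illustration, not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the two CSV files; with the real data these
# would come from pd.read_csv("training_set_features.csv") and so on.
rng = np.random.default_rng(0)
n = 200
features = pd.DataFrame({
    "respondent_id": np.arange(n),           # assumed join key
    "doctor_recc": rng.integers(0, 2, n),    # hypothetical feature
    "health_worker": rng.integers(0, 2, n),  # hypothetical feature
})
labels = pd.DataFrame({
    "respondent_id": np.arange(n),
    "xyz_vaccine": rng.integers(0, 2, n),
    "seasonal_vaccine": rng.integers(0, 2, n),
})

# Merge features and labels into a single DataFrame.
df = features.merge(labels, on="respondent_id", how="inner")

# Initial examination: size, data types, summary statistics.
print(df.shape)
print(df.dtypes)
print(df.describe())

# Correlation of each numeric feature with the two target columns.
targets = ["xyz_vaccine", "seasonal_vaccine"]
corr = (
    df.select_dtypes("number")
    .corr()[targets]
    .drop(index=targets + ["respondent_id"])
)
print(corr)
```

The resulting corr table has one row per feature and one column per target, so it can be passed to seaborn.heatmap for the visualization step.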

2. Data Preprocessing

Handling Missing Values: Missing values are identified and columns with more than 20% missing data are removed.
Feature Separation: Numerical and categorical features are separated for individual preprocessing.
Imputation and Encoding:
- Numerical features are imputed using the median strategy.
- Categorical features are imputed using the most frequent strategy and then one-hot encoded.
Feature Engineering: New features are created by combining existing ones to reduce dimensionality and improve model performance. For example:
- behavioral_precautions: Sum of various behavioral features.
- household_members: Sum of household children and adults.

3. Model Training and Evaluation

Model Selection: Three models are evaluated for their performance:
- Logistic Regression
- Random Forest Classifier
- Support Vector Machines (SVM)
Train-Test Split: The data is split into training and testing sets (80%-20% split).
Model Training: Each model is trained on the training data.
Performance Evaluation:
- Models are evaluated using the ROC-AUC score.
- ROC curves are plotted for visual comparison.
- Confusion matrix and classification report are generated to understand model performance in detail.
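
The comparison described above might be sketched as follows. The data is synthetic (make_classification) and stands in for a single binary target such as seasonal_vaccine; the model settings shown (max_iter, random_state, probability=True) are assumptions, not the notebook's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary problem standing in for one vaccination target.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 80%-20% train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    # probability=True is needed so the SVM exposes predict_proba.
    "SVM": SVC(probability=True, random_state=0),
}

# Train each model and score it by ROC-AUC on the held-out 20%.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    scores[name] = roc_auc_score(y_test, proba)
    print(f"{name}: ROC-AUC = {scores[name]:.3f}")

# Detailed report for the best-scoring model.
best_name = max(scores, key=scores.get)
y_pred = models[best_name].predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(best_name)
print(cm)
print(classification_report(y_test, y_pred))
```

For the ROC curve plots, sklearn.metrics.roc_curve returns the false positive and true positive rates that can be drawn with matplotlib.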

4. Hyperparameter Tuning

Randomized Search: Hyperparameters of the Logistic Regression model are tuned using RandomizedSearchCV to find the best parameters.
Final Model Training: The Logistic Regression model with the best parameters is retrained on the entire training dataset.
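
As an illustration of the tuning step, here is a sketch on synthetic data. The search space (C and solver), n_iter, and cv values are hypothetical; the notebook's actual ranges are not documented in this README.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic training data standing in for the preprocessed features.
X_train, y_train = make_classification(n_samples=400, n_features=10, random_state=0)

# Hypothetical search space for illustration.
param_distributions = {
    "C": loguniform(1e-3, 1e2),
    "solver": ["lbfgs", "liblinear"],
}

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions=param_distributions,
    n_iter=10,
    scoring="roc_auc",
    cv=5,
    random_state=0,
)
search.fit(X_train, y_train)

print(search.best_params_)
print(f"best cross-validated ROC-AUC: {search.best_score_:.3f}")

# With the default refit=True, best_estimator_ is already retrained on
# all of the data passed to fit, using the best parameters found.
best_model = search.best_estimator_
```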

5. Prediction and Submission

Preprocessing Test Data: The test dataset is preprocessed using the same steps as the training data.
Prediction: Probabilities of vaccination for the test data are predicted using the trained models.
Submission File: A submission file (final_csv) is generated containing respondent IDs and predicted probabilities.
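
A sketch of how the submission could be assembled, assuming one Logistic Regression model per target and a respondent_id column. The data is synthetic, and the output file name final_csv.csv is an assumption (this README refers to the file as final_csv).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins for the preprocessed training and test features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))
y_train = pd.DataFrame({
    "xyz_vaccine": (X_train[:, 0] + rng.normal(size=200) > 0).astype(int),
    "seasonal_vaccine": (X_train[:, 1] + rng.normal(size=200) > 0).astype(int),
})
X_test = rng.normal(size=(50, 4))
test_ids = np.arange(1000, 1050)  # hypothetical respondent IDs

# One model per target; each contributes the probability of class 1.
submission = pd.DataFrame({"respondent_id": test_ids})
for target in ["xyz_vaccine", "seasonal_vaccine"]:
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train[target])
    submission[target] = model.predict_proba(X_test)[:, 1]

# File name assumed for this sketch.
submission.to_csv("final_csv.csv", index=False)
print(submission.head())
```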

Files Included

README.md: Overview of the project and instructions.
dataset and all/: Directory containing training and test datasets.
vaccine_prediction_dishant_bothra.ipynb: Jupyter notebook containing the complete code for data preprocessing, model training, and prediction.
final_csv: Submission file containing predicted probabilities for the test dataset.

Requirements

Python 3.x
Libraries:
- numpy
- pandas
- seaborn
- matplotlib
- scikit-learn

Precaution

Rerunning the program may change the parameters selected for best_model, because the randomized search samples candidate parameters at random. If the parameters change, retune the model accordingly.
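
If the variation comes from the randomized search itself, fixing its random_state is one way to make reruns repeatable. The sketch below uses synthetic data and a hypothetical search range; whether the notebook sets a seed is not stated here.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=8, random_state=0)


def run_search(seed):
    """Run a small randomized search and return the best parameters."""
    search = RandomizedSearchCV(
        LogisticRegression(max_iter=1000),
        param_distributions={"C": loguniform(1e-3, 1e2)},  # hypothetical range
        n_iter=5,
        scoring="roc_auc",
        cv=3,
        random_state=seed,  # fixes which candidates are sampled
    )
    return search.fit(X, y).best_params_


first = run_search(seed=42)
second = run_search(seed=42)

# Same seed, same data: both runs sample and select the same parameters.
print(first == second)
```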

Usage

Clone the repository:

```
git clone https://github.com/yourusername/vaccine-prediction.git
cd vaccine-prediction
```

Install dependencies (if not already installed):
```
pip install -r requirements.txt
```
Run the Jupyter notebook vaccine_prediction_dishant_bothra.ipynb to execute the code step-by-step or view the results.
Modify parameters or models as needed for further experimentation.

Authors

Dishant Bothra
