In this project, I applied various data engineering techniques to the Figure Eight disaster dataset to build a model for an API that classifies disaster messages.
The project includes a Flask app that takes an input message and classifies it into different categories. This helps people who don't want to read entire text messages in emergency situations.
Just copy a text message and paste it into the textbox; the app will classify it into categories.
data/process_data.py cleans and transforms the text for multi-output classification.
Steps:
- Loads messages.csv and categories.csv
- Merges and cleans the data
- Stores the merged and cleaned DataFrame in an SQLite database
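The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual process_data.py: the table name `DisasterMessages` and the raw category format (`"related-1;request-0;..."`) are assumptions about the Figure Eight CSV layout.

```python
import sqlite3

import pandas as pd


def clean_and_store(messages_csv, categories_csv, db_path):
    """Merge the two CSVs, expand the categories column, and store to SQLite."""
    messages = pd.read_csv(messages_csv)
    categories = pd.read_csv(categories_csv)
    df = messages.merge(categories, on="id")

    # Assumed raw format: "related-1;request-0;...".
    # Split it into one binary column per category.
    cats = df["categories"].str.split(";", expand=True)
    cats.columns = [c.split("-")[0] for c in cats.iloc[0]]
    for col in cats.columns:
        cats[col] = cats[col].str[-1].astype(int)

    df = pd.concat([df.drop(columns=["categories"]), cats], axis=1)
    df = df.drop_duplicates()

    # Table name "DisasterMessages" is an assumption.
    with sqlite3.connect(db_path) as conn:
        df.to_sql("DisasterMessages", conn, index=False, if_exists="replace")
    return df
```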
models/train_classifier.py contains an ML pipeline that:
- Loads the stored data
- Splits it into train and test sets
- Processes the text with text_tokenize.py
- Trains the model tuned in the ML Pipeline Preparation notebook
- Shows the accuracy, precision, recall, and F1 scores for each category
- Saves the model to a pickle file
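A pipeline of this shape can be sketched as below. This is a simplified stand-in for train_classifier.py, assuming a TF-IDF vectorizer feeding a multi-output Random Forest; the actual tuned hyperparameters come from the ML Pipeline Preparation notebook.

```python
import pickle

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline


def build_and_train(X, Y, model_path):
    """Train a multi-output text classifier, report per-category scores, pickle it."""
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", MultiOutputClassifier(RandomForestClassifier(n_estimators=50))),
    ])
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.2, random_state=42
    )
    pipeline.fit(X_train, Y_train)

    # Precision/recall/F1 for each category column separately.
    Y_pred = pipeline.predict(X_test)
    for i, col in enumerate(Y.columns):
        print(col)
        print(classification_report(Y_test.iloc[:, i], Y_pred[:, i], zero_division=0))

    with open(model_path, "wb") as f:
        pickle.dump(pipeline, f)
    return pipeline
```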
Using the Flask framework, the app has been deployed to Heroku; you can check out the deployed app.
- You can run the web app locally using run.py

(Screenshots: text message input area and results area.)

The app classifies the text message into categories.
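The classification endpoint can be sketched as below. This is a simplified, self-contained version: the real run.py renders go.html instead of returning JSON, the `/go` route name mirrors the template name, and the stub model and category list stand in for the unpickled models/classifier.pkl.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


# Stand-in for the pipeline loaded from models/classifier.pkl,
# so this sketch runs without the trained model.
class _StubModel:
    def predict(self, texts):
        return [[1, 0, 1]]


model = _StubModel()
CATEGORIES = ["related", "aid_related", "weather_related"]  # illustrative subset


@app.route("/go")
def go():
    # Classify the pasted message and map each category to its 0/1 flag.
    query = request.args.get("query", "")
    labels = model.predict([query])[0]
    results = {cat: int(flag) for cat, flag in zip(CATEGORIES, labels)}
    return jsonify(message=query, categories=results)
```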
- The code is written in Python 3.x; see requirements.txt for the project dependencies.
- Run process_data.py: this script cleans the data from disaster_messages.csv and disaster_categories.csv and stores it in DisasterResponse.db

  `python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db`

  - data/process_data.py: cleans the data
  - data/disaster_messages.csv: the messages data
  - data/disaster_categories.csv: the categories of the messages
  - data/DisasterResponse.db: database for storing the processed data
- Run train_classifier.py: this script loads the data from the database, builds an ML pipeline, and saves the trained model

  `python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl`

  - models/train_classifier.py: trains the model on the processed data
  - data/DisasterResponse.db: stored data produced by process_data.py
  - models/classifier.pkl: path for saving the trained model
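Loading the stored data back out of the database can be sketched as follows. The table name `DisasterMessages` and the non-category columns (`id`, `message`, `genre`) are assumptions; match whatever process_data.py passed to `DataFrame.to_sql`.

```python
import sqlite3

import pandas as pd


def load_data(db_path, table="DisasterMessages"):
    """Read the cleaned table and split it into message text X and labels Y."""
    with sqlite3.connect(db_path) as conn:
        df = pd.read_sql(f"SELECT * FROM {table}", conn)
    X = df["message"]
    # Everything except the id/text/genre columns is a binary category label.
    Y = df.drop(columns=["id", "message", "genre"])
    return X, Y
```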
- Run run.py: after the data has been cleaned and the classifier trained, you can run run.py to start the web app
I provided 3 graphs based on the training data:
- The messages belong to 3 genres; here is the distribution:
- The top 3 categories are: 1- related, 2- aid_related, 3- weather_related
- The data is imbalanced in most categories, so prediction accuracy is above 90% for almost every category while recall and precision are very low. I applied class weighting in the Random Forest classifier so that ones and zeros do not carry equal importance for our case.
- Here you can see the distribution of message lengths by genre. Most messages are shorter than 400 characters.
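The class weighting mentioned above can be expressed through scikit-learn's built-in option, sketched below; whether train_classifier.py uses exactly `class_weight="balanced"` or custom weights is an assumption.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

# "balanced" reweights each class inversely to its frequency in the training
# data, so the rare positive labels are not drowned out by the majority zeros.
clf = MultiOutputClassifier(
    RandomForestClassifier(n_estimators=100, class_weight="balanced")
)
```

Without the weighting, a classifier can reach 90%+ accuracy on an imbalanced category simply by always predicting zero, which is exactly why recall and precision are the scores to watch here.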
|-- Notebooks-----------------#Notebook files for data process and model training
|-- README.md
|-- app
| |-- run.py------------------# Flask app
| |-- static------------------# Github and Linkedin Logos
| |-- templates
| | |-- go.html-------------# Results page
| | `-- master.html---------# Main page
| `-- text_tokenize.py--------# Text tokenizer
|-- data
| |-- DisasterResponse.db-----# Stored Data
| |-- disaster_categories.csv-# Categories.csv
| |-- disaster_messages.csv---# Messages.csv
| `-- process_data.py---------# Data processor
|-- img-------------------------# readme images
|-- models
| |-- classifier.pkl----------# Trained model
| |-- text_tokenize.py--------# Text tokenizer
| `-- train_classifier.py-----# Classifier
`-- requirements.txt------------# Required Python Libraries
The whole project is written in Python 3.8; requirements.txt lists the necessary libraries.