mostafa-fallaha/kardia
Kardía is a project designed to analyze the factors that contribute to heart attacks and predict the likelihood of someone experiencing one.


This project is divided into three parts:
  • ETL: A streamlined pipeline for extracting, transforming, and loading health-related data.
  • Analysis: Power BI, time-series, and network analysis were used to examine the key factors contributing to heart attacks.
  • Prediction App: A Streamlit web application backed by a Random Forest Classifier that predicts heart attack risk from user input with 98.7% accuracy.

User Stories

  • As a user, I want to input my personal health data, so I can receive a prediction on my likelihood of having a heart attack.
  • As a user, I want to view the accuracy of the heart attack prediction model, so I can trust the results I'm given.
  • As a user, I want to explore a Power BI report that visualizes heart attack data, so I can see how different health factors correlate with heart attacks.
  • As a user, I want to filter the Power BI report by age, gender, and other health conditions, so I can focus on data relevant to me or my demographic.



Kardía is built using the following technologies:

  • Python: This project utilizes Python for creating the ETL (Extract, Transform, Load) pipeline, enabling efficient data handling and preprocessing.
  • Streamlit: Streamlit is used to create the user interface for the machine learning model. It provides an interactive platform where users can input their data and receive heart attack predictions in real-time, making the model easy to use and accessible.
  • MySQL: MySQL is used to design and manage the schema in the database, enabling organized, scalable, and efficient data storage.
  • DVC (Data Version Control): DVC is employed to version the data, ensuring that every change in the dataset is tracked and reproducible. This is especially important in projects dealing with evolving data sources.
  • Random Forest Classifier: The machine learning model at the heart of this project is built using Python. The Random Forest algorithm is chosen for its effectiveness in handling binary classification tasks, like predicting the likelihood of a heart attack.
  • MLflow: For model versioning, MLflow is used to manage the lifecycle of the Random Forest model, including tracking experiments, packaging code into reproducible runs, and deploying models.
  • Power BI: Power BI is used to create interactive visualizations and dashboards that provide insights into heart attack trends, enabling data analysis and reporting for better understanding and decision-making.
  • PowerShell Scripts: To streamline and automate repetitive tasks such as running specific Python scripts or managing data workflows.
  • Windows Task Scheduler: Used to schedule a batch script that runs the ETL process.
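As a rough illustration of the modelling approach listed above, here is a minimal scikit-learn sketch of training a Random Forest Classifier for binary risk prediction. The feature columns and data below are invented placeholders, not the project's actual dataset or code:

```python
# Minimal sketch of training a Random Forest for binary risk prediction.
# The toy data below is invented; the real project trains on the
# heart-attack dataset loaded through the ETL pipeline.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy feature rows: [age, resting_bp, cholesterol] (placeholder features)
X = [
    [63, 145, 233], [37, 130, 250], [41, 130, 204], [56, 120, 236],
    [57, 140, 192], [44, 120, 263], [52, 172, 199], [54, 140, 239],
]
y = [1, 0, 0, 1, 0, 0, 1, 1]  # 1 = high risk, 0 = low risk (invented labels)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print(clf.predict(X_test))  # 0/1 risk labels for the held-out rows
```

Random Forests suit this task because they handle mixed tabular health features with little preprocessing and give feature importances for free.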





Deployment on Streamlit Community Cloud

1. Model Storage and Management:

The machine learning model is stored in Google Drive. When the app starts, it checks if the model is already available locally. If not, it downloads the model from Google Drive using gdown. The model is then loaded into the app using joblib.

2. Model Caching:

To ensure efficient use of resources, the model is cached using Streamlit’s @st.cache_resource decorator. This helps reduce the memory load and avoid redundant downloads, especially when the app is reopened.
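The download-then-cache flow described in the two points above can be sketched roughly as follows. The function and file names are illustrative, not the repo's actual code, and the Google Drive ID is a placeholder:

```python
import os

MODEL_PATH = "model.joblib"                # hypothetical local filename
DRIVE_FILE_ID = "<your-google-drive-id>"   # placeholder, not the real ID

def ensure_model_file(path=MODEL_PATH, file_id=DRIVE_FILE_ID, downloader=None):
    """Fetch the model from Google Drive only if it is missing locally."""
    if os.path.exists(path):
        return path
    if downloader is None:
        import gdown  # imported lazily; only needed when a download happens
        downloader = lambda fid, out: gdown.download(id=fid, output=out, quiet=True)
    downloader(file_id, path)
    return path

# In the Streamlit app the loader would be wrapped with @st.cache_resource,
# so the (potentially large) model object is created once per server process:
#
# @st.cache_resource
# def load_model():
#     import joblib
#     return joblib.load(ensure_model_file())
```

Checking the disk before downloading keeps app restarts fast, while `@st.cache_resource` keeps the deserialized model out of every rerun.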

3. Streamlit App:

The app is designed with an intuitive interface that allows users to interact with the machine learning model. Streamlit simplifies deployment by automatically handling scalability and hosting, while the app remains responsive and user-friendly.

You can access the Streamlit app from here.



User Screens (Streamlit Web App)

Screenshots: Home screen, Prediction screen.

User Screens (Power BI report)

Screenshots: Overview screen, Line chart screen, Personal analysis screen, Scatter screen, Disease analysis screen, Decomposition tree screen.



This project employs a validation methodology to ensure the reliability and accuracy of data loading, which helps identify and address potential issues early in the development process.

The Validations are logged to a logs table in the database.
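To illustrate the idea, here is a hedged sketch of how one validation result might be written to the logs table. The table layout and function name are assumptions, and sqlite3 is used purely as a stand-in for the project's MySQL database (with MySQL the placeholders would be `%s` and the connection would come from a MySQL connector):

```python
import sqlite3
from datetime import datetime, timezone

def log_validation(conn, check_name, passed, details=""):
    """Append one validation outcome to the logs table (assumed layout)."""
    conn.execute(
        "INSERT INTO logs (ts, check_name, passed, details) VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), check_name, int(passed), details),
    )
    conn.commit()

# Example validation: the staging table must not contain null ages.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (ts TEXT, check_name TEXT, passed INTEGER, details TEXT)")
conn.execute("CREATE TABLE staging (age INTEGER)")
conn.executemany("INSERT INTO staging VALUES (?)", [(63,), (37,), (None,)])

null_ages = conn.execute("SELECT COUNT(*) FROM staging WHERE age IS NULL").fetchone()[0]
log_validation(conn, "no_null_ages", null_ages == 0, f"{null_ages} null rows")
```

Logging each check with a timestamp makes it easy to audit later which ETL run introduced a data problem.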

Screenshot: Logs table.

You can also see the data versioning with DVC.



To set up Kardía locally, follow these steps:

Prerequisites

  1. Python: I recommend installing Miniconda, which offers several advantages over a standalone Python installation, especially for data science and scientific computing.
    You can see how to install it here.

  2. MySQL and MySQL Workbench: Download them here.
    From the same list, also download the MySQL connector. If you're on Windows, download Connector/NET.

  3. Power BI: It works on Windows only; you can download it from the Microsoft Store or from here.

Installation

  1. Clone the repo:
git clone https://github.com/mostafa-fallaha/heart-disease-prediction.git
cd heart-disease-prediction
  2. Install the required Python packages:
pip install -r requirements.txt
  3. Create the DVC storage:
mkdir /tmp/dvc_heart
  4. Download the parquet file from here and put it in ETL/docs.

  5. Create the logs table; you can find the SQL script for it in ETL/dwh/logs_table.sql.

  6. Create a .env file in the root of the project containing the following:

DB_USER=your database username
DB_PASSWORD=your database password
DB_HOST=your host (usually localhost)
DB_PORT=the port where mysql is running (usually 3306)
LOGS_DB=the database where your logs table is.
DB_STAGING=the staging schema name (create the schema in mysql workbench, no need to create any table)
DB_DWH=the DWH schema name (you need to create tables, in the step 3)
VERSION=0.9 (increment this data version whenever you run the ETL process)
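For illustration, the variables above are typically read in the ETL scripts and assembled into a database connection string. A minimal sketch, assuming the values are exposed through `os.environ` (e.g. via python-dotenv) and a SQLAlchemy-style MySQL URL; the helper name is invented:

```python
import os

def mysql_url(schema_env_var):
    """Build a SQLAlchemy-style MySQL URL from the .env variables above.
    (Illustrative helper; the repo's actual code may differ.)"""
    return (
        f"mysql+mysqlconnector://{os.environ['DB_USER']}:{os.environ['DB_PASSWORD']}"
        f"@{os.environ['DB_HOST']}:{os.environ['DB_PORT']}/{os.environ[schema_env_var]}"
    )

# e.g. sqlalchemy.create_engine(mysql_url("DB_STAGING")) for the staging schema
```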
  7. Run extract.py in the ETL folder to load the data into the staging schema.

  8. In MySQL Workbench, create a new schema (the DWH schema) and put its name in the .env file (as DB_DWH). Then run final_dwh.sql (found in ETL/dwh) against the newly created schema to create the tables and relations.

  9. Run transform.py in the ETL folder to transform the data, load it into the DWH, and version the data via DVC.

  10. Train and version (via MLflow) the machine learning model, which reads the data from DVC via the DVC Python API:

cd DataScience
python3 model_versioning.py
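Conceptually, the job of model_versioning.py as described in the step above looks something like the following sketch. All names here (including the parquet path and the "target" column) are assumptions, not the repo's actual code; imports sit inside the function so the snippet only needs dvc, mlflow, pandas, and scikit-learn when actually run:

```python
def train_and_version(data_path="ETL/docs/heart.parquet", repo="."):
    """Hypothetical outline: read DVC-tracked data, train, log to MLflow."""
    import io
    import dvc.api
    import mlflow
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Read the exact data version tracked by DVC (binary mode for parquet).
    raw = dvc.api.read(data_path, repo=repo, mode="rb")
    df = pd.read_parquet(io.BytesIO(raw))

    X = df.drop(columns=["target"])   # the label column name is assumed
    y = df["target"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

    with mlflow.start_run():
        clf = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
        mlflow.log_metric("accuracy", clf.score(X_te, y_te))
        mlflow.sklearn.log_model(clf, "model")
```

Reading through `dvc.api.read` ties each MLflow run to the exact data version it was trained on, which is the point of combining the two tools.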
  11. Run the MLflow UI (cd to the root directory):
cd ..
mlflow ui

This will occupy the whole terminal.

  12. Run the Streamlit app (open a new terminal in the project directory):
cd DataScience
streamlit run app.py

Now, you should be able to run the Streamlit app locally and explore its features.

  13. To access the Power BI report, download it from here.
