Repository for "Predicting Adverse Pregnancy Outcomes with Machine Learning"

sarahmcdougall/ml-pregnancy-outcomes

Predicting Adverse Pregnancy Outcomes with Machine Learning

Project Overview

This project aims to predict three adverse pregnancy outcomes (preeclampsia, preterm delivery, and obstetric hemorrhage) using supervised machine learning models trained on health and demographic data.

This repository contains the Jupyter notebooks, external resources, and final dataframes used to build machine learning models that predict adverse pregnancy outcomes from the MIMIC-IV dataset.

Directory Structure

  • notebooks - Jupyter notebooks used for preprocessing, EDA, and model construction. Each notebook's file name ends in a number that gives the general order in which the notebooks should be run. Not every notebook depends on another, but many read the CSV output of earlier notebooks, so running them in numeric order is the safe default.
  • final_dfs - final dataframes used for model construction
  • resources - external resources used for data preprocessing, filtering, and mapping
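
Because the run order is encoded as a number at the end of each notebook name, a plain alphabetical sort can put a notebook ending in 10 before one ending in 2. The sketch below sorts by the trailing integer instead. The file names are hypothetical placeholders, not the actual notebooks in this repository, and `notebook_order` is an illustrative helper.

```python
import re
from pathlib import Path


def notebook_order(paths):
    """Sort notebook paths by the integer appended to the file stem.

    Paths without a trailing number are placed last, in name order.
    """
    def key(path):
        match = re.search(r"(\d+)$", Path(path).stem)
        number = int(match.group(1)) if match else float("inf")
        return (number, Path(path).name)

    return sorted(paths, key=key)


# Hypothetical file names, used only to show the ordering.
example = [
    "notebooks/model_lstm_10.ipynb",
    "notebooks/eda_2.ipynb",
    "notebooks/extract_1.ipynb",
]
ordered = notebook_order(example)
for path in ordered:
    print(path)
```

Sorting on the parsed integer rather than the raw string keeps the order stable once there are ten or more notebooks.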

Dataset and Sources

This project uses the Medical Information Mart for Intensive Care (MIMIC)-IV dataset, a de-identified database sourced from the electronic health records (EHR) of the Beth Israel Deaconess Medical Center in Boston, Massachusetts. The dataset is freely available to credentialed researchers through PhysioNet.

The dataset covers a clinical cohort of patients admitted to the Emergency Department (ED) or an intensive care unit (ICU) between 2008 and 2019. All patients are over 18 years of age, and the patient records have been de-identified to comply with HIPAA regulations. MIMIC-IV is a relational database: it contains patient demographic data, health metrics, and mapping tables for International Classification of Diseases (ICD) codes, Diagnosis Related Groups (DRGs), and the Healthcare Common Procedure Coding System (HCPCS).
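
To make the relational structure concrete, here is a minimal sketch that joins an admissions table to a diagnosis table and an ICD mapping table using an in-memory SQLite database. The table and column names (`admissions`, `diagnoses_icd`, `d_icd_diagnoses`, `hadm_id`, `icd_code`, `icd_version`, `long_title`) follow the MIMIC-IV hospital module as far as I know and should be checked against the version in use. Every row is synthetic, the titles are placeholders, and the `O14` prefix (the ICD-10 preeclampsia category) is only an example filter, not necessarily the cohort definition used in the notebooks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE admissions (subject_id INTEGER, hadm_id INTEGER, marital_status TEXT);
    CREATE TABLE diagnoses_icd (subject_id INTEGER, hadm_id INTEGER,
                                icd_code TEXT, icd_version INTEGER);
    CREATE TABLE d_icd_diagnoses (icd_code TEXT, icd_version INTEGER, long_title TEXT);
    """
)

# Synthetic rows for illustration only; no real patient data.
conn.executemany(
    "INSERT INTO admissions VALUES (?, ?, ?)",
    [(1, 100, "MARRIED"), (2, 200, "SINGLE")],
)
conn.executemany(
    "INSERT INTO diagnoses_icd VALUES (?, ?, ?, ?)",
    [(1, 100, "O1490", 10), (2, 200, "Z3A38", 10)],
)
conn.executemany(
    "INSERT INTO d_icd_diagnoses VALUES (?, ?, ?)",
    [
        ("O1490", 10, "placeholder title: pre-eclampsia"),
        ("Z3A38", 10, "placeholder title: weeks of gestation"),
    ],
)

# Join admission -> diagnosis -> code description, keeping ICD-10 O14 codes.
rows = conn.execute(
    """
    SELECT a.hadm_id, a.marital_status, d.icd_code, t.long_title
    FROM admissions AS a
    JOIN diagnoses_icd AS d ON d.hadm_id = a.hadm_id
    JOIN d_icd_diagnoses AS t
      ON t.icd_code = d.icd_code AND t.icd_version = d.icd_version
    WHERE d.icd_version = 10 AND d.icd_code LIKE 'O14%'
    ORDER BY a.hadm_id
    """
).fetchall()
print(rows)
```

Joining on both `icd_code` and `icd_version` matters because the same code string can appear in ICD-9 and ICD-10 with different meanings.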

Methodology

Data Extraction and Cleaning

Exploratory Data Analysis

  • explore adverse outcomes by race, marital status, and other demographic factors
  • explore common diagnoses and prescriptions for pregnant patients
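
As a small illustration of the first exploration step, the sketch below computes the adverse-outcome rate per demographic group. The field names (`marital_status`, `adverse_outcome`) and the records are hypothetical; in pandas, a `groupby` followed by `mean` would give the same result.

```python
from collections import defaultdict


def outcome_rate_by_group(records, group_key, outcome_key="adverse_outcome"):
    """Return {group: fraction of records in that group with outcome == 1}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [adverse count, total count]
    for record in records:
        counts[record[group_key]][0] += int(record[outcome_key])
        counts[record[group_key]][1] += 1
    return {group: adverse / total for group, (adverse, total) in counts.items()}


# Synthetic records for illustration only.
records = [
    {"marital_status": "MARRIED", "adverse_outcome": 1},
    {"marital_status": "MARRIED", "adverse_outcome": 0},
    {"marital_status": "MARRIED", "adverse_outcome": 0},
    {"marital_status": "MARRIED", "adverse_outcome": 0},
    {"marital_status": "SINGLE", "adverse_outcome": 1},
    {"marital_status": "SINGLE", "adverse_outcome": 0},
]
rates = outcome_rate_by_group(records, "marital_status")
print(rates)
```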

Model Selection and Training

For all classifiers:

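  • apply SMOTE (Synthetic Minority Over-sampling Technique) to account for class imbalance in the outcome labels
  • apply stratified 5-fold cross-validation so that the minority classes are represented in every fold
  • evaluate using the area under the ROC curve (AUC) and recall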
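
The first two steps can be sketched without any dependencies, as below. This is a simplified illustration of the ideas, not the notebooks' implementation: in practice a library such as imbalanced-learn (`SMOTE`) and scikit-learn (`StratifiedKFold`) would normally be used. The helper names, the toy data, and the choice to oversample only the training portion of each fold (so that synthetic points never appear in the validation data) are assumptions of this sketch.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds that each keep roughly the same class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)  # deal each class out round-robin
    return folds


def smote_like(minority, n_new, k_neighbors=2, seed=0):
    """Create n_new points by interpolating between a minority sample and
    one of its k nearest minority neighbours (the core idea behind SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k_neighbors])
        t = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy data: 5 positive (minority) and 15 negative samples, 2 features each.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [1 if i < 5 else 0 for i in range(20)]

folds = stratified_folds(y, k=5)
balanced_counts = []
for val_idx in folds:
    held_out = set(val_idx)
    train_idx = [i for i in range(len(y)) if i not in held_out]
    minority = [X[i] for i in train_idx if y[i] == 1]
    n_majority = sum(1 for i in train_idx if y[i] == 0)
    # Oversample the training portion only; the validation fold stays untouched.
    synthetic = smote_like(minority, n_new=n_majority - len(minority))
    y_train = [y[i] for i in train_idx] + [1] * len(synthetic)
    balanced_counts.append((y_train.count(0), y_train.count(1)))

print(balanced_counts)
```

Each validation fold keeps the original class ratio, so AUC and recall are measured on data that looks like the real distribution while the model trains on a balanced set.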

Constructed models:

  • AdaBoost
  • Random Forest
  • Long Short-Term Memory Network (LSTM)

Results

  • The binary LSTM model (adverse outcome vs. no adverse outcome) achieved an AUC of 0.885 and a recall of 94%. (Figure: binary_lstm_roc)

  • The multi-label LSTM model (one output label per adverse outcome) achieved an AUC of 0.77 and a recall of 92%. (Figure: multi_lstm_roc_updated)

  • The binary AdaBoost model achieved 86% recall and 88% precision. (Figure: admissions_ada_binary_matrix)

  • The models' errors are more often false positives (Type I errors: predicting an adverse outcome that did not occur) than false negatives (Type II errors: missing an adverse outcome that did occur).
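
For readers less familiar with these metrics, the sketch below shows how recall, precision, and AUC relate to false positives and false negatives. The labels and scores are invented for illustration and are not the models' outputs. AUC is computed here as the probability that a randomly chosen positive is scored above a randomly chosen negative, which is equivalent to the area under the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # Type I errors
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # Type II errors
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented labels and scores, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

results = {}
for threshold in (0.5, 0.25):
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    results[threshold] = {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "false_positives": fp,
        "false_negatives": fn,
    }

auc = roc_auc(y_true, scores)
print(results, auc)
```

Lowering the decision threshold in this toy example raises recall at the cost of more false positives, the same direction of trade-off described in the last result above. AUC does not depend on the threshold.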

Relevant Resources
