GitHub - knickhill/alzheimers-prediction: ML for early diagnosis of Alzheimer's disease. Results and jupyter notebook published on GitHub pages.

Alzheimer's Detection from ADNI data

This project was submitted to fulfill requirements of the Intro to Data Science (CS109A) course at the Harvard Graduate School of Engineering and Applied Sciences. Team members: Jonathan Fisher, Kezi Cheng, Nikhil Mallareddy

Objective

The question we want to answer is:

What is the minimal set of diagnostic predictors that can lower the overall cost of clinical testing without sacrificing a substantial degree of diagnostic accuracy?

Our goal is therefore not to build the most accurate predictor of Alzheimer's disease risk, but to identify which clinical diagnostic tests most influence a final positive diagnosis.

Data set

For this analysis, we merged two data sets:

ADNIMERGE - Contains clinical, genetic, and imaging biomarker data from participants in a longitudinal, multi-center study.
UPENN-CSF - Contains concentrations of amyloid beta compounds in cerebrospinal fluid (CSF), which are known to correlate strongly with Alzheimer's disease risk.
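A merge of this kind can be sketched with pandas. This is only an illustration on toy data: the key columns (`RID`, `VISCODE`), the other column names, the values, and the helper `merge_tables` are assumptions made for the example, not taken from the project notebook.

```python
import pandas as pd

# Toy stand-ins for the two tables. All column names and values are
# illustrative assumptions, not real ADNI records.
adnimerge = pd.DataFrame({
    "RID": [1, 2, 3],
    "VISCODE": ["bl", "bl", "bl"],
    "DX": ["CN", "MCI", "AD"],
    "AGE": [71.0, 68.5, 77.2],
})
upenn_csf = pd.DataFrame({
    "RID": [1, 3, 4],
    "VISCODE": ["bl", "bl", "bl"],
    "ABETA": [210.0, 130.0, 175.0],
})


def merge_tables(clinical, csf, keys=("RID", "VISCODE")):
    """Left-join CSF measurements onto the clinical table.

    Rows are matched on participant and visit; clinical rows without a
    CSF measurement are kept and get NaN in the CSF columns.
    """
    return clinical.merge(csf, on=list(keys), how="left", validate="one_to_one")


merged = merge_tables(adnimerge, upenn_csf)
print(merged)
```

A left join keeps every participant from the clinical table, so missing CSF values stay visible as NaN and can be handled explicitly later rather than silently dropping rows.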

Disease status is classified into three categories:

Cognitively normal (CN)
Mild cognitive impairment (MCI)
Alzheimer's disease (AD)

Distribution of patients across the three classes is shown below:
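A class distribution like this can also be tabulated directly. The sketch below uses a made-up label series; the column name `DX` and the helper `class_distribution` are assumptions for illustration, and the counts are not the project's actual numbers.

```python
import pandas as pd

# Made-up diagnosis labels; not the real ADNI class counts.
labels = pd.Series(["CN", "MCI", "AD", "MCI", "CN", "MCI"], name="DX")


def class_distribution(dx, order=("CN", "MCI", "AD")):
    """Return per-class counts and shares in a fixed class order."""
    counts = dx.value_counts().reindex(list(order), fill_value=0)
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})


dist = class_distribution(labels)
print(dist)
```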

Both data sets are available (with access controls) at adni.loni.usc.edu

Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf

Summary of Results

We trained six baseline models and selected multinomial logistic regression as the best-performing one. An important part of our analysis was comparing each predictor's contribution to model accuracy against its summary cost metric, an aggregate cost function that combines the cost of performing the test with the cost incurred due to the stage at which the disease is diagnosed. A key result is that genetic tests offer the best tradeoff between cost and prediction accuracy.
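One way to picture the accuracy-versus-cost comparison is a group ablation: fit a multinomial logistic regression on all predictors, drop one group of predictors at a time, and compare the loss in cross-validated accuracy with that group's cost. The sketch below is a minimal illustration on synthetic data, not the project's method. The predictor groups, the cost values, and the helpers `cv_accuracy` and `accuracy_drop_per_group` are all hypothetical, and the cost here is a single made-up number per group rather than the aggregate cost metric described above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 3-class data standing in for CN / MCI / AD (not ADNI data).
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Hypothetical predictor groups (column indices) and made-up relative costs.
groups = {"cognitive": [0, 1], "genetic": [2], "imaging": [3, 4], "csf": [5]}
costs = {"cognitive": 1.0, "genetic": 2.0, "imaging": 10.0, "csf": 6.0}


def cv_accuracy(X, y, columns):
    """Mean 5-fold accuracy of a logistic regression on the given columns.

    With the default lbfgs solver, scikit-learn fits a multinomial
    (softmax) model when there are more than two classes.
    """
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(model, X[:, columns], y, cv=5, scoring="accuracy").mean()


def accuracy_drop_per_group(X, y, groups):
    """Return full-model accuracy and the accuracy lost when each group is removed."""
    all_cols = sorted(c for cols in groups.values() for c in cols)
    full = cv_accuracy(X, y, all_cols)
    drops = {}
    for name, cols in groups.items():
        kept = [c for c in all_cols if c not in cols]
        drops[name] = full - cv_accuracy(X, y, kept)
    return full, drops


full, drops = accuracy_drop_per_group(X, y, groups)
print(f"full-model accuracy: {full:.3f}")
for name in sorted(groups, key=lambda g: drops[g] / costs[g], reverse=True):
    print(f"{name:10s} drop={drops[name]:+.3f} cost={costs[name]:5.1f} "
          f"drop per unit cost={drops[name] / costs[name]:+.4f}")
```

Ranking by accuracy drop per unit cost is only one possible summary; a group with a small drop and a very small cost can rank above a group that matters more in absolute terms, so both columns are worth reading together.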

Dependencies

Python libraries - NumPy, Pandas, SciPy, Scikit-learn, Statsmodels, Matplotlib, Seaborn

The complete project is published at https://knickhill.github.io/alzheimers-prediction/
The associated Jupyter notebook can be found at https://knickhill.github.io/alzheimers-prediction/model.html

