forked from u-brite/canAI

Comparing feature extraction methods for biomarker discovery in a prostate-cancer study.

canAI

Aim: Comparing feature extraction methods for biomarker discovery in a prostate-cancer study.

Background

Large -omics data sets hold promise for identifying biomarkers that can improve cancer diagnosis, risk stratification, and treatment, and for advancing the understanding of disease pathophysiology. Because these data sets are large and high-dimensional, machine learning tools can help extract features and integrate -omic data sets so that useful biomarkers can be identified. Prostate cancer is the second most common cancer in the US, with an estimated 34,500 deaths in 2022. In this study we will use transcriptome and methylation data from TCGA and integrate them with phenotypic data using a set of machine learning models. We will then use the integrated data to look for biomarkers that correlate with outcome measures such as disease recurrence, metastasis, and rise in PSA level. A web-facing app for visualizing the data will be developed so that clinicians and researchers without extensive data science training can explore it.

Workflow

[Figure: workflow diagram]

Data

The Cancer Genome Atlas (TCGA) is a publicly available repository of cancer data. For prostate cancer it provides clinical (phenotypic) data as well as genomic, epigenetic, and transcriptomic data.

Usage

canAI runs as a Streamlit app for exploring the features selected when comparing tumor and normal samples.

Installation

Installation only requires fetching the source code. The following tool is required:

  • Git

To fetch the source code, change into a directory of your choice and run:

git clone -b main \
    https://github.com/u-brite/canAI.git

Requirements

OS:

Currently works only on macOS. A Docker version may be explored later to make it usable on other platforms (potentially Windows).

Tools:

  • Pip3
  • Streamlit
  • Scikit-learn
  • Plotly

Create and activate the virtual environment

Change into the repository root directory and run the commands below:

# Create the virtual environment. Needed only the first time.
python3 -m venv canAI_venv

# Activate the virtual environment
source canAI_venv/bin/activate

# Install the required packages into the environment
pip install -r requirements.txt

Steps to run

Run Streamlit App

streamlit run src/streamlit_app.py

Results

Survival analysis plots show the occurrence of death over time. The survival function gives the probability that death has not yet occurred by a given time. For the analysis we used the KaplanMeierFitter class from the lifelines Python module [2]. It was fitted on the days_to_death and days_to_first_biochemical_recurrence columns of the dataset.
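To make the survival function concrete, here is a minimal sketch of the Kaplan-Meier (product-limit) estimate in plain Python. It is not the project's code and does not use lifelines. The `kaplan_meier` helper and the toy durations are hypothetical, chosen only to illustrate the kind of quantity a Kaplan-Meier fit reports.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each time with an observed event.

    durations: follow-up time per subject (e.g. days)
    observed:  True if the event (e.g. death) was seen, False if censored
    """
    data = sorted(zip(durations, observed))  # order subjects by follow-up time
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group all subjects whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if events:
            # Product-limit step: S(t) = S(previous) * (1 - d_t / n_t)
            survival *= 1 - events / n_at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after t
        n_at_risk -= leaving
    return curve


# Hypothetical toy data: 5 subjects, 2 of them censored
durations = [5, 8, 8, 12, 15]
observed = [True, True, False, True, False]

curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"day {t}: S(t) = {s:.2f}")
# day 5: S(t) = 0.80
# day 8: S(t) = 0.60
# day 12: S(t) = 0.30
```

With lifelines, the equivalent fit would be `KaplanMeierFitter().fit(durations, event_observed=observed)`. How the project derives the event indicator from the TCGA columns is not described here, so that step is left out.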

Survival plots

We hypothesize that machine learning feature selection methods can find novel biomarkers that distinguish prostate cancer patients better than traditional differential expression analysis. We chose four methods from the scikit-learn package:

  • ExtraTrees - Results here
  • RFE (recursive feature elimination) - Results here
  • Univariate (f_regression) - Results here
  • Univariate (chi-square) - Resulting network here
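The four methods listed above can be sketched with scikit-learn as follows. This is an illustrative example on synthetic data, not the project's notebooks. The data set, the choice of `k = 5`, the logistic regression estimator inside RFE, and the helper `as_int_set` are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE, SelectKBest, chi2, f_regression
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for an expression matrix (samples x genes) with
# tumor/normal labels; real TCGA data would be loaded here instead.
X, y = make_classification(
    n_samples=120, n_features=30, n_informative=5, n_redundant=0, random_state=0
)
X = X - X.min(axis=0)  # chi2 requires non-negative feature values

k = 5  # number of features to keep per method (assumption)


def as_int_set(indices):
    """Convert an array of feature indices to a set of plain ints."""
    return {int(i) for i in indices}


selected = {}

# ExtraTrees: rank features by impurity-based importance
trees = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X, y)
selected["ExtraTrees"] = as_int_set(np.argsort(trees.feature_importances_)[-k:])

# RFE: recursively drop the weakest feature of a fitted estimator
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=k).fit(X, y)
selected["RFE"] = as_int_set(np.flatnonzero(rfe.support_))

# Univariate tests: score each feature independently against the label
for name, score_func in [("f_regression", f_regression), ("chi2", chi2)]:
    selector = SelectKBest(score_func, k=k).fit(X, y)
    selected[name] = as_int_set(np.flatnonzero(selector.get_support()))

for name, features in selected.items():
    print(name, sorted(features))

# Features chosen by every method are the most robust candidates
consensus = set.intersection(*selected.values())
print("Selected by all four methods:", sorted(consensus))
```

Comparing the selected sets, for example by their intersection as in the last lines, is one simple way to see where the methods agree. The project may compare them differently.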

Streamlit App

Streamlit screenshot

Team Members

| Name | Role |
| --- | --- |
| Tarun Mamidi | Team Leader |
| Jędrzej Kubica | Team Member |
| Mohit Bansal | Team Member |
| Sarmad Mehmood | Team Member |
| Santhosh Kumar Karthikeyan | Team Member |
| Shannon Lynch | Team Member |
| Sylvia Robertson | Team Member |
| Kevin Buckley | Team Member |

References:

  1. https://plotly.com/python/v3/ipython-notebooks/survival-analysis-r-vs-python/

  2. https://lifelines.readthedocs.io/en/latest/

  3. https://www.frontiersin.org/articles/10.3389/fgene.2021.620453/full

  4. https://www.biorxiv.org/content/10.1101/2021.09.29.462364v1.full.pdf
