-
Notifications
You must be signed in to change notification settings - Fork 236
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Integration of RF model to predict efficacy of immune checkpoint bloc…
…kade (BioModels) (#1355) * update * update test data * update input types * add shed.yml * fix rv comments * fix docker
- Loading branch information
Showing
6 changed files
with
1,010 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
name: biomodel_BIOMD0000001066 | ||
owner: bgruening | ||
description: Random Forest model to predict efficacy of immune checkpoint blockade across multiple cancer patient cohorts | ||
|
||
long_description: | | ||
Random Forest model to predict efficacy of immune checkpoint blockade across multiple cancer patient cohorts. | ||
The official repository is at https://www.ebi.ac.uk/biomodels/BIOMD0000001066 | ||
categories: | ||
- Machine Learning | ||
- Trained models | ||
- Random Forest | ||
remote_repository_url: https://github.com/bgruening/galaxytools/tree/master/tools/biomodelsML | ||
homepage_url: https://www.ebi.ac.uk/biomodels/BIOMD0000001066 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,108 @@ | ||
<tool id="biomodel_BIOMD0000001066" name="Random Forest model" version="@VERSION@" profile="22.05"> | ||
<description>to predict efficacy of immune checkpoint blockade across multiple cancer patient cohorts</description> | ||
<macros> | ||
<token name="@VERSION@">1</token> | ||
</macros> | ||
<requirements> | ||
<container type="docker">docker.io/anupkumar/biomd0000001066:@VERSION@</container> | ||
</requirements> | ||
<command><![CDATA[ | ||
cp /home/\$NB_USER/forest16.onnx ./ && | ||
python '$biom_script' | ||
]]> | ||
</command> | ||
<configfiles> | ||
<configfile name="biom_script"> | ||
<![CDATA[ | ||
import os | ||
import argparse | ||
import numpy | ||
import onnxruntime as rt | ||
import pandas as pd | ||
#if $input_file.ext == "xlsx": | ||
data_test = pd.read_excel('$input_file') | ||
#elif $input_file.ext == "tabular": | ||
data_test = pd.read_csv('$input_file', sep="\t") | ||
#elif $input_file.ext == "csv": | ||
data_test = pd.read_csv('$input_file', sep=",") | ||
#end if | ||
rf16=["Cancer_Type2", "Albumin", "HED", "TMB", "FCNA", "BMI", "NLR", "Platelets", "HGB", "Stage (1:IV; 0:I-III)", "Age", "Drug (1:Combo; 0:PD1/PDL1orCTLA4)", "Chemo_before_IO (1:Yes; 0:No)","HLA_LOH", "MSI (1:Unstable; 0:Stable_Indeterminate)", "Sex (1:Male; 0:Female)"] | ||
x_test16 = pd.DataFrame(data_test, columns=rf16) | ||
sess = rt.InferenceSession(os.getcwd() + "/forest16.onnx") | ||
input_name = sess.get_inputs()[0].name | ||
label_name = sess.get_outputs()[0].name | ||
pred_onx = sess.run([label_name], {input_name: x_test16.astype(numpy.float32).values})[0] | ||
x_test16["Predictions"] = pred_onx | ||
x_test16.to_csv('$output_file', sep="\t", index=None) | ||
]]> | ||
</configfile> | ||
</configfiles> | ||
<inputs> | ||
<param name="input_file" type="data" label="Test data" format="tabular,csv,xlsx" multiple="false" /> | ||
</inputs> | ||
<outputs> | ||
<data format="tabular" name="output_file" label="Predicted data"></data> | ||
</outputs> | ||
<tests> | ||
<test> | ||
<param name="input_file" value="test_data.tabular" ftype="tabular" /> | ||
<output name="output_file" file="pred_data.tabular"> | ||
<assert_contents> | ||
<has_n_columns n="17" /> | ||
<has_n_lines n="296" /> | ||
</assert_contents> | ||
</output> | ||
</test> | ||
<test> | ||
<param name="input_file" value="test_data.csv" ftype="csv" /> | ||
<output name="output_file" file="pred_data.tabular"> | ||
<assert_contents> | ||
<has_n_columns n="17" /> | ||
<has_n_lines n="296" /> | ||
</assert_contents> | ||
</output> | ||
</test> | ||
<test> | ||
<param name="input_file" value="test_data.xlsx" ftype="xlsx" /> | ||
<output name="output_file" file="pred_data.tabular"> | ||
<assert_contents> | ||
<has_n_columns n="17" /> | ||
<has_n_lines n="296" /> | ||
</assert_contents> | ||
</output> | ||
</test> | ||
</tests> | ||
<help> | ||
<![CDATA[ | ||
**What it does** | ||
The tool makes prediction of either responder (1) or non-responder (0) to an immunotherapy for multiple different cancer types. The features used to train a Random Forest (RF16) model include 16 genomic, molecular, demographic, and clinical. More details in associated publication: "Improved prediction of immune checkpoint blockade efficacy across multiple cancer types" (https://pubmed.ncbi.nlm.nih.gov/34725502/). The model is available via a Docker container. | ||
**Description** | ||
This is a Random Forest algorithm-based machine learning model called RF16, which incorporates a total of 16 genomic, molecular, demographic, and clinical features to predict the immunotherapy response for a patient. The model assigns a value of 0 for NonResponder and 1 for Responder. Please be aware that the column names in the GitHub code and the downloaded dataset from the publication may vary. Users are advised to make minor adjustments to either the code or the dataset to ensure compatibility. The curated version of the model has modified the column names in the training code to align with the dataset. The features used are: "Cancer_Type2", "Albumin", "HED", "TMB", "FCNA", "BMI", "NLR", "Platelets", "HGB", "Stage (1:IV; 0:I-III)", "Age", "Drug (1:Combo; 0:PD1/PDL1orCTLA4)", "Chemo_before_IO (1:Yes; 0:No)","HLA_LOH", "MSI (1:Unstable; 0:Stable_Indeterminate)", "Sex (1:Male; 0:Female)" | ||
**Input file** | ||
Provide a tabular file (as tabular or csv or xlsx) containing all the features as mentioned above. | ||
**Output file** | ||
Returns predicted data as a tabular file - the predictions are concatenated as a last column of the test data. | ||
]]> | ||
</help> | ||
<citations> | ||
<citation type="bibtex"> | ||
@ARTICLE{BIOMD0000001066, | ||
Author = {Chowell D and et al.}, | ||
title = {{Chowell2022 - Random Forest model to predict efficacy of immune checkpoint blockade across multiple cancer patient cohorts}}, | ||
url = {https://www.ebi.ac.uk/biomodels/BIOMD0000001066} | ||
} | ||
</citation> | ||
</citations> | ||
</tool> |
Oops, something went wrong.