Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Showing 3 changed files with 83 additions and 32 deletions.

````
@@ -1,49 +1,89 @@
Metadata-Version: 2.1
Name: deepsvp
Version: 1.0.0
Version: 1.0.3
Summary: DeepSVP: Integration of Genomics and Phenotypes forStructural Variant Prioritization using Deep Learning
Home-page: UNKNOWN
Author: Azza Althagafi
Author-email: [email protected]
License: Apache 2.0
Download-URL: https://github.com/bio-ontology-research-group/deepsvp/archive/v1.0.2.tar.gz
Download-URL: https://github.com/bio-ontology-research-group/deepsvp/archive/v1.0.3.tar.gz
Description: # DeepSVP
DeepSVP is a computational method to prioritize structural variants involved in genetic diseases by combining genomic information with information about gene functions. We incorporate phenotypes linked to genes, functions
of gene products, gene expression in individual celltypes, and
anatomical sites of expression, and systematically relate them to
their phenotypic consequences through ontologies and machine
learning
DeepSVP is a computational method to prioritize structural variants (SV) involved in genetic diseases by combining genomic information with information about gene functions. We incorporate phenotypes linked to genes, functions of gene products, gene expression in individual celltypes, and anatomical sites of expression. DeepSVP systematically relates them to their phenotypic consequences through ontologies and machine learning.

## Dataset
We train and evaluate our method using human genomic Structural Variation collected from [dbvar](https://ftp.ncbi.nlm.nih.gov/pub/dbVar/data/Homo_sapiens/by_assembly/GRCh38/vcf/) dataset.
## Training dataset
We train and evaluate our method using human SV collected from [dbvar](https://ftp.ncbi.nlm.nih.gov/pub/dbVar/data/Homo_sapiens/by_assembly/GRCh38/vcf/) dataset.

## Prediction the candidate CNVs workflow
We integrate the annotates from Gene ontology [GO](http://geneontology.org/docs/download-go-annotations/), Uber-anatomy ontology
[UBERON](https://www.ebi.ac.uk/ols/ontologies/uberon), Mammalian Phenotype ontology [MP](http://www.informatics.jax.org/vocab/mp_ontology), and Human Phenotype Ontology [HPO](https://hpo.jax.org/app/download/annotation) using [DL2vec](https://github.com/bio-ontology-research-group/DL2Vec). We convert different types of Description Logic axioms into graph representation, and then generate an embedding for each node and edge type.
We collected genomics features using public tool [AnnotSV (v2.3 or 2.2)](https://lbgi.fr/AnnotSV/annotations).
## Annotation data sources (integrated in the candidate SV prediction workflow)
We integrated the annotations from different sources:
- Gene ontology ([GO](http://geneontology.org/docs/download-go-annotations/))
- Uber-anatomy ontology ([UBERON](https://www.ebi.ac.uk/ols/ontologies/uberon))
- Mammalian Phenotype ontology ([MP](http://www.informatics.jax.org/vocab/mp_ontology))
- Human Phenotype Ontology ([HPO](https://hpo.jax.org/app/download/annotation))

This work is done using [DL2vec](https://github.com/bio-ontology-research-group/DL2Vec). We convert different types of Description Logic axioms into graph representation, and then generate an embedding for each node and edge type.

We collected [genomics features](https://lbgi.fr/AnnotSV/annotations) using the [AnnotSV (v2.2)](https://lbgi.fr/AnnotSV/downloads) public tool.


## Installation
Using pip version 20.3.1:
```
pip install deepsvp
```

## Running the prediction model
- Download all the files in [data](https://bio2vec.cbrc.kaust.edu.sa/data/DeepSVP/) and place them into data folder.
- Download and install the required database [AnnoSV (v2.3 or 2.2)](https://lbgi.fr/AnnotSV/downloads), and then run:
```
bash scripts/annotation.sh -i input.vcf -o annotated_file
```
and place the annotated VCF file into data folder.
Or you can create a specific Conda Environments (e.g. named "deepsvp-py38-pip2031"):
```
conda create -n deepsvp-py38-pip2031 python=3.8 pip=20.3.1
conda activate deepsvp-py38-pip2031
pip3 install deepsvp
pip3 install networkx
pip3 install torch
pip3 list
conda deactivate
```

## Running the DeepSVP prediction model
- Download all the files from [data](https://bio2vec.cbrc.kaust.edu.sa/data/DeepSVP/) and place the uncompressed files/repository in the folder named "data":
```
mkdir DeepSVP/ ;# /path_of_your_DeepSVP_repository/
cd DeepSVP
wget "https://bio2vec.cbrc.kaust.edu.sa/data/DeepSVP/data.zip"
unzip data.zip
cd data ;# /path_of_your_DeepSVP_data_repository/
wget "https://bio2vec.cbrc.kaust.edu.sa/data/DeepSVP/experiments.zip" # can be very long
unzip experiments.zip
```
- Download and install the required [AnnoSV (2.3)](https://lbgi.fr/AnnotSV/downloads) tool in the "data" folder:
```
cd /path_of_your_DeepSVP_data_repository/
git clone git@github.com:lgmgeo/AnnotSV.git --branch v2.3
cd AnnotSV/
make PREFIX=. install
make DESTDIR= PREFIX=. install-human-annotation
cd ..
```

- Add genomic features to your VCF input file (/path_and_name_of_your_vcf_input_file/) thanks to AnnotSV (v2.3):

e.g. /path_and_name_of_your_vcf_input_file/ = ./input.vcf

e.g. /path_and_name_of_your_annotsv_output_file/ = ./data/output.annotsv.annotated.tsv

```
bash
export ANNOTSV=/path_of_your_DeepSVP_data_repository/AnnotSV
$ANNOTSV/bin/AnnotSV -SVinputFile ./input.vcf -genomeBuild GRCh38 -outputFile ./data/output.annotsv.annotated.tsv
```
Your annotated VCF file (./data/output.annotsv.annotated.tsv) should be placed in the data folder (/path_of_your_DeepSVP_data_repository/).

- Run the command `deepsvp --help` to display help and parameters:
```
Usage: main.py [OPTIONS]
```
Usage: deepsvp [OPTIONS]

DeepSVP: A phenotype-based tool to prioritize caustive CNV using WGS data
and Phenotype/Gene Functional Similarity
DeepSVP: A phenotype-based tool to prioritize caustive CNV using WGS data
and Phenotype/Gene Functional Similarity

Options:
Options:
-d, --data-root TEXT Data root folder [required]
-i, --in-file TEXT Annotated Input file [required]
-p, --hpo TEXT List of phenotype ids separated by commas
````
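
The AnnotSV annotation step in the README can also be scripted from Python. The following is a minimal sketch that assumes the `ANNOTSV` environment variable points at the AnnotSV install, matching the `export` line. The helper name `build_annotsv_command` is hypothetical, and the sketch passes only the three flags the README itself uses (`-SVinputFile`, `-genomeBuild`, `-outputFile`).

```python
import os
import shlex
from typing import List, Optional


def build_annotsv_command(
    vcf_path: str,
    output_tsv: str,
    genome_build: str = "GRCh38",
    annotsv_root: Optional[str] = None,
) -> List[str]:
    """Return the AnnotSV argv used in the README (hypothetical helper)."""
    # Fall back to $ANNOTSV, mirroring `export ANNOTSV=...` in the README.
    root = annotsv_root or os.environ.get("ANNOTSV")
    if not root:
        raise ValueError("set ANNOTSV or pass annotsv_root")
    return [
        os.path.join(root, "bin", "AnnotSV"),  # $ANNOTSV/bin/AnnotSV
        "-SVinputFile", vcf_path,
        "-genomeBuild", genome_build,
        "-outputFile", output_tsv,
    ]


if __name__ == "__main__":
    cmd = build_annotsv_command(
        "./input.vcf",
        "./data/output.annotsv.annotated.tsv",
        annotsv_root="/path_of_your_DeepSVP_data_repository/AnnotSV",
    )
    # Print a shell-quoted command line; passing `cmd` to
    # subprocess.run(cmd, check=True) would execute it against a local install.
    print(shlex.join(cmd))
```

Building an argument list instead of a single shell string avoids quoting problems when VCF paths contain spaces.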

````
@@ -55,13 +95,22 @@ Description: # DeepSVP
-ag, --aggregation TEXT Aggregation method for the genes within CNV (max
or mean) default=max
-o, --outfile TEXT Output result file
--help Show this message and exit.

```
--help Show this message and exit.
```

### Example:
- Run the example (with you own HPO terms):
```
deepsvp -d data/ -i output.annotsv.annotated.tsv -p HP:0003701,HP:0001324,HP:0010628,HP:0003388,HP:0000774,HP:0002093,HP:0000508,HP:0000218 -m cl -maf 0.01 -ag max -o example_output.txt
```
Or run the example with the deepsvp-py38-pip2031 Conda Environment:
```
conda activate deepsvp-py38-pip2031
deepsvp -d data/ -i $your_annotsv_output.annotated.tsv -p HP:0003701,HP:0001324,HP:0010628,HP:0003388,HP:0000774,HP:0002093,HP:0000508,HP:0000218 -m cl -maf 0.01 -ag max -o example_output.txt
conda deactivate
```
Or by using [cwl-runner](https://github.com/common-workflow-language/cwltool), modify the input file in the input example yaml [deepsvp.yaml](https://github.com/bio-ontology-research-group/DeepSVP/blob/master/deepsvp.yaml) file and then run:

deepsvp -d data/ -i example_annotsv.tsv -p HP:0003701,HP:0001324,HP:0010628,HP:0003388,HP:0000774,HP:0002093,HP:0000508,HP:0000218 -m cl -maf 0.01 -ag max -o example_output.txt
cwl-runner deepsvp.cwl deepsvp.yaml

```
|======== 25% Reading the input phenotypes...
````
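
The example command passes the phenotypes to `-p/--hpo` as one comma-separated string, and a mistyped HPO identifier is easy to miss in that form. The following is a minimal sketch that validates the terms and builds the argument list from the README example. It assumes HPO identifiers have the form `HP:` followed by seven digits. The helper name is hypothetical, and the default `-m cl -maf 0.01 -ag max` values are copied from the example. The accepted values for `-m` do not appear in this excerpt, so the sketch does not check them.

```python
import re
import shlex
from typing import List, Sequence

# HPO term identifiers look like "HP:" followed by seven digits, e.g. HP:0003701.
HPO_ID = re.compile(r"HP:\d{7}")


def build_deepsvp_command(
    data_root: str,
    annotated_tsv: str,
    hpo_terms: Sequence[str],
    model: str = "cl",
    maf: float = 0.01,
    aggregation: str = "max",
    outfile: str = "example_output.txt",
) -> List[str]:
    """Return the deepsvp argv from the README example (hypothetical helper)."""
    terms = [t.strip() for t in hpo_terms if t.strip()]
    bad = [t for t in terms if not HPO_ID.fullmatch(t)]
    if not terms or bad:
        raise ValueError(f"expected HPO ids such as HP:0003701, got {bad or 'none'}")
    if aggregation not in ("max", "mean"):
        # `--help` lists only max or mean for --aggregation.
        raise ValueError("aggregation must be 'max' or 'mean'")
    return [
        "deepsvp",
        "-d", data_root,
        "-i", annotated_tsv,
        "-p", ",".join(terms),  # --hpo: ids separated by commas
        "-m", model,
        "-maf", str(maf),
        "-ag", aggregation,
        "-o", outfile,
    ]


if __name__ == "__main__":
    cmd = build_deepsvp_command(
        "data/",
        "output.annotsv.annotated.tsv",
        ["HP:0003701", "HP:0001324", "HP:0010628"],
    )
    print(shlex.join(cmd))
```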

````
@@ -70,6 +119,8 @@ Description: # DeepSVP
|================================| 100% DONE! You can find the prediction results in the output file: example_output.txt
```



#### Output:
The script will output a ranking a score for the candidate caustive CNV.

````
Binary file not shown.

````
@@ -27,13 +27,13 @@

setup(
    name="deepsvp",
    version="1.0.2",
    version="1.0.3",
    description="DeepSVP: Integration of Genomics and Phenotypes forStructural Variant Prioritization using Deep Learning",
    long_description=open(README).read(),
    long_description_content_type="text/markdown",
    author="Azza Althagafi",
    author_email="[email protected]",
    download_url="https://github.com/bio-ontology-research-group/deepsvp/archive/v1.0.2.tar.gz",
    download_url="https://github.com/bio-ontology-research-group/deepsvp/archive/v1.0.3.tar.gz",
    license="Apache 2.0",
    packages=["deepsvp",],
    package_data={"deepsvp": [],},
````
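
This commit changes the release number by hand in two places in `setup.py` (`version` and `download_url`). In the package metadata above, the previous lines also paired `Version: 1.0.0` with a `v1.0.2` Download-URL. A minimal sketch of one way to keep both fields in sync is to derive the URL from a single version constant. `VERSION`, `REPO_URL`, `download_url` and `urls_match` are hypothetical names, not part of the project. The `archive/v<version>.tar.gz` pattern is copied from the URLs in this diff.

```python
# Hypothetical single source of truth for the release number.
VERSION = "1.0.3"
REPO_URL = "https://github.com/bio-ontology-research-group/deepsvp"


def download_url(version: str = VERSION) -> str:
    """Release tarball URL following the archive/v<version>.tar.gz pattern."""
    return f"{REPO_URL}/archive/v{version}.tar.gz"


def urls_match(version: str, url: str) -> bool:
    """True when a Download-URL points at the same release as Version."""
    return url == download_url(version)


if __name__ == "__main__":
    # In setup.py these would feed setup(version=VERSION, download_url=download_url()).
    print(VERSION, download_url())
```

A check such as `urls_match` could run in CI to flag metadata where the two fields disagree.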