diff --git a/deepsvp.egg-info/PKG-INFO b/deepsvp.egg-info/PKG-INFO
index 042b31f..0cfc411 100644
--- a/deepsvp.egg-info/PKG-INFO
+++ b/deepsvp.egg-info/PKG-INFO
@@ -1,49 +1,89 @@
 Metadata-Version: 2.1
 Name: deepsvp
-Version: 1.0.0
+Version: 1.0.3
 Summary: DeepSVP: Integration of Genomics and Phenotypes forStructural Variant Prioritization using Deep Learning
 Home-page: UNKNOWN
 Author: Azza Althagafi
 Author-email: azza.althagafi@kaust.edu.sa
 License: Apache 2.0
-Download-URL: https://github.com/bio-ontology-research-group/deepsvp/archive/v1.0.2.tar.gz
+Download-URL: https://github.com/bio-ontology-research-group/deepsvp/archive/v1.0.3.tar.gz
 Description: # DeepSVP
-        DeepSVP is a computational method to prioritize structural variants involved in genetic diseases by combining genomic information with information about gene functions. We incorporate phenotypes linked to genes, functions
-        of gene products, gene expression in individual celltypes, and
-        anatomical sites of expression, and systematically relate them to
-        their phenotypic consequences through ontologies and machine
-        learning
+        DeepSVP is a computational method to prioritize structural variants (SV) involved in genetic diseases by combining genomic information with information about gene functions. We incorporate phenotypes linked to genes, functions of gene products, gene expression in individual celltypes, and anatomical sites of expression. DeepSVP systematically relates them to their phenotypic consequences through ontologies and machine learning.
-        ## Dataset
-        We train and evaluate our method using human genomic Structural Variation collected from [dbvar](https://ftp.ncbi.nlm.nih.gov/pub/dbVar/data/Homo_sapiens/by_assembly/GRCh38/vcf/) dataset.
+        ## Training dataset
+        We train and evaluate our method using human SV collected from [dbvar](https://ftp.ncbi.nlm.nih.gov/pub/dbVar/data/Homo_sapiens/by_assembly/GRCh38/vcf/) dataset.
-        ## Prediction the candidate CNVs workflow
-        We integrate the annotates from Gene ontology [GO](http://geneontology.org/docs/download-go-annotations/), Uber-anatomy ontology
-        [UBERON](https://www.ebi.ac.uk/ols/ontologies/uberon), Mammalian Phenotype ontology [MP](http://www.informatics.jax.org/vocab/mp_ontology), and Human Phenotype Ontology [HPO](https://hpo.jax.org/app/download/annotation) using [DL2vec](https://github.com/bio-ontology-research-group/DL2Vec). We convert different types of Description Logic axioms into graph representation, and then generate an embedding for each node and edge type.
-        We collected genomics features using public tool [AnnotSV (v2.3 or 2.2)](https://lbgi.fr/AnnotSV/annotations).
+        ## Annotation data sources (integrated in the candidate SV prediction workflow)
+        We integrated the annotations from different sources:
+        - Gene ontology ([GO](http://geneontology.org/docs/download-go-annotations/))
+        - Uber-anatomy ontology ([UBERON](https://www.ebi.ac.uk/ols/ontologies/uberon))
+        - Mammalian Phenotype ontology ([MP](http://www.informatics.jax.org/vocab/mp_ontology))
+        - Human Phenotype Ontology ([HPO](https://hpo.jax.org/app/download/annotation))
+
+        This work is done using [DL2vec](https://github.com/bio-ontology-research-group/DL2Vec). We convert different types of Description Logic axioms into graph representation, and then generate an embedding for each node and edge type.
+
+        We collected [genomics features](https://lbgi.fr/AnnotSV/annotations) using the [AnnotSV (v2.2)](https://lbgi.fr/AnnotSV/downloads) public tool.
         ## Installation
+        Using pip version 20.3.1:
         ```
         pip install deepsvp
         ```
-        ## Running the prediction model
-        - Download all the files in [data](https://bio2vec.cbrc.kaust.edu.sa/data/DeepSVP/) and place them into data folder.
-        - Download and install the required database [AnnoSV (v2.3 or 2.2)](https://lbgi.fr/AnnotSV/downloads), and then run:
-        ```
-        bash scripts/annotation.sh -i input.vcf -o annotated_file
-        ```
-        and place the annotated VCF file into data folder.
+        Or you can create a specific Conda Environments (e.g. named "deepsvp-py38-pip2031"):
+        ```
+        conda create -n deepsvp-py38-pip2031 python=3.8 pip=20.3.1
+        conda activate deepsvp-py38-pip2031
+        pip3 install deepsvp
+        pip3 install networkx
+        pip3 install torch
+        pip3 list
+        conda deactivate
+        ```
+
+        ## Running the DeepSVP prediction model
+        - Download all the files from [data](https://bio2vec.cbrc.kaust.edu.sa/data/DeepSVP/) and place the uncompressed files/repository in the folder named "data":
+        ```
+        mkdir DeepSVP/ ;# /path_of_your_DeepSVP_repository/
+        cd DeepSVP
+        wget "https://bio2vec.cbrc.kaust.edu.sa/data/DeepSVP/data.zip"
+        unzip data.zip
+        cd data ;# /path_of_your_DeepSVP_data_repository/
+        wget "https://bio2vec.cbrc.kaust.edu.sa/data/DeepSVP/experiments.zip" # can be very long
+        unzip experiments.zip
+        ```
+        - Download and install the required [AnnoSV (2.3)](https://lbgi.fr/AnnotSV/downloads) tool in the "data" folder:
+        ```
+        cd /path_of_your_DeepSVP_data_repository/
+        git clone git@github.com:lgmgeo/AnnotSV.git --branch v2.3
+        cd AnnotSV/
+        make PREFIX=. install
+        make DESTDIR= PREFIX=. install-human-annotation
+        cd ..
+        ```
+
+        - Add genomic features to your VCF input file (/path_and_name_of_your_vcf_input_file/) thanks to AnnotSV (v2.3):
+
+        e.g. /path_and_name_of_your_vcf_input_file/ = ./input.vcf
+
+        e.g. /path_and_name_of_your_annotsv_output_file/ = ./data/output.annotsv.annotated.tsv
+
+        ```
+        bash
+        export ANNOTSV=/path_of_your_DeepSVP_data_repository/AnnotSV
+        $ANNOTSV/bin/AnnotSV -SVinputFile ./input.vcf -genomeBuild GRCh38 -outputFile ./data/output.annotsv.annotated.tsv
+        ```
+        Your annotated VCF file (./data/output.annotsv.annotated.tsv) should be placed in the data folder (/path_of_your_DeepSVP_data_repository/).
         - Run the command `deepsvp --help` to display help and parameters:
-        ```
-        Usage: main.py [OPTIONS]
+        ```
+        Usage: deepsvp [OPTIONS]
-        DeepSVP: A phenotype-based tool to prioritize caustive CNV using WGS data
-        and Phenotype/Gene Functional Similarity
+        DeepSVP: A phenotype-based tool to prioritize caustive CNV using WGS data
+        and Phenotype/Gene Functional Similarity
-        Options:
+        Options:
         -d, --data-root TEXT Data root folder [required]
         -i, --in-file TEXT Annotated Input file [required]
         -p, --hpo TEXT List of phenotype ids separated by commas
@@ -55,13 +95,22 @@ Description: # DeepSVP
         -ag, --aggregation TEXT Aggregation method for the genes within CNV (max or mean) default=max
         -o, --outfile TEXT Output result file
-        --help Show this message and exit.
-
-        ```
+        --help Show this message and exit.
+        ```
-        ### Example:
+        - Run the example (with you own HPO terms):
+        ```
+        deepsvp -d data/ -i output.annotsv.annotated.tsv -p HP:0003701,HP:0001324,HP:0010628,HP:0003388,HP:0000774,HP:0002093,HP:0000508,HP:0000218 -m cl -maf 0.01 -ag max -o example_output.txt
+        ```
+        Or run the example with the deepsvp-py38-pip2031 Conda Environment:
+        ```
+        conda activate deepsvp-py38-pip2031
+        deepsvp -d data/ -i $your_annotsv_output.annotated.tsv -p HP:0003701,HP:0001324,HP:0010628,HP:0003388,HP:0000774,HP:0002093,HP:0000508,HP:0000218 -m cl -maf 0.01 -ag max -o example_output.txt
+        conda deactivate
+        ```
+        Or by using [cwl-runner](https://github.com/common-workflow-language/cwltool), modify the input file in the input example yaml [deepsvp.yaml](https://github.com/bio-ontology-research-group/DeepSVP/blob/master/deepsvp.yaml) file and then run:
-        deepsvp -d data/ -i example_annotsv.tsv -p HP:0003701,HP:0001324,HP:0010628,HP:0003388,HP:0000774,HP:0002093,HP:0000508,HP:0000218 -m cl -maf 0.01 -ag max -o example_output.txt
+        cwl-runner deepsvp.cwl deepsvp.yaml
         ```
         |======== | 25% Reading the input phenotypes...
@@ -70,6 +119,8 @@ Description: # DeepSVP
         |================================| 100% DONE! You can find the prediction results in the output file: example_output.txt
         ```
+
+        #### Output:
         The script will output a ranking a score for the candidate caustive CNV.
diff --git a/dist/deepsvp-1.0.3-py3-none-any.whl b/dist/deepsvp-1.0.3-py3-none-any.whl
new file mode 100644
index 0000000..b69d9fa
Binary files /dev/null and b/dist/deepsvp-1.0.3-py3-none-any.whl differ
diff --git a/setup.py b/setup.py
index 2cce4e1..64baa97 100644
--- a/setup.py
+++ b/setup.py
@@ -27,13 +27,13 @@
 setup(
     name="deepsvp",
-    version="1.0.2",
+    version="1.0.3",
     description="DeepSVP: Integration of Genomics and Phenotypes forStructural Variant Prioritization using Deep Learning",
     long_description=open(README).read(),
     long_description_content_type="text/markdown",
     author="Azza Althagafi",
     author_email="azza.althagafi@kaust.edu.sa",
-    download_url="https://github.com/bio-ontology-research-group/deepsvp/archive/v1.0.2.tar.gz",
+    download_url="https://github.com/bio-ontology-research-group/deepsvp/archive/v1.0.3.tar.gz",
     license="Apache 2.0",
     packages=["deepsvp",],
     package_data={"deepsvp": [],},
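
The version is bumped in four places in this change (PKG-INFO `Version` and `Download-URL`, `setup.py` `version` and `download_url`), and the old PKG-INFO had drifted: `Version: 1.0.0` sat next to a `v1.0.2` download URL. Below is a minimal sketch of a consistency check, assuming the download URL always ends in `/v<version>.tar.gz` as it does in this diff. The helper names `versions_in_pkg_info` and `is_consistent` are hypothetical and not part of deepsvp.

```python
import re


def versions_in_pkg_info(text):
    """Return (Version header, version embedded in Download-URL), or None for a missing field."""
    # MULTILINE anchors ^ at each line start, so "Metadata-Version:" is not matched.
    version = re.search(r"^Version:\s*(\S+)", text, re.MULTILINE)
    # Assumption: the URL ends in /v<version>.tar.gz, as in this diff.
    url = re.search(r"^Download-URL:\s*\S*/v([0-9][^/\s]*?)\.tar\.gz", text, re.MULTILINE)
    return (
        version.group(1) if version else None,
        url.group(1) if url else None,
    )


def is_consistent(text):
    """True only when both fields are present and carry the same version."""
    version, url_version = versions_in_pkg_info(text)
    return version is not None and version == url_version


# Header values copied from the removed and added lines of the PKG-INFO hunk.
URL_BASE = "https://github.com/bio-ontology-research-group/deepsvp/archive/"
old_pkg_info = (
    "Metadata-Version: 2.1\nName: deepsvp\nVersion: 1.0.0\n"
    "Download-URL: " + URL_BASE + "v1.0.2.tar.gz\n"
)
new_pkg_info = (
    "Metadata-Version: 2.1\nName: deepsvp\nVersion: 1.0.3\n"
    "Download-URL: " + URL_BASE + "v1.0.3.tar.gz\n"
)

print(is_consistent(old_pkg_info))  # False: 1.0.0 vs 1.0.2
print(is_consistent(new_pkg_info))  # True: both 1.0.3
```

A check of this kind could run before building the wheel so that the generated metadata and `setup.py` do not diverge again.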