OncoVI is a fully automated Python implementation of the oncogenicity guidelines by Horak et al. (Genetics in Medicine, 2022).
Starting from the genomic location of the variants, OncoVI:
- performs functional annotation based on the Variant Effect Predictor (VEP) from Ensembl;
- collects biological evidence from the publicly available resources integrated into OncoVI;
- classifies the oncogenicity of somatic variants based on the point-based system for combining evidence defined by Horak et al.
More detailed information on the resources used by OncoVI, the implementation of the oncogenicity guidelines, and its application to real-world data can be found in our preprint.
The figure shows the criteria implemented in OncoVI (11 criteria for evidence of oncogenic effect and five for evidence of benign effect), the public resources utilised to assess each criterion, and the points associated with each criterion. Each variant receives a score equal to the sum of the points of the criteria triggered for it by OncoVI, and this score assigns the variant to one of five oncogenicity classes:
- score ≥ 10: Oncogenic (O)
- 6 ≤ score ≤ 9: Likely Oncogenic (LO)
- 0 ≤ score ≤ 5: Variant of Uncertain Significance (VUS)
- -6 ≤ score ≤ -1: Likely Benign (LB)
- score ≤ -7: Benign (B)
In the figure, resources in blue are those suggested by the Standard Operating Procedure by Horak et al.; resources in black were identified by the authors of OncoVI.
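The scoring rule described above can be sketched in a few lines. This is a minimal illustration, not OncoVI's own code: it assumes the points for each triggered criterion are already known and simply sums them, then applies the five score ranges. The function names and the criterion codes and point values in the example are hypothetical.

```python
from typing import Mapping


def score_variant(triggered: Mapping[str, int]) -> int:
    """Sum the points of the criteria triggered for one variant."""
    return sum(triggered.values())


def classify_oncogenicity(score: int) -> str:
    """Map an integer score to one of the five oncogenicity classes."""
    if score >= 10:
        return "Oncogenic"
    if score >= 6:  # 6 <= score <= 9
        return "Likely Oncogenic"
    if score >= 0:  # 0 <= score <= 5
        return "VUS"
    if score >= -6:  # -6 <= score <= -1
        return "Likely Benign"
    return "Benign"  # score <= -7


# Illustrative criterion codes and point values (hypothetical, not OncoVI output).
example = {"OS3": 4, "OM1": 2, "OP1": 1}
print(classify_oncogenicity(score_variant(example)))  # Likely Oncogenic
```

Because the scores are integers, the ordered thresholds cover every possible value without gaps between the classes.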
OncoVI was implemented and tested in a dedicated conda environment on a remote server running Ubuntu 20.04.4 long-term support (LTS). The following packages are required to run OncoVI:
- python
- numpy
- pandas
- subprocess (part of the Python standard library)
Due to size constraints, the COSMIC resources utilised by OncoVI could not be uploaded to the GitHub repository. The steps below describe how the COSMIC data were downloaded and prepared for use by OncoVI.
- First, All Data CMC for genome GRCh38 was downloaded.
- Then, the data set was reduced to the columns GENE_NAME, Mutation CDS, Mutation AA, AA_MUT_START, Mutation genome position GRCh38, and COSMIC_SAMPLE_MUTATED.
- The reduced data set was converted into a dictionary with the Python script /src/prepare_cosmic_resources.py.
- The resulting dictionary was saved as cosmic_all_dictionary.txt.
- The path to the dictionary must be provided to the Python script 03_OncoVI_SOP.py.
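The column reduction and dictionary steps for All Data CMC can be sketched with the standard library as follows. This is only an illustration: the on-disk format written by /src/prepare_cosmic_resources.py is not described here, so the gene to protein change to sample count layout built by the hypothetical build_cmc_dictionary function is an assumption, as is the tab-separated input.

```python
import csv
from typing import Dict, Iterable, List

# Columns kept from the All Data CMC export (assumed tab-separated).
CMC_COLUMNS = [
    "GENE_NAME",
    "Mutation CDS",
    "Mutation AA",
    "AA_MUT_START",
    "Mutation genome position GRCh38",
    "COSMIC_SAMPLE_MUTATED",
]


def reduce_cmc(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Read a tab-separated CMC table and keep only CMC_COLUMNS."""
    reader = csv.DictReader(lines, delimiter="\t")
    return [{col: row[col] for col in CMC_COLUMNS} for row in reader]


def build_cmc_dictionary(rows: Iterable[Dict[str, str]]) -> Dict[str, Dict[str, int]]:
    """Hypothetical layout: gene -> protein change -> number of mutated samples.

    Rows sharing a gene and protein change (e.g. different CDS changes that
    give the same amino-acid change) have their sample counts summed.
    """
    cmc: Dict[str, Dict[str, int]] = {}
    for row in rows:
        count = int(row["COSMIC_SAMPLE_MUTATED"] or 0)
        per_gene = cmc.setdefault(row["GENE_NAME"], {})
        aa = row["Mutation AA"]
        per_gene[aa] = per_gene.get(aa, 0) + count
    return cmc
```

For example, with open("cmc_export.tsv", newline="", encoding="utf-8") as fh: cmc = build_cmc_dictionary(reduce_cmc(fh)), where cmc_export.tsv is a placeholder file name. Streaming the file through csv.DictReader avoids loading unused columns into memory.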
- First, Census Genes Mutations for genome GRCh38 was downloaded.
- Then, the data set was reduced to the columns GENE_SYMBOL, MUTATION_CDS, MUTATION_AA, and HGVSG.
- The reduced data set was converted into a dictionary with the Python script /src/prepare_cosmic_resources.py.
- The resulting dictionary was saved as cosmic_hgvsg_dictionary.txt.
- The path to the dictionary must be provided to the Python script 03_OncoVI_SOP.py.
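For Census Genes Mutations, a dictionary keyed by the genomic HGVS description (HGVSG) allows a direct lookup per variant. The sketch below assumes a tab-separated input and a hypothetical HGVSG to (gene, CDS change, protein change) layout; the actual format produced by /src/prepare_cosmic_resources.py may differ.

```python
import csv
from typing import Dict, Iterable, Tuple


def build_hgvsg_dictionary(lines: Iterable[str]) -> Dict[str, Tuple[str, str, str]]:
    """Map each HGVSG string to (GENE_SYMBOL, MUTATION_CDS, MUTATION_AA).

    The key/value layout is a hypothetical choice for illustration.
    """
    hgvsg: Dict[str, Tuple[str, str, str]] = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        key = row["HGVSG"]
        if not key:  # skip rows without a genomic HGVS description
            continue
        hgvsg[key] = (row["GENE_SYMBOL"], row["MUTATION_CDS"], row["MUTATION_AA"])
    return hgvsg
```

A lookup then reduces to a single dictionary access, e.g. hgvsg.get(variant_hgvsg), which returns None for variants absent from the census.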
Due to size constraints, the ClinVar resources utilised by the functional annotation step could not be uploaded to the GitHub repository. The steps below describe how the ClinVar data were downloaded and prepared for use by the functional annotation step:
- First, variant_summary.txt.gz was downloaded from the ClinVar FTP site.
- Then, the data set was reduced to the columns GeneSymbol, ClinicalSignificance, Chromosome, Start, VariationID, ReferenceAlleleVCF, AlternateAlleleVCF, ReviewStatus, and NumberSubmitters.
- The reduced data set was converted into a dictionary with the Python script /src/create_clinvar_dict.py.
- The resulting dictionary was saved as clinvar_all_dictionary.txt.
- The path to the ClinVar dictionary must be provided to the Python script 02_VEP_based_pipeline.py.
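The ClinVar preparation can be sketched as a streaming read of the gzipped file. This is not /src/create_clinvar_dict.py: the (chromosome, start, reference, alternate) key is a hypothetical layout, the "na" placeholder for missing VCF alleles is an assumption, and the optional filter on an Assembly column reflects the fact that variant_summary.txt.gz lists variants separately per genome assembly.

```python
import csv
import gzip
from typing import Dict, Iterable, Optional, Tuple

# Columns kept from variant_summary.txt.gz.
CLINVAR_COLUMNS = [
    "GeneSymbol", "ClinicalSignificance", "Chromosome", "Start", "VariationID",
    "ReferenceAlleleVCF", "AlternateAlleleVCF", "ReviewStatus", "NumberSubmitters",
]
# Hypothetical key: (chromosome, start, VCF reference allele, VCF alternate allele).
Key = Tuple[str, int, str, str]
MISSING = {None, "", "na"}  # assumed placeholders for missing values


def build_clinvar_dictionary(
    lines: Iterable[str], assembly: Optional[str] = "GRCh38"
) -> Dict[Key, Dict[str, str]]:
    """Reduce rows to CLINVAR_COLUMNS, keyed by position and VCF alleles."""
    clinvar: Dict[Key, Dict[str, str]] = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        # Keep a single assembly when an Assembly column is present.
        if assembly and row.get("Assembly") not in (None, assembly):
            continue
        reduced = {col: row[col] for col in CLINVAR_COLUMNS}
        ref = reduced["ReferenceAlleleVCF"]
        alt = reduced["AlternateAlleleVCF"]
        start = reduced["Start"] or ""
        if ref in MISSING or alt in MISSING or not start.isdigit():
            continue  # rows without usable VCF alleles or position
        clinvar[(reduced["Chromosome"], int(start), ref, alt)] = reduced
    return clinvar


def load_clinvar_dictionary(path: str, assembly: Optional[str] = "GRCh38"):
    """Read the gzipped file directly, without decompressing it to disk."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        return build_clinvar_dictionary(handle, assembly)
```

Passing assembly=None disables the assembly filter, in which case GRCh37 and GRCh38 rows for the same variant are kept under different positions.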
Clone the GitHub repository:
```bash
git clone https://github.com/MGCarta/oncovi.git
# Create the conda environment for oncovi
conda env create -n oncovi -f /path/to/OncoVIenvFile.yml
# Activate the conda environment
conda activate oncovi
# Run the installer (v. 111) available in the conda environment
vep_install --NO_HTSLIB -c '/path/to/.vep' -r '/path/to/.vep/Plugins/'
```
Then, when prompted by the installer:
- select homo_sapiens_refseq_vep_111_GRCh38.tar.gz as the cache;
- select homo_sapiens_vep_111_GRCh38.tar.gz as the reference genome;
- install all plugins.
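After installation, it can help to confirm that the RefSeq cache was unpacked where VEP expects it. The sketch below assumes VEP's usual cache layout, <cache dir>/<species>/<release>_<assembly>, and derives that directory from the cache tarball name; the function expected_cache_dir is hypothetical and part of neither OncoVI nor VEP.

```python
import re
from pathlib import Path

# Assumed tarball naming scheme: <species>_vep_<release>_<assembly>.tar.gz
_CACHE_RE = re.compile(
    r"^(?P<species>.+)_vep_(?P<release>\d+)_(?P<assembly>[^_.]+)\.tar\.gz$"
)


def expected_cache_dir(cache_root: str, tarball: str) -> Path:
    """Return the directory a VEP cache tarball is expected to unpack into."""
    match = _CACHE_RE.match(tarball)
    if match is None:
        raise ValueError(f"unrecognised cache file name: {tarball}")
    return Path(cache_root) / match["species"] / f"{match['release']}_{match['assembly']}"
```

For example, expected_cache_dir("/path/to/.vep", "homo_sapiens_refseq_vep_111_GRCh38.tar.gz").is_dir() should return True after a successful installation, provided the layout assumption holds.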
The dbNSFP plugin is used by the functional annotation step. Detailed instructions for setting up the dbNSFP plugin for VEP are available in the plugin's documentation. The dbNSFP plugin must be enabled in the script vep.sh according to the plugin instructions.
The SpliceAI plugin is also used during the functional annotation step. Detailed instructions for setting up the SpliceAI plugin for VEP are available in the plugin's documentation. The SpliceAI plugin must be enabled in the script vep.sh according to the plugin instructions.
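The content of vep.sh is not reproduced in this README. As a hedged illustration of what enabling the two plugins involves, the sketch below builds VEP --plugin arguments in the comma-separated form described in the VEP plugin documentation (dbNSFP,<file>,<fields...> and SpliceAI,snv=<file>,indel=<file>). The helper names, file paths, and dbNSFP field names are placeholders; check the plugin instructions for the data versions you install.

```python
from typing import List


def dbnsfp_plugin_arg(dbnsfp_file: str, fields: List[str]) -> List[str]:
    """Build '--plugin dbNSFP,<file>,<field1>,<field2>,...'."""
    return ["--plugin", ",".join(["dbNSFP", dbnsfp_file, *fields])]


def spliceai_plugin_arg(snv_file: str, indel_file: str) -> List[str]:
    """Build '--plugin SpliceAI,snv=<file>,indel=<file>'."""
    return ["--plugin", f"SpliceAI,snv={snv_file},indel={indel_file}"]


# Placeholder paths and example dbNSFP columns.
args = dbnsfp_plugin_arg("/path/to/dbNSFP.gz", ["REVEL_score", "SIFT_score"])
args += spliceai_plugin_arg("/path/to/spliceai.snv.vcf.gz", "/path/to/spliceai.indel.vcf.gz")
print(" ".join(args))
```

Returning argument lists rather than a single string keeps each plugin specification intact when the lists are appended to a VEP command passed to subprocess.run.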
VEP accepts variants both in text format and in variant call format (VCF); please refer to the official VEP documentation for a detailed description of the input formats. A test data set is available at /oncovi/testdata/SOP_table_union.txt.
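When variants are available as VCF-style records but a text input is preferred, they can be converted to VEP's default whitespace-separated format (chromosome, start, end, reference/alternate allele, strand). The sketch below is a hypothetical helper, not part of OncoVI: it trims a single shared leading base (the usual VCF padding), writes empty alleles as "-", and encodes insertions with start = end + 1, following the conventions described in the VEP input documentation.

```python
def vcf_to_vep_default(chrom: str, pos: int, ref: str, alt: str, strand: str = "+") -> str:
    """Convert one VCF-style allele to a VEP default-format input line."""
    # Drop one shared padding base so indels use VEP's '-' notation.
    if ref and alt and ref[0] == alt[0] and (len(ref) > 1 or len(alt) > 1):
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    start = pos
    end = pos + len(ref) - 1  # for insertions ref is empty, so end = start - 1
    allele = f"{ref or '-'}/{alt or '-'}"
    return f"{chrom}\t{start}\t{end}\t{allele}\t{strand}"


print(vcf_to_vep_default("1", 100, "A", "AT"))  # insertion between 100 and 101
```

Only one padding base is trimmed, which covers normalised VCF records; complex multi-base substitutions that share a prefix are passed through after that single trim.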
```bash
# Navigate to the directory in which the python script 02_VEP_based_pipeline.py is located
# Run the functional annotation
python 02_VEP_based_pipeline.py -i /path/to/oncovi/testdata/SOP_table_union.txt
# Navigate to the directory in which the python script 03_OncoVI_SOP.py is located
# Run OncoVI
python 03_OncoVI_SOP.py
```
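The two steps can also be chained from Python with subprocess, which is already among the requirements. This is a minimal sketch under stated assumptions: the directory arguments are placeholders for wherever 02_VEP_based_pipeline.py and 03_OncoVI_SOP.py are located, and the helper names pipeline_steps and run_pipeline are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Tuple

Step = Tuple[List[str], Path]  # (command, working directory)


def pipeline_steps(annotation_dir: str, oncovi_dir: str, input_file: str) -> List[Step]:
    """Build both commands; each one runs from its script's directory."""
    return [
        ([sys.executable, "02_VEP_based_pipeline.py", "-i", input_file], Path(annotation_dir)),
        ([sys.executable, "03_OncoVI_SOP.py"], Path(oncovi_dir)),
    ]


def run_pipeline(annotation_dir: str, oncovi_dir: str, input_file: str) -> None:
    """Run functional annotation, then OncoVI; raise if a step fails."""
    for command, cwd in pipeline_steps(annotation_dir, oncovi_dir, input_file):
        subprocess.run(command, cwd=cwd, check=True)
```

Using sys.executable runs both scripts with the interpreter of the active conda environment, and check=True raises CalledProcessError so that OncoVI is not started when the functional annotation fails.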
Please help us improve OncoVI by describing any bug or issue in detail.
The License file applies to all files within this repository.
OncoVI is intended for research purposes only. Any use outside this context is the responsibility of the user, who must also comply with the licences of the resources utilised.
If you use OncoVI, please cite our preprint: Oncogenicity Variant Interpreter (OncoVI): oncogenicity guidelines implementation to support somatic variants interpretation in precision oncology.