Preparing splicing-related models

A simple utility that processes VCF files and generates correctly formatted input for a set of models that require sequence information (or that require a specific format to run through a web-based application).

Motivation

Many methods predict splicing-related information (e.g. splice sites, branchpoints, splicing regulatory elements). Because they were not originally designed to predict the effect of genetic variants, using them for that task is not straightforward. This tool simplifies the process: it generates reference and mutated sequences from VCF files in the format each model expects, so that several models can be run on all variants at once, and it provides utilities to process the model output and generate a VCF with a final score (usually the score of the mutated allele minus the score of the reference allele).
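The idea described above can be illustrated with a minimal Python sketch. It is not the code used by this package: the function names, the `flank` parameter, and the in-memory `genome_seq` string are assumptions made for illustration, and strand is assumed to be encoded as 1 or -1, as in the VEP STRAND field.

```python
# Illustrative sketch only; these helpers are hypothetical and are not
# part of this package's API.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def ref_and_mut_sequences(genome_seq: str, pos: int, ref: str, alt: str,
                          flank: int, strand: int = 1) -> tuple[str, str]:
    """Build the reference and mutated sequence around a variant.

    genome_seq is the forward-strand chromosome sequence, pos is the
    1-based VCF position, and flank is the number of bases kept on each
    side of the variant (assumed parameter).
    """
    idx = pos - 1  # VCF positions are 1-based
    if genome_seq[idx:idx + len(ref)].upper() != ref.upper():
        raise ValueError("REF allele does not match the reference genome")

    left = genome_seq[max(0, idx - flank):idx]
    right = genome_seq[idx + len(ref):idx + len(ref) + flank]
    ref_seq = left + ref + right
    mut_seq = left + alt + right

    # Genes on the minus strand are read from the reverse complement.
    if strand == -1:
        ref_seq = reverse_complement(ref_seq)
        mut_seq = reverse_complement(mut_seq)
    return ref_seq, mut_seq


def delta_score(ref_score: float, mut_score: float) -> float:
    """Final variant score: mutated allele score minus reference allele score."""
    return mut_score - ref_score


if __name__ == "__main__":
    toy_genome = "AAACGTTT"
    print(ref_and_mut_sequences(toy_genome, pos=4, ref="C", alt="T", flank=2))
    print(ref_and_mut_sequences(toy_genome, pos=4, ref="C", alt="T", flank=2, strand=-1))
```

Each model expects its own window size and input format, so the value of `flank` here is arbitrary; the point is only that both sequences share the same context and differ at the variant position.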

Requirements

The variants should be annotated with Ensembl VEP so that strand information can be retrieved (and therefore the proper sequence context of the variant can be extracted).
The processing scripts expect chromosome names without the chr prefix (e.g. 1 rather than chr1).
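If a VCF uses chr-prefixed chromosome names, the prefix can be removed before running the tool. The following is a minimal sketch, assuming an uncompressed VCF read line by line; `strip_chr_prefix` is a hypothetical helper and not part of this package. It does not rename the mitochondrial contig (chrM versus MT), which may need separate handling depending on the reference genome.

```python
# Illustrative sketch only; strip_chr_prefix is a hypothetical helper.

def strip_chr_prefix(line: str) -> str:
    """Remove a leading 'chr' from the chromosome name of one VCF line."""
    contig_tag = "##contig=<ID=chr"
    if line.startswith(contig_tag):
        # Keep contig header lines consistent with the renamed records.
        return "##contig=<ID=" + line[len(contig_tag):]
    if line.startswith("#"):
        return line  # other header lines are left untouched
    if line.startswith("chr"):
        return line[3:]
    return line


if __name__ == "__main__":
    import sys

    # Usage (assumed): python strip_chr.py < input.vcf > output.vcf
    for vcf_line in sys.stdin:
        sys.stdout.write(strip_chr_prefix(vcf_line))
```

The reference genome FASTA passed to the tool should use the same chromosome naming as the VCF.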

Citation

If you find this package useful in any way, please consider citing the work for which it was developed:
Computational prediction of human deep intronic variation

Installation

git clone https://github.com/PedroBarbosa/Prepare_SplicingPredictors.git
cd Prepare_SplicingPredictors
conda env create --file conda_environment.yaml 
conda activate prepareSplicingTools
pip install .

Running

To run this utility, call vcf2seq and select the models you want to generate input for (run vcf2seq --help to list the available options).

vcf2seq input.vcf.gz reference_genome.fa outbasename --maxentscan --splicerover ...

For models that predict splice sites, it may be necessary to set the splice site flag (--ss donor or --ss acceptor). Each model folder (under the src folder in this repo) contains instructions on how to run that model and a script (get_mutation_effects.py) that processes the output and generates a VCF with the predictions.

Note: Do not change the fasta headers of the generated sequences, since the get_mutation_effects.py scripts require original names for proper processing.

Supported models

General methods

regSNP-intron

Splice site prediction

Splice2deep
SpliceRover
DSSP
Spliceator
MaxEntScan

Splicing regulatory elements

ESEfinder
ESRseq
HEXplorer

Branchpoint signals

SVM-BPfinder
BPP
BPHunter

Limitations

For most models, only single-nucleotide variants (SNVs) are supported.
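Because of this limitation, it can be useful to check which records are SNVs before generating model input. A minimal sketch of such a check, assuming the REF and ALT fields of a VCF record are available as strings (`is_snv` is a hypothetical helper, not part of this package):

```python
# Illustrative sketch only; is_snv is a hypothetical helper.

BASES = frozenset("ACGT")


def is_snv(ref: str, alt: str) -> bool:
    """Return True if REF/ALT describe a single-nucleotide substitution.

    Multi-allelic ALT fields (e.g. "G,T"), indels, and symbolic alleles
    (e.g. "<DEL>") return False; multi-allelic records would need to be
    split into one record per allele first.
    """
    ref, alt = ref.upper(), alt.upper()
    return ref in BASES and alt in BASES and ref != alt


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("AT", "A"), ("A", "G,T"), ("C", "<DEL>")]:
        print(ref, alt, is_snv(ref, alt))
```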

Contact

pbarbosa@lasige.di.fc.ul.pt
