Set of scripts to prepare the input to run several splicing-related tools from VCF files

PedroBarbosa/Prepare_SplicingPredictors

Preparing splicing-related models

A simple utility that processes VCF files and generates correctly formatted input for a set of models that require sequence information (or a specific input format to run a web-based application).

Motivation

Many methods predict splicing-related information (e.g. splice sites, branchpoints, splicing regulatory elements). Because they were not originally designed to predict the effect of genetic variants, using them for that task is not straightforward. This tool simplifies the task: it generates reference and mutated sequences from VCF files in the format each model expects, so that several models can be run on all variants at once, and it includes utilities that process the model output and generate a VCF with a final score (usually the mutated-allele score minus the reference-allele score).
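As a rough illustration of the final score, the sketch below subtracts the reference-allele score from the mutated-allele score for each variant. The function name, the variant identifiers, and the score values are all made up for the example; the actual get_mutation_effects.py scripts may compute and store scores differently.

```python
# Hypothetical helper, not part of this package: pairs the scores by variant
# identifier and returns mutated minus reference.
def delta_scores(ref_scores, mut_scores):
    """Return the mutated-minus-reference score for variants found in both inputs."""
    return {
        variant_id: mut_scores[variant_id] - ref_scores[variant_id]
        for variant_id in ref_scores
        if variant_id in mut_scores
    }


# Made-up identifiers and scores, for illustration only.
ref = {"1_1000_A_G": 8.0, "2_500_C_T": 3.5}
mut = {"1_1000_A_G": 2.0, "2_500_C_T": 4.0}
deltas = delta_scores(ref, mut)
```

Under this convention, a negative value means the model scores the mutated sequence lower than the reference, and a positive value means it scores it higher.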

Requirements

  • Variants must be annotated with Ensembl VEP so that strand information can be retrieved and the proper sequence context of each variant extracted.
  • Processing scripts expect chromosome names without the chr prefix (e.g. 1, not chr1).
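To make the two requirements concrete, here is a minimal sketch of why strand matters and of how a chr prefix could be dropped before processing. All function names are hypothetical and not taken from this package. The strand is assumed to be given as 1 or -1, as in VEP's STRAND field, and the chromosome sequence is assumed to be already loaded as a plain string.

```python
# Illustrative helpers only; these names are hypothetical and are not part of vcf2seq.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def strip_chr(chrom):
    """Return the chromosome name without a leading 'chr' (e.g. 'chr1' -> '1')."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def ref_and_mut_sequences(chrom_seq, pos, ref, alt, flank, strand):
    """Build reference and mutated sequences around a variant.

    pos is 1-based, as in VCF. strand is assumed to be 1 or -1.
    """
    idx = pos - 1  # convert to a 0-based string index
    if chrom_seq[idx:idx + len(ref)].upper() != ref.upper():
        raise ValueError("REF allele does not match the reference sequence")
    start = max(0, idx - flank)
    end = idx + len(ref) + flank
    ref_seq = chrom_seq[start:end]
    mut_seq = chrom_seq[start:idx] + alt + chrom_seq[idx + len(ref):end]
    if strand == -1:
        # Genes on the minus strand are read from the opposite strand,
        # so both sequences are reverse-complemented.
        ref_seq = reverse_complement(ref_seq)
        mut_seq = reverse_complement(mut_seq)
    return ref_seq, mut_seq
```

For example, `ref_and_mut_sequences("AAACGTTT", 4, "C", "T", 2, 1)` gives `("AACGT", "AATGT")`, while the same variant with strand -1 gives `("ACGTT", "ACATT")`. Without the strand annotation, a model would be shown the wrong sequence for every variant in a minus-strand gene.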

Citation

If you find this package useful in any way, please consider citing the work for which it was developed:
Computational prediction of human deep intronic variation

Installation

git clone https://github.com/PedroBarbosa/Prepare_SplicingPredictors.git
cd Prepare_SplicingPredictors
conda env create --file conda_environment.yaml 
conda activate prepareSplicingTools
pip install .

Running

To run this utility, call vcf2seq and select the models you want to generate input for (check the available options with vcf2seq --help).

vcf2seq input.vcf.gz reference_genome.fa outbasename --maxentscan --splicerover ...

For models that predict splice sites, it may be necessary to set the splice site flag (--ss donor or --ss acceptor). Each model folder (under the src folder in this repo) contains instructions on how to run the model and a script (get_mutation_effects.py) that processes the output and generates a VCF with the predictions.

Note: Do not change the FASTA headers of the generated sequences; the get_mutation_effects.py scripts rely on the original names for proper processing.

Supported models

General methods

Splice site prediction

Splicing regulatory elements

Branchpoint signals

Limitations

For most models, only single-nucleotide variants (SNVs) are supported.
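If a VCF mixes SNVs with indels, one option is to pre-filter it before generating sequences. The sketch below is an assumption about how that could be done with plain text processing. It is not part of this package, and the function names are hypothetical. It keeps header lines plus records whose REF and every ALT allele are single, distinct A/C/G/T bases.

```python
# Hypothetical pre-filter, not part of this package.
_BASES = set("ACGT")


def is_snv(ref, alt):
    """True if ref and alt are single, different A/C/G/T bases."""
    ref, alt = ref.upper(), alt.upper()
    return len(ref) == 1 and len(alt) == 1 and ref in _BASES and alt in _BASES and ref != alt


def snv_records(vcf_lines):
    """Yield header lines and the records in which every ALT allele is an SNV."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")  # REF and ALT columns
        if all(is_snv(ref, alt) for alt in alts):
            yield line
```

Multi-allelic records are kept only when all of their ALT alleles are SNVs. Splitting such records beforehand (for instance with a VCF normalization tool) would avoid discarding an SNV that shares a line with an indel.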

Contact

[email protected]