Preparing splicing-related models

A simple utility that processes VCF files and generates correctly formatted input for a set of models that require sequence information (or that require a specific format in order to run through a web-based application).

Motivation

Many methods are available that predict splicing-related information (e.g. splice sites, branchpoints, splicing regulatory elements). Because they were not originally designed to predict the effect of genetic variants, using them for that task is not straightforward. This tool simplifies the task: it generates reference and mutated sequences from VCF files, in the format each model expects, so that several models can be run on all variants at once. It also contains utilities to process the model output and generate a VCF with a final score (usually the mutated-allele score minus the reference-allele score).
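
As a rough illustration of the idea (not the code vcf2seq actually uses), the sketch below builds a reference and a mutated sequence around a single variant and computes a score difference. The helper names, the flank size, the example sequence, and the toy scoring function are all hypothetical; a real model would replace toy_score.

```python
# Illustrative sketch only: names, flank size and scoring are assumptions,
# not the implementation used by vcf2seq.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def ref_and_mut_sequences(chrom_seq, pos, ref, alt, flank, strand):
    """Build reference and mutated sequences around a variant.

    chrom_seq: chromosome sequence (string), pos: 1-based VCF position,
    flank: bases kept on each side, strand: 1 or -1.
    """
    start = pos - 1  # 0-based index of the first reference base
    if chrom_seq[start:start + len(ref)].upper() != ref.upper():
        raise ValueError("REF allele does not match the reference sequence")
    left = chrom_seq[max(0, start - flank):start]
    right = chrom_seq[start + len(ref):start + len(ref) + flank]
    ref_seq = left + ref + right
    mut_seq = left + alt + right
    if strand == -1:
        # Genes on the reverse strand are read as the reverse complement
        ref_seq = reverse_complement(ref_seq)
        mut_seq = reverse_complement(mut_seq)
    return ref_seq, mut_seq


def toy_score(seq: str) -> float:
    """Stand-in for a real model: fraction of G/C bases."""
    return sum(base in "GCgc" for base in seq) / len(seq)


chrom = "ACGTACGTAGGTAAGTCCATG"  # made-up sequence
ref_seq, mut_seq = ref_and_mut_sequences(
    chrom, pos=11, ref="G", alt="A", flank=4, strand=1
)
delta = toy_score(mut_seq) - toy_score(ref_seq)  # mutated minus reference
print(ref_seq, mut_seq, round(delta, 3))
```

A negative delta here means the toy score dropped for the mutated allele; how to interpret the sign for a real model depends on what that model predicts.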

Requirements

The variants should be annotated with Ensembl VEP so that strand information can be retrieved (and therefore the proper sequence context of the variant can be extracted).
Processing scripts expect chromosome names without the chr prefix (e.g. 1 rather than chr1).
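
Both requirements can be checked or prepared with a few lines of Python. The sketch below is only an assumption about how one might do it: the function names are hypothetical, the header is an abbreviated version of the CSQ description that VEP writes (real headers list many more fields), and the record and gene name are made up.

```python
# Illustrative sketch only: function names are hypothetical and the parsing
# assumes a VEP-annotated VCF whose CSQ format includes a STRAND field.
import re


def strip_chr_prefix(line: str) -> str:
    """Remove a leading 'chr' from the chromosome name of a VCF line."""
    if line.startswith("##contig=<ID=chr"):
        return line.replace("##contig=<ID=chr", "##contig=<ID=", 1)
    if line.startswith("#"):
        return line  # other header lines are left untouched
    return line[3:] if line.startswith("chr") else line


def csq_fields(header_line: str) -> list:
    """Extract the CSQ sub-field names from the VEP INFO header line."""
    match = re.search(r'Format: ([^"]+)"', header_line)
    if match is None:
        raise ValueError("No CSQ format description found")
    return match.group(1).split("|")


def strand_from_info(info: str, fields: list) -> str:
    """Return the STRAND value of the first CSQ entry in an INFO column."""
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            first = entry[len("CSQ="):].split(",")[0].split("|")
            return first[fields.index("STRAND")]
    raise ValueError("Variant has no CSQ annotation")


# Abbreviated header and made-up record, for illustration
header = (
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
    'annotations from Ensembl VEP. Format: Allele|Consequence|SYMBOL|STRAND">'
)
record = "chr1\t12345\t.\tG\tA\t.\tPASS\tCSQ=A|intron_variant|GENE1|-1"

fields = csq_fields(header)
clean = strip_chr_prefix(record)
columns = clean.split("\t")
print(columns[0], strand_from_info(columns[7], fields))
```

Only the first CSQ entry is read here; a variant overlapping several transcripts may carry entries on different strands, so a real workflow would need to decide which consequence to use.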

Citation

If you find this package useful in any way, please consider citing the work for which it was developed:
Computational prediction of human deep intronic variation

Installation

git clone https://github.com/PedroBarbosa/Prepare_SplicingPredictors.git
cd Prepare_SplicingPredictors
conda env create --file conda_environment.yaml 
conda activate prepareSplicingTools
pip install .

Running

To run this utility, call vcf2seq and select the models you want to generate input for (check the available options with vcf2seq --help).

vcf2seq input.vcf.gz reference_genome.fa outbasename --maxentscan --splicerover ...

For models that predict splice sites, it may be necessary to set the splice site flag (--ss donor or --ss acceptor). The folder for each model (inside the src folder of this repo) then contains instructions on how to run that model, along with a script (get_mutation_effects.py) that processes the output and generates a VCF with the predictions.

Note: Do not change the FASTA headers of the generated sequences, since the get_mutation_effects.py scripts rely on the original names for proper processing.
