Welcome to fivepseq readme!

Fivepseq is a software package for analysis of 5′ endpoint distribution in RNA degradome sequencing datasets.

Homepage

The homepage is hosted on the Pelechano lab website at http://pelechanolab.com/software/fivepseq/.

User guide

Below is a quick manual to get you started. For detailed instructions and explanations of fivepseq output, please see the user guide at: https://fivepseq.readthedocs.io/en/latest/.

Citation

Nersisyan L, Ropat M, Pelechano V. Improved computational analysis of ribosome dynamics from 5′P degradome data using fivepseq. NAR Genomics and Bioinformatics, 2:4, 2020.

Installation

Fivepseq works with Python versions <=3.8. With a newer version of Python you may run into problems with some dependencies.
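
Before installing, it can help to confirm that the interpreter is in range. The sketch below is only an illustration: the helper name python_version_ok is hypothetical, and it assumes the "<=3.8" constraint is meant as a major.minor comparison, so that any 3.8.x release passes.

```shell
# Hypothetical helper: succeed if a version string such as "3.8.10"
# is at most 3.8 when compared on major.minor only (an assumption).
python_version_ok() {
    major=${1%%.*}          # text before the first dot
    rest=${1#*.}            # text after the first dot
    minor=${rest%%.*}       # text before the next dot
    [ "$major" -lt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -le 8 ]; }
}

# Replace the literal below with the version reported by `python --version`.
if python_version_ok "3.8.10"; then
    echo "Python version is within the supported range"
else
    echo "Python version is newer than 3.8; dependencies may fail to install"
fi
```

If the check fails, one option is a dedicated environment with an older interpreter, created with whichever environment manager you already use.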

Install dependencies:

To set up fivepseq, the following Python packages need to be pre-installed manually using pip (if you don't have pip, you may install it as described here).

Paste the following lines into the shell terminal:

git clone https://github.com/joshuagryphon/plastid -b develop
cd plastid
python setup.py install
pip install --upgrade numpy==1.19.5 pysam==0.19.0 cython==0.29.28

To install fivepseq, clone the project from github:

git clone https://github.com/lilit-nersisyan/fivepseq.git
cd fivepseq
python setup.py install

To check if fivepseq was installed correctly, type the following in the command line:

fivepseq --version

This should display the currently installed version of fivepseq. To display the command-line arguments, type:

fivepseq --help

In order to enable exporting vector and portable image files, you'll also need to have phantomjs installed as follows:

conda install phantomjs selenium pillow

Running fivepseq

Fivepseq requires the following files to run:

Aligned reads (.bam)

Alignment index (.bai)

Genomic sequence file (.fasta / .fa)

Genomic annotation file (.gff / .gtf)

This section assumes that you already have these files. If not, please refer to the section: Preparing data.

Fivepseq usage

The fivepseq --help command will show fivepseq usage and will list all the arguments.

usage: fivepseq -b B -g G -a A [optional arguments]

Required arguments

-b B   the full path to one or many bam/sam files (many files should be provided with a pattern, **within double quotes**: e.g. ["your_bam_folder/*.bam"])
-g G   the full path to the fa/fasta file
-a A   the full path to the gtf/gff/gff3 file
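
The double quotes around a multi-file pattern matter because an unquoted pattern is expanded by the shell into several separate arguments, while the usage line shows -b taking a single value. The sketch below illustrates the difference with a hypothetical count_args helper and two empty placeholder files; it does not call fivepseq itself, and how fivepseq expands the pattern internally is not shown here.

```shell
# Print the number of arguments received.
count_args() { echo "$#"; }

# Hypothetical folder with two empty placeholder files, for illustration only.
demo_dir=$(mktemp -d)
touch "$demo_dir/rep1.bam" "$demo_dir/rep2.bam"

# Unquoted: the shell expands the pattern before the program starts.
unquoted=$(cd "$demo_dir" && count_args *.bam)
# Quoted: the pattern is passed through unchanged as a single argument.
quoted=$(cd "$demo_dir" && count_args "*.bam")

echo "unquoted pattern: $unquoted arguments"
echo "quoted pattern: $quoted argument"
rm -rf "$demo_dir"
```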

Note:

The indexed alignment files should be in the same directory as bam files, with the same name, with .bai extension added.
Multiple bam files should be indicated with a pattern placed within double quotes: e.g. ["your_bam_folder/*.bam"]
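
A missing index is easy to overlook when many bam files are passed at once. The following sketch lists bam files that lack an index next to them. The helper name missing_bam_indexes is hypothetical, and the sketch assumes the index is named by appending .bai to the full bam file name (for example sample.bam.bai), which is how the note above reads.

```shell
# Hypothetical helper: print every BAM file that has no index beside it.
# Assumes the index is named <file>.bam.bai (an interpretation of the note).
missing_bam_indexes() {
    for bam in "$@"; do
        if [ ! -f "${bam}.bai" ]; then
            echo "$bam"
        fi
    done
}

# Usage (hypothetical folder name):
#   missing_bam_indexes your_bam_folder/*.bam
# Each file it prints still needs an index, which can be created
# with `samtools index <file>.bam`.
```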

Commonly, you will run fivepseq by also providing the name of the output folder ('fivepseq' by default) and the title of your run (determined from bam path otherwise):

fivepseq \
   -g <path_to_genome_fasta> \
   -a <path_to_annotation> \
   -b <path_to_bam_file(s)> \
   -o <output_directory> \
   -t <title_of_the_run>

Note: this is a single command line; the backslashes only continue it onto the next line for readability. Either copy and paste it as shown or write it on a single line without the backslashes.

Additional arguments

Type fivepseq --help to see the list of additional arguments. For a detailed description of available arguments, see the User guide at: https://fivepseq.readthedocs.io/en/latest/.

Preprocessing from FASTQ files

Fastq files need to be preprocessed and aligned to the reference genome before proceeding to fivepseq downstream analysis. Preprocessing proceeds with the following steps:

quality checks (with FastQC and MultiQC),
adapter and quality-based trimming,
UMI extraction (if the library was generated with UMIs),
mapping to the reference genome,
read deduplication (if the library was generated with UMIs),
bedgraph generation to view the 5'P count distribution in genome viewers.

An example of a preprocessing pipeline can be found in the preprocess_scripts directory.

In order to run this pipeline, you need to have access to common bioinformatics software such as STAR, UMI-tools, bedtools, Samtools, FastQC, MultiQC and cutadapt.

To use it, navigate to the directory where the script is located and use the following command in the prompt:

./fivepseq_preprocess.sh -f [path to directory containing fastq files] -g [path to genome fasta] -a [path to annotation gff/gtf] -i [path to reference index, if exists] -o [output directory] -s [which steps to skip: either or combination of characters {cudqm} ]

The option -s specifies which steps of the pipeline you'd like to skip. Possible values are:

c skip trimming adapters with cutadapt
u skip UMI extraction
d skip deduplication after alignment
q skip initial quality check: FastQC and MultiQC
p skip post-processing quality check: FastQC and MultiQC
m skip mapping

You may use any combination of these characters, e.g. use -s cudqm to skip all.
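
A mistyped step letter is easy to miss, so it may be worth checking the value before launching a long run. The sketch below accepts only the letters listed above (c, u, d, q, p, m). The helper name valid_skip_steps is hypothetical, and whether fivepseq_preprocess.sh performs such a check itself is not stated here.

```shell
# Hypothetical helper: succeed only if every character of the argument
# is one of the step letters listed above (c, u, d, q, p, m).
valid_skip_steps() {
    case "$1" in
        *[!cudqpm]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Try a few example values; "xz" is deliberately invalid.
for steps in cu qp xz; do
    if valid_skip_steps "$steps"; then
        echo "-s $steps: ok"
    else
        echo "-s $steps: contains an unknown step letter"
    fi
done
```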

This script will produce sub-folders in the output directory, containing results of each step of the pipeline. The bam files will be generated in the align_dedup folder.

In addition to performing the steps described above, the script also evaluates the distribution of reads across the genome according to gene classes {"rRNA" "mRNA" "tRNA" "snoRNA" "snRNA" "ncRNA"}. These statistics are kept in the align_rna/rna_stats.txt file.

!!NOTE!! This example pipeline treats files as single-end libraries. If you have paired-end reads, you should only supply the first read (*_R1* files) to fivepseq.
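
For paired-end data, one way to follow the note above is to collect the first-read files in a folder of their own. The sketch below does this with symbolic links. The helper name link_first_reads and the folder names are hypothetical, and it assumes read 1 files carry "_R1" in their names. Whether the example pipeline follows symbolic links has not been verified here; copying the files instead would be the conservative alternative.

```shell
# Hypothetical helper: link only the first-read files into a separate folder.
# Assumes read 1 files carry "_R1" in their names, as in the note above.
link_first_reads() {
    src=$(cd "$1" && pwd)     # absolute path, so the links stay valid
    dest=$2
    mkdir -p "$dest"
    for fq in "$src"/*_R1*; do
        if [ -e "$fq" ]; then
            ln -sf "$fq" "$dest/"
        fi
    done
}

# Usage (hypothetical folder names):
#   link_first_reads fastq_dir fastq_r1_only
```

The resulting folder could then be given to the preprocessing script through its -f option.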

Have fun!
