Home

Welcome to the rna-seq-pop wiki!

Usage

If you use this workflow in a paper, please credit the author by citing the URL of this (original) repository and, if available, its DOI.

Step 1: Obtain a copy of this workflow

Create a new GitHub repository using this workflow as a template.
Clone the newly created repository to your local system, into the place where you want to perform the data analysis. Because the workflow contains submodules (compkaryo and mpileup2readcounts), clone with git clone --recursive so that they are downloaded too.
If you want to run the differential SNPs analysis, compile mpileup2readcounts by navigating to its folder in workflow/scripts and running: g++ -std=c++11 -O3 mpileup2readcounts.cc -o mpileup2readcounts
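A common setup mistake is cloning without --recursive, which leaves the submodule folders empty. The following minimal Python sketch checks for that and for the compiled binary. It assumes the submodules live under workflow/scripts/compkaryo and workflow/scripts/mpileup2readcounts; the wiki only states the mpileup2readcounts location, so the compkaryo path is a guess to adjust for your checkout. git submodule update --init --recursive is the standard git command for populating submodules in an existing clone.

```python
from pathlib import Path

# Assumed submodule locations: the wiki only says mpileup2readcounts lives in
# workflow/scripts; the compkaryo path is a guess. Adjust to your checkout.
SUBMODULE_DIRS = [
    "workflow/scripts/compkaryo",
    "workflow/scripts/mpileup2readcounts",
]
# Executable produced by the g++ command above (name taken from its -o flag).
MPILEUP_BINARY = "workflow/scripts/mpileup2readcounts/mpileup2readcounts"


def missing_submodules(repo_root, subdirs=SUBMODULE_DIRS):
    """Return submodule directories that are absent or empty.

    A clone made without --recursive usually leaves the submodule folders
    in place but empty, so empty directories are reported as missing too.
    """
    root = Path(repo_root)
    missing = []
    for sub in subdirs:
        path = root / sub
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(sub)
    return missing


def binary_built(repo_root, binary=MPILEUP_BINARY):
    """True if the compiled mpileup2readcounts executable exists."""
    return (Path(repo_root) / binary).is_file()


if __name__ == "__main__":
    # Run from the repository root.
    gaps = missing_submodules(".")
    if gaps:
        print("Empty or missing submodules:", ", ".join(gaps))
        print("Populate them with: git submodule update --init --recursive")
    if not binary_built("."):
        print("mpileup2readcounts is not compiled (only needed for differential SNPs).")
```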

Step 2: Configure workflow

To use the workflow, the user must supply a metadata sheet (usually config/samples.tsv) and either place the .fq.gz read files in resources/reads/ or supply a separate fastq.tsv file that maps each sampleID to the paths of its paired-end reads.
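When the reads live elsewhere, the fastq.tsv can be generated rather than written by hand. This is a minimal Python sketch, assuming a hypothetical file-naming convention ({sampleID}_1.fq.gz and {sampleID}_2.fq.gz) and hypothetical column headers (sampleID, fq1, fq2). The headers the workflow actually expects should be taken from the example config and config/README.md and substituted before use.

```python
import csv
from pathlib import Path

# Hypothetical naming convention and column headers: check the example config
# and config/README.md for the names the workflow actually expects.
R1_SUFFIX = "_1.fq.gz"
R2_SUFFIX = "_2.fq.gz"
COLUMNS = ["sampleID", "fq1", "fq2"]


def pair_reads(reads_dir):
    """Map each sample ID to its (read 1, read 2) paths, ordered by file name."""
    reads_dir = Path(reads_dir)
    pairs = {}
    for r1 in sorted(reads_dir.glob("*" + R1_SUFFIX)):
        sample = r1.name[: -len(R1_SUFFIX)]
        r2 = reads_dir / (sample + R2_SUFFIX)
        if not r2.exists():
            # Paired-end input: a read 1 file without its mate is an error.
            raise FileNotFoundError(f"No mate file for {sample}: expected {r2}")
        pairs[sample] = (str(r1), str(r2))
    return pairs


def write_fastq_tsv(pairs, out_path):
    """Write the sample-to-reads table as tab-separated values with a header."""
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(COLUMNS)
        for sample, (r1, r2) in pairs.items():
            writer.writerow([sample, r1, r2])
```

For example, write_fastq_tsv(pair_reads("/path/to/reads"), "config/fastq.tsv") would write the table; both paths here are illustrative. The sample IDs it produces should match those in samples.tsv.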

Configure the workflow according to your needs by editing the files in the config/ folder. Adjust the example config.yaml to configure the workflow execution, and samples.tsv to specify your sample setup. Information on how to set up the configuration file can be found in the example config and in config/README.md, as recommended by Snakemake best practices.

Requirements

As well as fastq files and metadata, the workflow requires the following as inputs:

A reference genome for your species (.fa)
A reference transcriptome for your species (.fa)
A list of contigs to analyse (e.g. 2L, 2R, 3L, 3R, X)
A genome feature file (.gff3)
The name of the SnpEff database for your species (e.g. Anopheles_gambiae or Aedes_aegypti_lvpagwg)
A file which matches geneIDs to TranscriptIDs, requiring four columns (GeneID, GeneDescription, GeneName, TranscriptID)
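Two of these inputs are easy to get subtly wrong: contig names that do not match the reference genome's FASTA headers, and a gene-to-transcript file missing one of the four required columns. A minimal Python pre-flight check might look like this. The column names come from the list above; that the file is tab-separated with a header row is an assumption, so compare against the example provided with the workflow.

```python
import csv

# Column names from the wiki; tab delimiter and a header row are assumptions.
GENE2TRANSCRIPT_COLUMNS = ["GeneID", "GeneDescription", "GeneName", "TranscriptID"]


def fasta_contigs(fasta_path):
    """Return the sequence IDs (first word of each '>' header) in a FASTA file."""
    ids = set()
    with open(fasta_path) as fh:
        for line in fh:  # streams line by line, so large genomes are fine
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def missing_contigs(contigs, fasta_path):
    """List requested contigs that do not appear in the reference genome."""
    present = fasta_contigs(fasta_path)
    return [c for c in contigs if c not in present]


def missing_columns(table_path, required=GENE2TRANSCRIPT_COLUMNS, delimiter="\t"):
    """Return required column names absent from the table's header row."""
    with open(table_path, newline="") as fh:
        header = next(csv.reader(fh, delimiter=delimiter), [])
    return [col for col in required if col not in header]
```

An empty list from both missing_contigs and missing_columns means these two inputs are at least structurally consistent; it does not check that the IDs themselves match the .gff3.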

Step 3: Run workflow

Navigate to the root folder of the repository and run the workflow. For example:

snakemake --use-conda --cores 16

Here, conda will be used to install the required environments, and the workflow will run on 16 cores. Using conda is highly recommended, as otherwise dependencies are likely to be missing or installed at incompatible versions.
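Before committing 16 cores to a full run, it can help to do a dry run first. --dry-run (short form -n) is a standard snakemake flag that lists the jobs that would run without executing them, so config mistakes and missing inputs surface early. This is a minimal Python sketch of that pattern, assuming snakemake is on your PATH and the script is run from the repository root; the helper names are hypothetical.

```python
import shlex
import subprocess


def snakemake_cmd(cores=16, dry_run=False):
    """Build the snakemake invocation shown above as an argument list."""
    cmd = ["snakemake", "--use-conda", "--cores", str(cores)]
    if dry_run:
        # --dry-run (-n) lists the jobs snakemake would run without executing them.
        cmd.append("--dry-run")
    return cmd


def run_workflow(cores=16):
    """Dry-run first so config or input errors surface before any job starts."""
    for dry in (True, False):
        cmd = snakemake_cmd(cores, dry_run=dry)
        print("Running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling run_workflow(cores=16) stops after the dry run if snakemake reports an error, because check=True raises before the real run starts.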

If you are using the workflow and would like to give feedback or need help troubleshooting, consider joining the project's Discord server.
