Home
Welcome to the rna-seq-pop wiki!
If you use this workflow in a paper, don't forget to credit the author by citing the URL of this (original) repository and, if available, its DOI (see above).
- Create a new github repository using this workflow as a template.
- Clone the newly created repository to your local system, into the place where you want to perform the data analysis. As the workflow contains submodules (compkaryo & mpileup2readcounts), git clone --recursive should be used to clone the repository.
- If you want to run the differential SNPs analysis, please compile mpileup2readcounts by navigating to its folder in workflow/scripts and running:
g++ -std=c++11 -O3 mpileup2readcounts.cc -o mpileup2readcounts
To use the workflow, you must supply a metadata sheet (usually config/samples.tsv) and either place the fq.gz read files in resources/reads/ or supply a separate fastq.tsv file that maps each sampleID to the paths of its paired-end reads.
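As a rough illustration of the second option, the following Python sketch builds a fastq.tsv-style table for reads stored outside resources/reads/. The column names (sampleID, fq1, fq2), the file naming scheme (<sampleID>_1.fastq.gz and <sampleID>_2.fastq.gz), the sample IDs and the /data/reads directory are all assumptions made for this example; check config/README.md for the layout the workflow actually expects.

```python
import csv
import io
from pathlib import Path


def build_fastq_table(sample_ids, reads_dir):
    """Return one (sampleID, fq1, fq2) row per sample."""
    reads_dir = Path(reads_dir)
    rows = []
    for sample_id in sample_ids:
        # Assumed naming scheme: <sampleID>_1.fastq.gz and <sampleID>_2.fastq.gz
        fq1 = reads_dir / f"{sample_id}_1.fastq.gz"
        fq2 = reads_dir / f"{sample_id}_2.fastq.gz"
        rows.append((sample_id, fq1.as_posix(), fq2.as_posix()))
    return rows


def format_fastq_table(rows):
    """Render the rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["sampleID", "fq1", "fq2"])  # assumed column names
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample IDs; they should match the IDs in the metadata sheet.
    table = build_fastq_table(["sampleA1", "sampleA2"], "/data/reads")
    print(format_fastq_table(table), end="")
```

The printed text can be redirected to a file and referenced from the configuration, provided the sample IDs match those in the metadata sheet.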
Configure the workflow according to your needs by editing the files in the config/ folder. Adjust the example config.yaml to configure the workflow execution, and samples.tsv to specify your sample setup. Information on how to set up the configuration file can be found in the example config and the config README (config/README.md), as recommended by snakemake best practices.
As well as fastq files and metadata, the workflow requires the following as inputs:
- A reference genome for your species (.fa)
- A reference transcriptome for your species (.fa)
- A list of contigs to analyse (e.g. 2L, 2R, 3L, 3R, X)
- A genome feature file (.gff3)
- The name of the snpeff database for your species (e.g. Anopheles_gambiae or Aedes_aegypti_lvpagwg)
- A file which matches geneIDs to TranscriptIDs, requiring four columns (GeneID, GeneDescription, GeneName, TranscriptID)
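Before launching a run, it can help to confirm that the gene-to-transcript file carries the four required columns. The sketch below is one way to do that in Python; it assumes the file is tab-separated with a header row using exactly the column names listed above, which may not match your file, and the example record is made up.

```python
import csv
import io

# Column names taken from the input list above.
REQUIRED_COLUMNS = ("GeneID", "GeneDescription", "GeneName", "TranscriptID")


def missing_columns(handle, delimiter="\t"):
    """Return the required columns that are absent from the header row."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]


if __name__ == "__main__":
    # Made-up record, for illustration only.
    example = io.StringIO(
        "GeneID\tGeneDescription\tGeneName\tTranscriptID\n"
        "GENE0001\tplaceholder description\tgeneX\tGENE0001-RA\n"
    )
    print(missing_columns(example))
```

For a real file, pass an open file handle instead of the in-memory example; an empty list means all four columns were found.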
Navigate to the root folder of the repository and run the workflow. For example:
snakemake --use-conda --cores 16
Here, conda will be used to install the required environments and the workflow will run on 16 cores. Using conda is highly recommended, as otherwise dependencies are likely to be missing or to be the wrong version.
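If you launch the workflow from a script, a small helper can assemble the command and optionally add Snakemake's -n (dry run) flag, which lists the planned jobs without executing them. This is only a sketch: the helper name snakemake_command is hypothetical, and a dry run before the full run is a suggestion of this example rather than a requirement of the workflow.

```python
import shlex


def snakemake_command(cores, dry_run=False):
    """Build the snakemake invocation as an argument list."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    cmd = ["snakemake", "--use-conda", "--cores", str(cores)]
    if dry_run:
        cmd.append("-n")  # dry run: show what would be executed
    return cmd


if __name__ == "__main__":
    # Print the command; to execute it, pass the list to subprocess.run
    # from the root folder of the repository.
    print(shlex.join(snakemake_command(16, dry_run=True)))
```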
If you are using the workflow and would like to give feedback or troubleshoot, consider joining the discord server here