nf-core-wgsnano is a bioinformatics best-practice analysis pipeline for Nanopore Whole Genome Sequencing.
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers, making installation trivial and results highly reproducible.
- Basecalling (Guppy), with GPU run option
- Basecalling QC (PycoQC, NanoPlot)
- Alignment (Guppy with minimap2)
- Quantification
- Install Nextflow (`>=22.10.1`).
- Install any of Docker, Singularity, Podman, Shifter or Charliecloud for full pipeline reproducibility (this pipeline can NOT be run with Conda). This requirement is not needed for running the pipeline on the WashU RIS cluster.
- Download the pipeline and test it on a minimal dataset with a single command:

  ```shell
  nextflow run dhslab/nf-core-wgsnano -profile test,YOURPROFILE(S) --outdir <OUTDIR>
  ```
- Start running your own analysis!

  ```shell
  nextflow run dhslab/nf-core-wgsnano --input samplesheet.csv --fasta <FASTA> -profile <docker/singularity/podman/shifter/charliecloud/institute> --outdir <OUTDIR>
  ```
- Input `samplesheet.csv`, which provides directory paths for fast5 raw reads and their metadata. This can be provided either in a configuration file or as the `--input path/to/samplesheet.csv` command-line parameter. An example sheet is located in `assets/samplesheet.csv`.
- Reference genome fasta file, provided either in a configuration file or as the `--fasta path/to/genome.fasta` command-line parameter.
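The authoritative column layout is the example in `assets/samplesheet.csv`; the sketch below only illustrates the general shape (one row per sample, pointing at its fast5 directory), with hypothetical `sample` and `fast5_dir` column names:

```shell
# Sketch of a samplesheet. The column names used here (sample, fast5_dir)
# are illustrative placeholders -- copy the real header row from
# assets/samplesheet.csv in the pipeline repository.
cat > samplesheet.csv << 'EOF'
sample,fast5_dir
sample_1,/path/to/sample_1/fast5/
sample_2,/path/to/sample_2/fast5/
EOF
```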
The following parameters are set to the default values shown below, but can be modified when required on the command line or in user-provided config files:

- `--basecall_config dna_r10.4.1_e8.2_400bps_modbases_5mc_cg_sup.cfg` -> Guppy's config file for basecalling
- `--nanopore_reads_type ont_r10_q20` -> PEPPER's reads-type option
- `--use_gpu true` -> Enable GPU for basecalling (it can be disabled, but basecalling will take significantly longer on CPU)
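As a sketch of the config-file route, the same parameters could be overridden in a user-provided Nextflow config (the file name `params.config` is arbitrary; apply it with Nextflow's `-c` option):

```groovy
// params.config -- hypothetical user override file.
// Apply with: nextflow run dhslab/nf-core-wgsnano -c params.config ...
params {
    basecall_config     = 'dna_r10.4.1_e8.2_400bps_modbases_5mc_cg_sup.cfg'
    nanopore_reads_type = 'ont_r10_q20'
    use_gpu             = false  // CPU-only basecalling: significantly slower
}
```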
```shell
NXF_HOME=${PWD}/.nextflow LSF_DOCKER_VOLUMES="/storage1/fs1/dspencer/Active:/storage1/fs1/dspencer/Active $HOME:$HOME" bsub -g /dspencer/nextflow -G compute-dspencer -q dspencer -e nextflow_launcher.err -o nextflow_launcher.log -We 2:00 -n 2 -M 12GB -R "select[mem>=16000] span[hosts=1] rusage[mem=16000]" -a "docker(ghcr.io/dhslab/docker-nextflow)" nextflow run dhslab/nf-core-wgsnano -r dev -profile test,ris,dhslab --outdir results
```
Notice that three profiles are used here:

- `test` -> provides `input` and `fasta` paths for the test run
- `ris` -> sets general configuration for the RIS LSF cluster
- `dhslab` -> sets lab-specific cluster configuration
```shell
git clone https://github.com/dhslab/nf-core-wgsnano.git
cd nf-core-wgsnano/
chmod +x bin/*
LSF_DOCKER_VOLUMES="/storage1/fs1/dspencer/Active:/storage1/fs1/dspencer/Active $HOME:$HOME" bsub -g /dspencer/nextflow -G compute-dspencer -q dspencer -e nextflow_launcher.err -o nextflow_launcher.log -We 2:00 -n 2 -M 12GB -R "select[mem>=16000] span[hosts=1] rusage[mem=16000]" -a "docker(ghcr.io/dhslab/docker-nextflow)" "NXF_HOME=${PWD}/.nextflow ; nextflow run main.nf -profile test,ris,dhslab --outdir results"
```
```
.
├── multiqc
│   ├── multiqc_data
│   └── multiqc_plots
│       ├── pdf
│       ├── png
│       └── svg
├── pipeline_info
└── samples
    ├── sample_1
    │   ├── fastq
    │   ├── methylation_calls
    │   │   ├── accumulated
    │   │   └── stranded
    │   ├── pepper
    │   │   ├── haplotagged_bam
    │   │   └── vcf
    │   └── qc
    │       ├── mosdepth
    │       └── pycoqc
    └── sample_2
        ├── fastq
        ├── methylation_calls
        │   ├── accumulated
        │   └── stranded
        ├── pepper
        │   ├── haplotagged_bam
        │   └── vcf
        └── qc
            ├── mosdepth
            └── pycoqc
```
- The pipeline is developed and optimized to run on the WashU RIS (LSF) HPC, but it can be deployed in any HPC environment supported by Nextflow.
- The pipeline does NOT support Conda because some of the tools used are not available as conda packages.
- The pipeline can NOT be fully tested on a personal computer, as the basecalling step is computationally intensive even for small test files. For testing/development purposes, the pipeline can be run in `stub` (dry-run) mode (see below).
Stub (dry-run) for testing and development purposes

A `stub` run requires `aws cli` and `docker` (or any other containerization software). Steps:

- Download the pipeline.
- Download the `stub-data` results generated from a pre-run test analysis (requires `aws cli` to be installed). It should be downloaded into the pipeline directory (`wgsnano/`).
- Run the pipeline in `stub` mode:
```shell
git clone https://github.com/dhslab/nf-core-wgsnano.git
cd nf-core-wgsnano/
aws s3 sync s3://davidspencerlab/nextflow/wgsnano/test-datasets/stub-test/ stub-test/ --no-sign-request
nextflow run main.nf -stub -profile stub,docker --outdir results
```