2. How to Run DEN-IM
DEN-IM is a user-friendly automated workflow that enables the analysis of metagenomic or targeted sequencing data for the identification, serotyping and genotyping, and phylogenetic analysis of DENV.
It provides a set of default parameters and directives, derived from our own analysis experience, to make the execution of the workflow as simple as possible.
Users can customize the workflow execution either through command-line options or by editing a simple plain-text configuration file (params.config), in which parameters are set as key-value pairs. The tool versions can also be changed by providing new container tags in the corresponding configuration file (containers.config), and the resources allocated to each process can be adjusted in resources.config.
We'll be using a local installation of the DEN-IM workflow for this example. Detailed installation instructions are available in the Getting Started section. The repository was cloned with the following command:
git clone https://github.com/B-UMMI/DEN-IM.git
The rest of this tutorial takes place inside the DEN-IM/ directory.
As an example, we'll download a raw amplicon sequencing dataset available from EBI and save the data in a directory named fastq:
mkdir fastq
wget -P fastq ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR582/006/SRR5821236/SRR5821236_1.fastq.gz
wget -P fastq ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR582/006/SRR5821236/SRR5821236_2.fastq.gz
The raw sequencing data files are now saved in a folder named fastq, inside the repository folder.
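Before launching the workflow, it can be worth confirming that both downloads are complete. The Python sketch below is not part of DEN-IM; it assumes gzip-compressed FASTQ files with the standard four-line records, fully decompresses each file in fastq/ and counts its reads, so a truncated or corrupt download surfaces as an error. For paired-end data, both files should report the same number of reads.

```python
import gzip
import sys
from pathlib import Path


def count_fastq_records(path):
    """Decompress a gzipped FASTQ file and return its number of records.

    Assumes the standard 4-line layout (header, sequence, '+', qualities).
    Raises ValueError if the line count is not a multiple of 4; gzip errors
    (OSError/EOFError) propagate for corrupt or truncated downloads.
    """
    n_lines = 0
    with gzip.open(path, "rt") as handle:
        for _ in handle:
            n_lines += 1
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


if __name__ == "__main__":
    # Check every *.fastq.gz file in the fastq/ directory created above.
    for fq in sorted(Path("fastq").glob("*.fastq.gz")):
        try:
            print(f"{fq.name}: {count_fastq_records(fq)} reads")
        except (OSError, EOFError, ValueError) as err:
            print(f"{fq.name}: FAILED ({err})", file=sys.stderr)
```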
The first step is to check the running parameters in the params.config file, which contains all the parameters for the DEN-IM execution.
Running head params.config shows the first lines of this file:
$ head params.config
params {
    fastq = 'fastq/*_{1,2}.*'
    genomeSize = 0.01
    minCoverage = 10
    adapters = 'None'
    trimSlidingWindow = '5:20'
    trimLeading = 3
    trimTrailing = 3
    (...)
}
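The default fastq value, 'fastq/*_{1,2}.*', is a glob in which {1,2} stands for the alternatives 1 and 2, so each sample is expected to provide an _1 and an _2 file sharing a common prefix. In Nextflow pipelines such patterns are commonly consumed with Channel.fromFilePairs; the Python sketch below is only a simplified model of that grouping (one brace group, sample name taken from the first *), not Nextflow's implementation. Applied to the two files downloaded above, it yields a single sample, consistent with the Input FastQ : 2 and Input samples : 1 lines in the run log further down.

```python
import re
from collections import defaultdict


def expand_braces(pattern):
    """Expand the first {a,b,...} group: 'fastq/*_{1,2}.*' -> ['fastq/*_1.*', 'fastq/*_2.*']."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    return [head + alt + tail for alt in match.group(1).split(",")]


def glob_to_regex(glob):
    """Translate '*' and '?' into a regex; the first '*' becomes the sample-name group."""
    parts, captured = [], False
    for ch in glob:
        if ch == "*":
            parts.append(".*" if captured else "(.*)")
            captured = True
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts) + r"\Z")


def group_pairs(paths, pattern="fastq/*_{1,2}.*", size=2):
    """Group paths into {sample: [files]}, keeping only groups with exactly `size` files."""
    regexes = [glob_to_regex(p) for p in expand_braces(pattern)]
    groups = defaultdict(list)
    for path in sorted(paths):
        for rx in regexes:
            m = rx.match(path)
            if m:
                groups[m.group(1)].append(path)
                break
    # Incomplete groups (e.g. an _1 file without its _2 mate) are dropped.
    return {sample: files for sample, files in groups.items() if len(files) == size}
```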
Any of DEN-IM's parameters can be changed by editing the values in params.config. For this example, we leave the file unaltered.
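The trim parameters above configure the Trimmomatic step (the trimmomatic_1_2 process). Assuming they correspond to Trimmomatic's LEADING, TRAILING and SLIDINGWINDOW options, trimLeading = 3 and trimTrailing = 3 remove bases with quality below 3 from the start and end of each read, and trimSlidingWindow = '5:20' cuts the read once the average quality of a 5-base window drops below 20. The following is a simplified Python model of those three steps on Phred+33 quality strings; the order of the steps and the exact cut position are assumptions, so it will not necessarily reproduce Trimmomatic's output base for base.

```python
def phred33(qual_string):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual_string]


def trim_read(seq, qual, leading=3, trailing=3, window=5, required=20):
    """Return (seq, qual) after simplified LEADING, TRAILING and SLIDINGWINDOW trimming.

    Defaults mirror trimLeading = 3, trimTrailing = 3 and trimSlidingWindow = '5:20'.
    """
    scores = phred33(qual)
    start, end = 0, len(scores)
    # LEADING: drop bases from the 5' end while their quality is below `leading`.
    while start < end and scores[start] < leading:
        start += 1
    # TRAILING: drop bases from the 3' end while their quality is below `trailing`.
    while end > start and scores[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW (simplified): scan 5' -> 3' and keep only the bases before
    # the first window whose mean quality falls below `required`.
    for i in range(start, end - window + 1):
        if sum(scores[i:i + window]) < required * window:
            end = i
            break
    return seq[start:end], qual[start:end]
```

With the quality string "IIIII+++++IIIII" (Q40 x5, Q10 x5, Q40 x5), the first window with a mean below 20 starts at position 4, so only the first four bases are kept.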
Instead of editing params.config, you can also set any parameter directly on the command line when executing the DEN-IM workflow. To list the available parameters, run nextflow run DEN-IM.nf --help.
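Nextflow gives parameters passed on the command line as --name value precedence over values defined in configuration files such as params.config, while single-dash options such as -profile belong to Nextflow itself. The Python sketch below models that precedence in a deliberately simplified way: the parser is hypothetical, and it keeps every value as a string rather than reproducing Nextflow's own type handling.

```python
def parse_cli_params(argv):
    """Collect '--name value' pairs from an argument list (simplified model).

    A '--flag' followed by another option, or at the end of argv, becomes True.
    Single-dash Nextflow options such as '-profile docker' are skipped.
    Note: values starting with '-' (e.g. negative numbers) are not handled.
    """
    params = {}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg.startswith("--"):
            name = arg[2:]
            if i + 1 < len(argv) and not argv[i + 1].startswith("-"):
                params[name] = argv[i + 1]
                i += 2
                continue
            params[name] = True
        i += 1
    return params


def effective_params(defaults, argv):
    """Command-line values override the defaults read from params.config."""
    merged = dict(defaults)
    merged.update(parse_cli_params(argv))
    return merged


if __name__ == "__main__":
    defaults = {"fastq": "fastq/*_{1,2}.*", "minCoverage": 10}
    print(effective_params(defaults, ["DEN-IM.nf", "--minCoverage", "20", "-profile", "docker"]))
```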
You can view and adjust the resources used by each process in the DEN-IM workflow by editing the resources.config file. For the sake of simplicity, we leave this file as is.
$ head resources.config
process {
    withName:fastqc_1_2 {
        cpus = 2
        memory = '4GB'
    }
    withName:trimmomatic_1_2 {
        cpus = 2
        memory = { 4.GB * task.attempt }
    }
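The closure memory = { 4.GB * task.attempt } is evaluated every time the task is submitted. task.attempt is 1 on the first try and increases by one on each retry, so the request grows linearly: 4 GB, then 8 GB, and so on. This only has an effect when the process is allowed to retry (through Nextflow's errorStrategy and maxRetries directives), which this excerpt of resources.config does not show. A small Python model of the resulting requests, with a hypothetical limit of three attempts:

```python
def memory_schedule(base_gb=4, max_attempts=3):
    """Memory (GB) requested on each attempt for memory = { base.GB * task.attempt }.

    task.attempt starts at 1, so max_attempts corresponds to maxRetries + 1.
    """
    return [base_gb * attempt for attempt in range(1, max_attempts + 1)]


if __name__ == "__main__":
    for attempt, mem in enumerate(memory_schedule(), start=1):
        print(f"attempt {attempt}: {mem} GB")
```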
The Docker containers used by the workflow are listed in the containers.config file. We do not recommend altering this file, but it is useful for checking which version of each tool is being used.
$ head containers.config
process {
    withName:fastqc_1_2 {
        container = "flowcraft/fastqc:0.11.7-1"
    }
    withName:trimmomatic_1_2 {
        container = "flowcraft/trimmomatic:0.36-1"
    }
    withName:filter_poly_1_3 {
        container = "flowcraft/prinseq:0.20.4-1"
    }
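To list every tool version at once, the container entries can be extracted with a regular expression. This Python sketch assumes the withName:<process> { container = "image:tag" } layout shown above and reads the tag as the tool version followed by a -<n> suffix, interpreted here as an image build number (an assumption; the file itself does not document it).

```python
import re

# Matches: withName:<process> { container = "<image>:<tag>"
BLOCK = re.compile(r'withName:\s*(\w+)\s*\{\s*container\s*=\s*"([^":]+):([^"]+)"')


def container_versions(config_text):
    """Map process name -> (image, tool_version, build) from containers.config text."""
    versions = {}
    for process, image, tag in BLOCK.findall(config_text):
        tool_version, _, build = tag.rpartition("-")
        if not tool_version:  # tag without a '-<n>' suffix
            tool_version, build = tag, ""
        versions[process] = (image, tool_version, build)
    return versions
```

For the entries shown above, the fastqc_1_2 process maps to ("flowcraft/fastqc", "0.11.7", "1"); the function can be fed the contents of containers.config read as a string.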
If you have all the necessary dependencies installed (see Getting Started), you can execute the DEN-IM workflow.
Depending on the container engine used, you'll need to adjust the -profile parameter accordingly. With Docker, the command is:
nextflow run DEN-IM.nf -profile docker
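If you pass your own --fastq pattern, quote it so that the shell hands the glob to Nextflow unchanged instead of expanding it into individual file names. When DEN-IM is launched from a Python script, passing the arguments to subprocess as a list avoids the shell altogether, and shlex.join shows the equivalent quoted shell command. A minimal sketch; the function names and the dry_run switch are hypothetical:

```python
import shlex
import subprocess


def denim_command(fastq_glob="fastq/*_{1,2}.*", profile="docker", resume=True):
    """Build the argument list for a DEN-IM run (no shell involved)."""
    cmd = ["nextflow", "run", "DEN-IM.nf", "--fastq", fastq_glob, "-profile", profile]
    if resume:
        cmd.append("-resume")
    return cmd


def run_denim(dry_run=True, **kwargs):
    """Print the quoted command; execute it only when dry_run is False."""
    cmd = denim_command(**kwargs)
    # shlex.join (Python 3.8+) renders the list as a safely quoted shell string.
    print(shlex.join(cmd))
    if not dry_run:
        # The list form passes the glob to Nextflow verbatim; check=True raises on failure.
        subprocess.run(cmd, check=True)
```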
You can consult Nextflow's execution options by running nextflow run -h. We recommend the -resume option, which resumes a previous Nextflow execution without losing previous progress. Since we're running Nextflow with Docker, we set the -profile docker option.
$ nextflow run DEN-IM.nf --fastq 'fastq/*_{1,2}.*' -profile docker
N E X T F L O W ~ version 0.32.0
Launching `DEN-IM.nf` [ridiculous_watson] - revision: ec76fe4a78
============================================================
D E N - I M
============================================================
Input FastQ : 2
Input samples : 1
Reports are found in : ./reports
Results are found in : ./results
Profile : docker
Version : 2.1 (local version)
Starting pipeline at Sat Jun 29 17:37:04 WEST 2019
[warm up] executor > local
[skipping] Stored process > bowtie_build_1_4 (DENV_MAPPING_V2)
[58/fa42b7] Submitted process > integrity_coverage_1_1 (SRR5821236)
(...)
[25/4535d9] Submitted process > report_raxml_1_13 (raxml)
[8a/fb89a4] Submitted process > report (single)
[f5/d7104b] Submitted process > status (single)
[5f/dab9ea] Submitted process > compile_status_buffer (1)
[59/6eb4e0] Submitted process > compile_reports (1)
[65/f46488] Submitted process > compile_status
Completed at: Sat Jun 29 19:07:56 WEST 2019
Duration : 1h 8m 42s
Success : true
Exit status : 0
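Each Submitted process line begins with an abbreviated task hash such as [58/fa42b7], which is the start of that task's directory under work/ (here work/58/fa42b7...). If a step fails, that directory holds the task's .command.sh and log files. The Python sketch below pulls the process name, tag and work-directory prefix out of a saved copy of this console output; the regular expression is written for the line format shown above and may need adjusting for other Nextflow versions.

```python
import re

SUBMITTED = re.compile(
    r"^\[(?P<hash>[0-9a-f]{2}/[0-9a-f]{6})\] Submitted process > "
    r"(?P<process>\S+)(?: \((?P<tag>[^)]*)\))?"
)


def submitted_tasks(log_lines):
    """Return (process, tag, work-dir prefix) for each 'Submitted process' line."""
    tasks = []
    for line in log_lines:
        m = SUBMITTED.match(line.strip())
        if m:
            # Only a prefix: the full hash completes the directory name on disk.
            work_prefix = "work/" + m.group("hash")
            tasks.append((m.group("process"), m.group("tag"), work_prefix))
    return tasks
```

For example, with the console output saved to a file such as run.log (a name chosen for this example), iterating over submitted_tasks(open("run.log")) lists every task together with the work/ prefix to inspect.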
After the workflow finishes successfully, the results of all DEN-IM processes are available in results/.
$ tree results
results
├── alignment
│   └── mafft_1_12
│       └── DEN-IM.nf.align -> /home/ines/DEN-IM/DENIM/work/2b/a448a59c82d9c1e1fb4f7ec676bf0d/DEN-IM.nf.align
├── assembly
│   ├── megahit_1_7
│   │   └── SRR5821236_megahit113.fasta
│   ├── pilon_1_9
│   │   └── SRR5821236_megahit113_filt_polished.fasta
│   ├── spades_1_7
│   └── split_assembly_1_10
│       └── SRR5821236
│           └── SRR5821236_k77_1_flag_1_multi_12864.0000_len_10232_pilon.fasta -> /home/ines/DEN-IM/DEN-IM/work/85/4eba3ac5c34526d40bc5f675587cd2/SRR5821236_k77_1_flag_1_multi_12864.0000_len_10232_pilon.fasta
├── dengue_typing
│   └── SRR5821236_k77_1_flag_1_multi_12864.0000_len_10232_pilon
│       ├── gb:HQ705619.fa -> /home/ines/DEN-IM/DEN-IM/work/52/251ecf42d0090715770648643e2e36/gb:HQ705619.fa
│       ├── seq_typing.report.txt -> /home/ines/DEN-IM/DEN-IM/work/52/251ecf42d0090715770648643e2e36/seq_typing.report.txt
│       ├── seq_typing.report_types.tab -> /home/ines/DEN-IM/DEN-IM/work/52/251ecf42d0090715770648643e2e36/seq_typing.report_types.tab
│       └── SRR5821236_k77_1_flag_1_multi_12864.0000_len_10232_pilon.fasta -> /home/ines/DEN-IM/DEN-IM/work/52/251ecf42d0090715770648643e2e36/SRR5821236_k77_1_flag_1_multi_12864.0000_len_10232_pilon.fasta
├── mapping
│   ├── bowtie_1_4
│   │   ├── SRR5821236.bam -> /home/ines/DEN-IM/DEN-IM/work/54/c9e1ea61102b583e130d4d005f809a/SRR5821236.bam
│   │   └── SRR5821236_bowtie2.log -> /home/ines/DEN-IM/DEN-IM/work/54/c9e1ea61102b583e130d4d005f809a/SRR5821236_bowtie2.log
│   └── retrieve_mapped_1_5
│       ├── SRR5821236_mapped_1.headersRenamed_1.fq.gz -> /home/ines/DEN-IM/DEN-IM/work/0a/3625d778dea84c46c9709319b1f15e/SRR5821236_mapped_1.headersRenamed_1.fq.gz
│       └── SRR5821236_mapped_2.headersRenamed_2.fq.gz -> /home/ines/DEN-IM/DEN-IM/work/0a/3625d778dea84c46c9709319b1f15e/SRR5821236_mapped_2.headersRenamed_2.fq.gz
├── phylogeny
│   └── raxml_1_13
│       ├── RAxML_bestTree.DEN-IM.nf -> /home/ines/DEN-IM/DEN-IM/work/da/ab844928a9ab8b5a09be0e3a09b781/RAxML_bestTree.DEN-IM.nf
│       ├── RAxML_bipartitionsBranchLabels.DEN-IM.nf -> /home/ines/DEN-IM/DEN-IM/work/da/ab844928a9ab8b5a09be0e3a09b781/RAxML_bipartitionsBranchLabels.DEN-IM.nf
│       ├── RAxML_bipartitions.DEN-IM.nf -> /home/ines/DEN-IM/DEN-IM/work/da/ab844928a9ab8b5a09be0e3a09b781/RAxML_bipartitions.DEN-IM.nf
│       ├── RAxML_bootstrap.DEN-IM.nf -> /home/ines/DEN-IM/DEN-IM/work/da/ab844928a9ab8b5a09be0e3a09b781/RAxML_bootstrap.DEN-IM.nf
│       └── RAxML_info.DEN-IM.nf -> /home/ines/DEN-IM/DEN-IM/work/da/ab844928a9ab8b5a09be0e3a09b781/RAxML_info.DEN-IM.nf
└── trimmomatic_1_2
    ├── SRR5821236_1_trim.fastq.gz -> /home/ines/DEN-IM/DEN-IM/work/b1/27b64cbf7c425d75952f4b5da6b2d5/SRR5821236_1_trim.fastq.gz
    └── SRR5821236_2_trim.fastq.gz -> /home/ines/DEN-IM/DEN-IM/work/b1/27b64cbf7c425d75952f4b5da6b2d5/SRR5821236_2_trim.fastq.gz
16 directories, 19 files
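As the arrows in the tree show, most entries under results/ are symbolic links into Nextflow's work/ directory rather than copies, so deleting work/ (for example, to free disk space) would leave them dangling. A minimal Python sketch that walks results/, prints each link with its target and flags broken ones:

```python
import os


def audit_results(root="results"):
    """Yield (path, target, exists) for every symlink found under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                target = os.readlink(path)
                # os.path.exists follows the link, so it is False for dangling links.
                yield path, target, os.path.exists(path)


if __name__ == "__main__":
    for path, target, ok in audit_results():
        print(("ok      " if ok else "BROKEN  ") + f"{path} -> {target}")
```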
The HTML report can be found at pipeline_report/pipeline_report.html. With xdg-open installed, you can simply run xdg-open pipeline_report/pipeline_report.html (or open pipeline_report/pipeline_report.html on macOS) and the report will open in your default browser. The HTML report for this sample is available online for consultation.
The SRR5821236 sample, with 24.78% DENV content, was assembled by MEGAHIT into a single 10,232-nucleotide contig. BLAST classified this sequence as serotype 3, genotype III, with 100% coverage and 99.94% identity to the gb:HQ705619 reference. This sample passed all quality control checks. Because only one sample was analysed, the NCBI DENV references for each of the 4 serotypes were included in the phylogenetic analysis.
Fig. 1: Phylogenetic tree inferred for sample SRR5821236, with the closest typing reference (gb:HQ705619) and the 4 DENV NCBI references (NCBI-DENV-1: NC_001477.1, NCBI-DENV-2: NC_001474.2, NCBI-DENV-3: NC_001475.2, NCBI-DENV-4: NC_002640.1).
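The contig name carried through the results, SRR5821236_k77_1_flag_1_multi_12864.0000_len_10232_pilon, appears to encode MEGAHIT's contig header (k-mer size and contig number, flag, multi and len fields) with spaces and = signs replaced by underscores, prefixed with the sample name and suffixed with _pilon. Under that assumption, this Python sketch recovers the fields, including the 10232-nucleotide length reported above:

```python
import re

# Assumed layout: <sample>_k<k>_<n>_flag_<flag>_multi_<multi>_len_<length>[_suffix]
CONTIG = re.compile(
    r"^(?P<sample>.+?)_k(?P<k>\d+)_(?P<contig>\d+)"
    r"_flag_(?P<flag>\d+)_multi_(?P<multi>[\d.]+)_len_(?P<length>\d+)"
)


def parse_contig_name(name):
    """Split a DEN-IM/MEGAHIT-style contig name into its fields (assumed layout)."""
    m = CONTIG.match(name)
    if m is None:
        raise ValueError(f"unrecognised contig name: {name}")
    return {
        "sample": m.group("sample"),
        "k": int(m.group("k")),
        "contig": int(m.group("contig")),
        "flag": int(m.group("flag")),
        "multi": float(m.group("multi")),
        "length": int(m.group("length")),
    }
```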