2. How to Run DEN-IM

Inês Mendes edited this page Aug 19, 2019 · 4 revisions

DEN-IM is a user-friendly automated workflow that enables the analysis of metagenomic or targeted sequencing data for the identification, serotyping and genotyping, and phylogenetic analysis of DENV.

It provides a set of default parameters and directives, derived from our own analysis experience, to make the execution of the workflow as simple as possible.

Users can customize the workflow execution either by using command line options or by modifying a simple plain-text configuration file (params.config), where parameters are set as key-value pairs. The version of tools used can also be changed by providing new container tags in the appropriate configuration file (containers.config), as well as the resources for each process (resources.config).

We'll be using a local installation of the DEN-IM workflow for this example. Detailed installation instructions are available in the Getting Started section. The repository was cloned with the following command:

git clone https://github.com/B-UMMI/DEN-IM.git

The rest of this tutorial will take place inside the DEN-IM/ directory.

1. Download the example data

As an example, we'll download a raw amplicon sequencing dataset available on EBI and save the data in a directory named fastq.

mkdir fastq

wget -P fastq ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR582/006/SRR5821236/SRR5821236_1.fastq.gz

wget -P fastq ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR582/006/SRR5821236/SRR5821236_2.fastq.gz
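Before moving on, it can help to confirm that both mates of each pair arrived intact. The following is a minimal sketch and not part of DEN-IM: the function name check_fastq_dir is hypothetical, and it assumes the `<sample>_1.fastq.gz` / `<sample>_2.fastq.gz` naming used by the files above.

```shell
# Hypothetical helper (not shipped with DEN-IM): verify that every
# <sample>_1.fastq.gz has a matching <sample>_2.fastq.gz and that both
# files pass gzip's integrity test.
check_fastq_dir() {
    dir=$1
    rc=0
    found=0
    for r1 in "$dir"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue              # glob matched nothing
        found=1
        r2="${r1%_1.fastq.gz}_2.fastq.gz"     # expected mate file
        for f in "$r1" "$r2"; do
            if [ ! -e "$f" ]; then
                echo "missing: $f" >&2
                rc=1
            elif ! gzip -t "$f" 2>/dev/null; then
                echo "corrupt: $f" >&2
                rc=1
            fi
        done
    done
    if [ "$found" -eq 0 ]; then
        echo "no *_1.fastq.gz files in $dir" >&2
        rc=1
    fi
    return "$rc"
}

# Usage: check_fastq_dir fastq && echo "fastq/ looks complete"
```

A non-zero return value means at least one file is missing or truncated, in which case re-running the corresponding wget command is the simplest remedy.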

2. Set the parameters in the params.config file

The raw sequencing data files are now saved in a folder named fastq, inside the repository folder. The first step is to check the run parameters in the params.config file, which lists all the parameters for the DEN-IM execution.

Running head params.config shows the first parameters:

$ head params.config

    params {
        fastq = 'fastq/*_{1,2}.*'
        genomeSize = 0.01
        minCoverage = 10
        adapters = 'None'
        trimSlidingWindow = '5:20'
        trimLeading = 3
        trimTrailing = 3
        (...)
    }

We can change any of DEN-IM's parameters by editing the values in this file. For this example, we will leave the file unaltered.

Alternatively, you can set any of the parameters directly on the command line when executing the DEN-IM workflow. You can list them with the command nextflow run DEN-IM.nf --help.
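As an illustration of a command-line override, the sketch below builds a run command that changes two of the parameters shown in params.config. Nextflow takes pipeline parameters with a double dash (`--name value`) and its own options with a single dash. The helper denim_cmd is hypothetical and only prints the command, and the values 15 and 5 are illustrative, not recommendations.

```shell
# Hypothetical helper: print the nextflow command line instead of running it,
# so the overrides can be reviewed before launching the workflow.
denim_cmd() {
    echo nextflow run DEN-IM.nf "$@" -profile docker
}

# Override minCoverage and trimLeading for a single run.
# The values 15 and 5 are illustrative only.
denim_cmd --minCoverage 15 --trimLeading 5
# prints: nextflow run DEN-IM.nf --minCoverage 15 --trimLeading 5 -profile docker
```

Once the printed line looks right, the same arguments can be passed to nextflow directly. Values given on the command line take precedence over those in params.config, so the file itself stays unchanged.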

3. Set the resources in the resources.config file

You can view and adjust the resources to be used by each process in the DEN-IM workflow by altering the resources.config file. For the sake of simplicity we will leave this file as is.

$ head resources.config

    process {

    withName:fastqc_1_2 {
            cpus = 2
            memory = '4GB'
    }
    withName:trimmomatic_1_2 {
            cpus = 2
            memory = { 4.GB * task.attempt }
    }
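Because head shows only the first few lines, a one-line-per-setting summary can make the whole file easier to scan. The following is a rough sketch, assuming resources.config keeps the `withName:<process> { cpus = ...; memory = ... }` layout shown above; the function name list_resources is hypothetical.

```shell
# Hypothetical helper: print "<process> <TAB> <setting> <TAB> <value>" for
# every cpus/memory line found inside a withName block.
list_resources() {
    awk '
        /withName:/ {
            sub(/.*withName:/, "")      # drop everything up to the selector
            sub(/[ \t]*[{].*/, "")      # drop the opening brace
            name = $0
            next
        }
        /^[ \t]*(cpus|memory)[ \t]*=/ {
            key = ($0 ~ /^[ \t]*cpus/) ? "cpus" : "memory"
            sub(/^[^=]*=[ \t]*/, "")    # keep only the value
            print name "\t" key "\t" $0
        }
    ' "${1:-resources.config}"
}

# Usage: list_resources resources.config
```

Values are printed verbatim, so dynamic settings such as `{ 4.GB * task.attempt }` appear exactly as written in the file.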

4. Set the containers in the containers.config file

The Docker containers used by the workflow are listed in the containers.config file. We do not recommend altering this file, but it is useful for checking which version of each tool is being used.

$ head containers.config

    process {
    withName:fastqc_1_2 {
            container = "flowcraft/fastqc:0.11.7-1"
    }
    withName:trimmomatic_1_2 {
            container = "flowcraft/trimmomatic:0.36-1"
    }
    withName:filter_poly_1_3 {
            container = "flowcraft/prinseq:0.20.4-1"
    }
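To see every image and tag at once rather than only the head of the file, the container strings can be extracted with sed. This is a minimal sketch, assuming each image is declared as `container = "<image>:<tag>"` as in the excerpt above; the function name list_containers is hypothetical.

```shell
# Hypothetical helper: list the unique container images (with tags)
# declared in a containers.config-style file.
list_containers() {
    sed -n 's/.*container[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' \
        "${1:-containers.config}" | sort -u
}

# Usage: list_containers containers.config
```

If you prefer to download the images before the first run, the output could be piped to `xargs -n 1 docker pull`. That step is optional and assumes Docker is the container engine in use.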

5. Executing the DEN-IM workflow

If you have all the necessary dependencies installed (see Getting Started), you can execute the DEN-IM workflow. The -profile option must match the container engine you use; with Docker, the command is:

nextflow run DEN-IM.nf -profile docker

You can consult the Nextflow execution options by running nextflow run -h. We recommend the -resume option, which resumes a previous Nextflow execution without losing earlier progress. Because we're running Nextflow with Docker, we set the -profile docker option.

$ nextflow run DEN-IM.nf --fastq 'fastq/*_{1,2}.*' -profile docker
N E X T F L O W  ~  version 0.32.0
Launching `DEN-IM.nf` [ridiculous_watson] - revision: ec76fe4a78

============================================================
                     D E N - I M
============================================================

 Input FastQ                 : 2
 Input samples               : 1
 Reports are found in        : ./reports
 Results are found in        : ./results
 Profile                     : docker
 Version                     : 2.1 (local version)

 Starting pipeline at Sat Jun 29 17:37:04 WEST 2019

 [warm up] executor > local
 [skipping] Stored process > bowtie_build_1_4 (DENV_MAPPING_V2)
 [58/fa42b7] Submitted process > integrity_coverage_1_1 (SRR5821236)

 (...)
[25/4535d9] Submitted process > report_raxml_1_13 (raxml)
[8a/fb89a4] Submitted process > report (single)
[f5/d7104b] Submitted process > status (single)
[5f/dab9ea] Submitted process > compile_status_buffer (1)
[59/6eb4e0] Submitted process > compile_reports (1)
[65/f46488] Submitted process > compile_status
Completed at: Sat Jun 29 19:07:56 WEST 2019
Duration    : 1h 8m 42s
Success     : true
Exit status : 0

6. Checking the results

Once the workflow has finished successfully, the results of all DEN-IM processes are available in results/.

$ tree results

results
├── alignment
│   └── mafft_1_12
│       └── DEN-IM.nf.align -> /home/ines/DEN-IM/DENIM/work/2b/a448a59c82d9c1e1fb4f7ec676bf0d/DEN-IM.nf.align
├── assembly
│   ├── megahit_1_7
│   │   └── SRR5821236_megahit113.fasta
│   ├── pilon_1_9
│   │   └── SRR5821236_megahit113_filt_polished.fasta
│   ├── spades_1_7
│   └── split_assembly_1_10
│       └── SRR5821236
│           └── SRR5821236_k77_1_flag_1_multi_12864.0000_len_10232_pilon.fasta -> /home/ines/DEN-IM/DEN-IM/work/85/4eba3ac5c34526d40bc5f675587cd2/SRR5821236_k77_1_flag_1_multi_12864.0000_len_10232_pilon.fasta
├── dengue_typing
│   └── SRR5821236_k77_1_flag_1_multi_12864.0000_len_10232_pilon
│       ├── gb:HQ705619.fa -> /home/ines/DEN-IM/DEN-IM/work/52/251ecf42d0090715770648643e2e36/gb:HQ705619.fa
│       ├── seq_typing.report.txt -> /home/ines/DEN-IM/DEN-IM/work/52/251ecf42d0090715770648643e2e36/seq_typing.report.txt
│       ├── seq_typing.report_types.tab -> /home/ines/DEN-IM/DEN-IM/work/52/251ecf42d0090715770648643e2e36/seq_typing.report_types.tab
│       └── SRR5821236_k77_1_flag_1_multi_12864.0000_len_10232_pilon.fasta -> /home/ines/DEN-IM/DEN-IM/work/52/251ecf42d0090715770648643e2e36/SRR5821236_k77_1_flag_1_multi_12864.0000_len_10232_pilon.fasta
├── mapping
│   ├── bowtie_1_4
│   │   ├── SRR5821236.bam -> /home/ines/DEN-IM/DEN-IM/work/54/c9e1ea61102b583e130d4d005f809a/SRR5821236.bam
│   │   └── SRR5821236_bowtie2.log -> /home/ines/DEN-IM/DEN-IM/work/54/c9e1ea61102b583e130d4d005f809a/SRR5821236_bowtie2.log
│   └── retrieve_mapped_1_5
│       ├── SRR5821236_mapped_1.headersRenamed_1.fq.gz -> /home/ines/DEN-IM/DEN-IM/work/0a/3625d778dea84c46c9709319b1f15e/SRR5821236_mapped_1.headersRenamed_1.fq.gz
│       └── SRR5821236_mapped_2.headersRenamed_2.fq.gz -> /home/ines/DEN-IM/DEN-IM/work/0a/3625d778dea84c46c9709319b1f15e/SRR5821236_mapped_2.headersRenamed_2.fq.gz
├── phylogeny
│   └── raxml_1_13
│       ├── RAxML_bestTree.DEN-IM.nf -> /home/ines/DEN-IM/DEN-IM/work/da/ab844928a9ab8b5a09be0e3a09b781/RAxML_bestTree.DEN-IM.nf
│       ├── RAxML_bipartitionsBranchLabels.DEN-IM.nf -> /home/ines/DEN-IM/DEN-IM/work/da/ab844928a9ab8b5a09be0e3a09b781/RAxML_bipartitionsBranchLabels.DEN-IM.nf
│       ├── RAxML_bipartitions.DEN-IM.nf -> /home/ines/DEN-IM/DEN-IM/work/da/ab844928a9ab8b5a09be0e3a09b781/RAxML_bipartitions.DEN-IM.nf
│       ├── RAxML_bootstrap.DEN-IM.nf -> /home/ines/DEN-IM/DEN-IM/work/da/ab844928a9ab8b5a09be0e3a09b781/RAxML_bootstrap.DEN-IM.nf
│       └── RAxML_info.DEN-IM.nf -> /home/ines/DEN-IM/DEN-IM/work/da/ab844928a9ab8b5a09be0e3a09b781/RAxML_info.DEN-IM.nf
└── trimmomatic_1_2
    ├── SRR5821236_1_trim.fastq.gz -> /home/ines/DEN-IM/DEN-IM/work/b1/27b64cbf7c425d75952f4b5da6b2d5/SRR5821236_1_trim.fastq.gz
    └── SRR5821236_2_trim.fastq.gz -> /home/ines/DEN-IM/DEN-IM/work/b1/27b64cbf7c425d75952f4b5da6b2d5/SRR5821236_2_trim.fastq.gz

16 directories, 19 files
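As the listing shows, most result files are symbolic links into Nextflow's work/ directory, so they stop resolving if work/ is removed (for example by nextflow clean). One way to keep a standalone copy is to dereference the links while copying. This is a sketch under that assumption; the function name export_results and the default destination results_export are hypothetical.

```shell
# Hypothetical helper: copy a results directory, replacing each symbolic
# link with the file it points to (-L), recursively (-R).
export_results() {
    src=${1:-results}
    dest=${2:-results_export}
    mkdir -p "$dest" && cp -RL "$src"/. "$dest"/
}

# Usage: export_results results results_export
```

The copy holds regular files only, so it can be archived or moved to another machine independently of the work/ directory.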

The HTML report can be found at pipeline_report/pipeline_report.html. With xdg-open installed, you can simply run xdg-open pipeline_report/pipeline_report.html and the report will open in your default browser. The HTML report for this sample is also available online for consultation.
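For systems where xdg-open may not be present, a small wrapper can fall back to another opener. The sketch below is an assumption-laden convenience, not part of DEN-IM: the function name open_report is hypothetical, and it tries xdg-open (Linux desktops) first and then open (macOS).

```shell
# Hypothetical helper: open the HTML report with whichever opener exists.
open_report() {
    report=${1:-pipeline_report/pipeline_report.html}
    if [ ! -f "$report" ]; then
        echo "report not found: $report" >&2
        return 1
    fi
    if command -v xdg-open >/dev/null 2>&1; then
        xdg-open "$report"
    elif command -v open >/dev/null 2>&1; then
        open "$report"
    else
        echo "No opener found; open $report in a browser manually."
    fi
}

# Usage: open_report
```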

The SRR5821236 sample, with 24.78% of DENV, was assembled into a single 10232-nucleotide contig by MEGAHIT. This sequence was classified by BLAST as belonging to serotype 3 and genotype III, with 100% coverage and 99.94% identity to the gb:HQ705619 reference. The sample passed all quality control checks. Because only one sample was analysed, the NCBI DENV references for each of the 4 serotypes were included in the phylogenetic analysis.

Fig. 1: Phylogeny inference tree for the sample SRR5821236 with the closest typing reference (gb:HQ705619) and the 4 DENV NCBI references (NCBI-DENV-1: NC_001477.1, NCBI-DENV-2: NC_001474.2, NCBI-DENV-3: NC_001475.2, NCBI-DENV-4: NC_002640.1).