How to run metaGOflow

Haris Zafeiropoulos edited this page Oct 26, 2022 · 17 revisions

To demonstrate how to run the metaGOflow workflow, we will use a marine Illumina MiSeq shotgun metagenome with ENA run accession ERR855786.

Using local raw data files

If your raw data files are stored locally, you can run metaGOflow by providing the forward and reverse read files:

./run_wf.sh -f SAMPLE_1.fastq.gz -r SAMPLE_2.fastq.gz -n PREFIX_FOR_OUTPUT_FILES -d OUTPUT_FOLDER_NAME

Using an ENA accession id

If your raw data are stored in ENA, whether as public or private data, then:

  • if public, you need to provide metaGOflow with the corresponding run accession number:
./run_wf.sh -e ENA_RUN_ACCESSION_NUMBER -d OUTPUT_FOLDER_NAME -n PREFIX_FOR_OUTPUT_FILES
  • if private, you need to provide both the run accession number and your ENA credentials:
./run_wf.sh -e ENA_RUN_ACCESSION_NUMBER -u ENA_USERNAME -k ENA_PASSWORD -d OUTPUT_FOLDER_NAME -n PREFIX_FOR_OUTPUT_FILES
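For the demo dataset mentioned at the top of this page (the public run ERR855786), the corresponding call could look like the sketch below; the output folder and prefix names are arbitrary choices, not requirements. The command is echoed as a dry run rather than executed:

```shell
# Sketch: assemble the metaGOflow call for the public demo run ERR855786.
# The output folder and prefix names are arbitrary choices, not requirements.
RUN_ID="ERR855786"
CMD="./run_wf.sh -e ${RUN_ID} -d ${RUN_ID}_results -n ${RUN_ID}"
echo "${CMD}"   # dry run: print the command instead of executing it
```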

The config.yml file

The config.yml file includes all the parameters of the metaGOflow workflow that the user can set.

  • The steps of the workflow to be performed. Remember! You can run later steps of the workflow (e.g., the functional annotation of the reads) at a later point, but you always need to:
    • have run the previous steps beforehand
    • keep track of the files they produced
  • The number of threads to be used.
  • Sequence-filtering parameters; see the fastp documentation for details.
  • Assembly- and functional-annotation-related parameters, which to a great extent determine the computing time of the corresponding steps.
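For illustration only, such a file could look like the sketch below. The key names here are hypothetical, not metaGOflow's actual parameter names — check them against the config.yml shipped with your copy of metaGOflow before editing:

```yaml
# Hypothetical sketch of a config.yml -- key names are illustrative only,
# not metaGOflow's actual parameter names.
threads: 8                          # number of threads to use
run_assembly: true                  # toggle the assembly step
run_functional_annotation: false    # later steps can be run at a later point
fastp_qualified_quality_phred: 20   # sequence-filtering (fastp) parameter
```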

Run in HPC

To run metaGOflow in an HPC environment, you need to build a batch script for the workload manager used on that HPC.

For example, in the case of SLURM you will have to create a file (e.g., metagoflow-job.sh) like the following:

#!/bin/bash

#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --nodelist=
#SBATCH --ntasks-per-node=NUM_OF_CORES
#SBATCH --mem=
#SBATCH --requeue
#SBATCH --job-name=metagoflow
#SBATCH --output=metagoflow.output

./run_wf.sh -e ENA_RUN_ACCESSION_NUMBER -d OUTPUT_FOLDER_NAME -n PREFIX_FOR_OUTPUT_FILES

Of course, you need to make sure that all of metaGOflow's dependencies are available on the node where it will run.
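Once the file is saved, it can be submitted with SLURM's standard sbatch command; squeue then shows the job's state. This is a sketch: the command -v guard only makes the snippet a harmless no-op on machines where SLURM is not installed.

```shell
# Submit the metaGOflow batch script to SLURM and list the job by name.
# The guard skips submission where SLURM is not installed.
if command -v sbatch >/dev/null 2>&1; then
  sbatch metagoflow-job.sh
  squeue --name=metagoflow    # --name matches the --job-name set in the script
else
  echo "sbatch not found: run this on a SLURM login node"
fi
```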

Docker or Singularity

metaGOflow can run using either Docker (the default) or Singularity container technology. To enable Singularity, add the -s argument when calling metaGOflow. In some cases, the Singularity images need to be force-pulled; in that case, run the get_singularity_images.sh script:

cd Installation
bash get_singularity_images.sh

Thus, to run metaGOflow using Singularity, run:
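For example, combining the -s flag with the local-raw-data invocation shown earlier (a sketch; appending -s to any of the invocation forms above should work the same way):

```shell
./run_wf.sh -f SAMPLE_1.fastq.gz -r SAMPLE_2.fastq.gz -n PREFIX_FOR_OUTPUT_FILES -d OUTPUT_FOLDER_NAME -s
```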