How to run metaGOflow
To demonstrate how one may run the metaGOflow workflow, we will use a marine Illumina MiSeq shotgun metagenome with ENA run accession ERR855786.
If you have your raw data locally, you can run metaGOflow by providing the forward and reverse reads:
./run_wf.sh -f SAMPLE_1.fastq.gz -r SAMPLE_2.fastq.gz -n PREFIX_FOR_OUTPUT_FILES -d OUTPUT_FOLDER_NAME
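Before launching the workflow, it can be worth checking that both read files are actually present; a missing reverse-read file is a common cause of early failures. A minimal sketch (the filenames below are the placeholders from the command above, not real files):

```shell
# Placeholder filenames -- substitute your actual paired-end read files.
for f in SAMPLE_1.fastq.gz SAMPLE_2.fastq.gz; do
  if [ ! -f "$f" ]; then
    echo "missing: $f"
  fi
done
```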
If your raw data are stored in ENA, either publicly or privately, then:
- if public, provide metaGOflow with the corresponding run accession number:
./run_wf.sh -e ENA_RUN_ACCESSION_NUMBER -d OUTPUT_FOLDER_NAME -n PREFIX_FOR_OUTPUT_FILES
- if private, provide both the run accession number and your ENA credentials:
./run_wf.sh -e ENA_RUN_ACCESSION_NUMBER -u ENA_USERNAME -k ENA_PASSWORD -d OUTPUT_FOLDER_NAME -n PREFIX_FOR_OUTPUT_FILES
The config.yml file includes all the parameters of the metaGOflow workflow that the user can set:
- The steps of the workflow to be performed. Remember! You can run later steps of the workflow (e.g., the functional annotation of the reads) at a later point, but you always need to:
  - have run the previous steps beforehand
  - keep track of the required files (see )
- The number of threads to be used.
- Sequence-filtering parameters; you may check the fastp documentation for these.
- Assembly- and functional-annotation-related parameters, which to a great extent define the computing time of the corresponding steps.
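To give a feel for how such settings are grouped, here is a rough sketch of a workflow config file. Note that every key name below is purely illustrative (hypothetical), not metaGOflow's actual parameter names; consult the config.yml shipped with the repository for the real options.

```yaml
# Illustrative only -- key names are hypothetical, not metaGOflow's actual parameters.
threads: 8                        # number of threads to use
run_qc: true                      # sequence filtering (fastp)
run_assembly: false               # assembly step (computationally heavy)
run_functional_annotation: false  # can be enabled later, if earlier steps were run
```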
To run in an HPC environment, you need to build a batch script for the workload manager used on that HPC.
For example, in the case of SLURM, you would create a file (e.g., metagoflow-job.sh) like the following:
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --nodelist=
#SBATCH --ntasks-per-node=NUM_OF_CORES
#SBATCH --mem=
#SBATCH --requeue
#SBATCH --job-name=metagoflow
#SBATCH --output=metagoflow.output
./run_wf.sh -e ENA_RUN_ACCESSION_NUMBER -d OUTPUT_FOLDER_NAME -n PREFIX_FOR_OUTPUT_FILES
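Assuming the script above is saved as metagoflow-job.sh, creating and submitting it could look like the following sketch (the heredoc reproduces a minimal version of the script; fill in the partition, node, and memory values for your cluster):

```shell
# Write a minimal version of the SLURM job script (values are placeholders).
cat > metagoflow-job.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=metagoflow
#SBATCH --output=metagoflow.output
./run_wf.sh -e ENA_RUN_ACCESSION_NUMBER -d OUTPUT_FOLDER_NAME -n PREFIX_FOR_OUTPUT_FILES
EOF
chmod +x metagoflow-job.sh
# Submit with:  sbatch metagoflow-job.sh
# Monitor with: squeue -u "$USER"
```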
Of course, you need to make sure that all of metaGOflow's dependencies are available on the node where it will run.
metaGOflow may run using Docker (default) or Singularity container technology.
To enable Singularity, add the -s argument when calling metaGOflow.
In addition, there are cases where the Singularity images need to be force-pulled.
In that case, run the get_singularity_images.sh script:
cd Installation
bash get_singularity_images.sh
Thus, to run metaGOflow using Singularity, append -s to your usual invocation, e.g.:
./run_wf.sh -e ENA_RUN_ACCESSION_NUMBER -d OUTPUT_FOLDER_NAME -n PREFIX_FOR_OUTPUT_FILES -s
Anything unclear or inaccurate? Please open an issue or email Dr. Haris Zafeiropoulos ([email protected]).
With respect to EMO BON protocols, samples, and analyses, you may contact the Observation, Data and Service Development Officer of EMBRC, Dr. Ioulia Santi ([email protected]).