
# Pipeline for running PLAMB

Pipeline for running Plamb: https://github.com/RasmussenLab/vamb/tree/vamb_n2v_asy

## Quick Start 🚀

```bash
# Create environment and install dependencies
conda create -n ptracker_pipeline -c conda-forge -c bioconda 'snakemake==8.26.0' 'pandas==2.2.3' 'mamba==1.5.9'
conda activate ptracker_pipeline
pip install rich-click

# Clone the repository
git clone https://github.com/Las02/ptracker_workflow -b try_cli
```
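
To check that the installation works, you can print the CLI help (this assumes `cli.py` is executable and exposes Click's standard `--help` flag):

```bash
./ptracker_workflow/cli.py --help
```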

To run the entire pipeline, including assembly, pass in a whitespace-separated file containing the reads:

```bash
./ptracker_workflow/cli.py --reads <read_file> --output <output_directory>
```

The `<read_file>` could look like:

```
read1                          read2
im/a/path/to/sample_1/read1    im/a/path/to/sample_1/read2
im/a/path/to/sample_2/read1    im/a/path/to/sample_2/read2
```

❗ Notice the header names are required to be `read1` and `read2`.
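
A minimal sketch of building such a file from the shell (the paths are placeholders for your own FASTQ files):

```bash
# Write a whitespace-separated reads file; printf repeats the
# format string for every pair of arguments, one line per sample.
printf '%s\t%s\n' \
    read1 read2 \
    im/a/path/to/sample_1/read1 im/a/path/to/sample_1/read2 \
    im/a/path/to/sample_2/read1 im/a/path/to/sample_2/read2 > reads.tsv
```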

To dry-run the pipeline before a full run, pass in the `--dryrun` flag.
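
For example, assuming the reads file above was saved as `reads.tsv` (a hypothetical name):

```bash
./ptracker_workflow/cli.py --reads reads.tsv --output <output_directory> --dryrun
```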

To run the pipeline from already assembled reads, pass in a whitespace-separated file containing the reads and the path to the SPAdes assembly directory for each read pair.

```bash
./ptracker_workflow/cli.py --reads_and_assembly_dir <reads_and_assembly_dir> --output <output_directory>
```

Each assembly directory must contain the following three files, which SPAdes produces:

| Description | File name from SPAdes |
| --- | --- |
| The assembled contigs | `contigs.fasta` |
| The simplified assembly graph | `assembly_graph_after_simplification.gfa` |
| A metadata file | `contigs.paths` |

This file could look like:

```
read1                          read2                         assembly_dir
im/a/path/to/sample_1/read1    im/a/path/to/sample_1/read2   path/sample_1/Spades_output
im/a/path/to/sample_2/read1    im/a/path/to/sample_2/read2   path/sample_2/Spades_output
```

❗ Notice the header names are required to be `read1`, `read2` and `assembly_dir`.
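
Before launching, you can sanity-check that each listed assembly directory contains the three SPAdes files (a sketch; `reads_and_assembly.tsv` is a placeholder name for your file):

```bash
# Skip the header line, then check the assembly_dir column of each row.
tail -n +2 reads_and_assembly.tsv | while read -r read1 read2 assembly_dir; do
    for f in contigs.fasta assembly_graph_after_simplification.gfa contigs.paths; do
        [ -e "$assembly_dir/$f" ] || echo "missing: $assembly_dir/$f"
    done
done
```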

## Advanced

### Resources

The pipeline can be configured in `config/config.yaml`. Here the resources for each rule can be configured as follows:

```yaml
spades:
  walltime: "15-00:00:00"
  threads: 16
  mem_gb: 245
```

If no resources are configured for a rule, the defaults are used; these are also defined in `config/config.yaml` as:

```yaml
default_walltime: "48:00:00"
default_threads: 1
default_mem_gb: 50
```

If these exceed the resources available, they will be scaled down to match the hardware available.
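
Putting the two together, the relevant part of `config/config.yaml` could look like this (only the keys shown above are used):

```yaml
# Defaults, applied to any rule without its own entry
default_walltime: "48:00:00"
default_threads: 1
default_mem_gb: 50

# Per-rule override for the assembly step
spades:
  walltime: "15-00:00:00"
  threads: 16
  mem_gb: 245
```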

### Running on a cluster

(! needs to be tested !) You can extend the arguments passed to Snakemake with the `--snakemake_arguments` flag. This can be used to have Snakemake submit jobs to a cluster. On SLURM this could look like:

```bash
./cli.py <arguments> --snakemake_arguments \
    '--jobs 16 --max-jobs-per-second 5 --max-status-checks-per-second 5 --latency-wait 60 \
    --cluster "sbatch --output={rule}.%j.o --error={rule}.%j.e \
    --time={resources.walltime} --job-name {rule} --cpus-per-task {threads} --mem {resources.mem_gb}G"'
```

On PBS this could look like (a sketch using Torque-style `qsub` flags; adapt the resource options to your site's PBS variant):

```bash
./cli.py <arguments> --snakemake_arguments \
    '--jobs 16 --max-jobs-per-second 5 --max-status-checks-per-second 5 --latency-wait 60 \
    --cluster "qsub -o {rule}.o -e {rule}.e -N {rule} \
    -l walltime={resources.walltime} -l nodes=1:ppn={threads} -l mem={resources.mem_gb}gb"'
```
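
In both cases the placeholders such as `{rule}`, `{threads}` and `{resources.walltime}` in the submit command are filled in by Snakemake for each job it submits.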

### Running using the Snakemake CLI directly

The pipeline can also be run without the CLI wrapper, by calling Snakemake directly.

The input files for the pipeline can be configured in `config/accesions.txt`. As an example, this could look like:

```
SAMPLE   ID   READ1                                      READ2
Airways  4    reads/errorfree/Airways/reads/4/fw.fq.gz   reads/errorfree/Airways/reads/4/rv.fq.gz
Airways  5    reads/errorfree/Airways/reads/5/fw.fq.gz   reads/errorfree/Airways/reads/5/rv.fq.gz
```

With an installation of Snakemake, run the following to dry-run the pipeline:

```bash
snakemake -np --snakefile snakefile.smk
```

and the following to run the pipeline with 4 threads:

```bash
snakemake -p -c4 --snakefile snakefile.smk --use-conda
```

## File Structure

- `snakefile.smk`: the Snakemake pipeline
- `utils.py`: utilities used by the pipeline
- `config`: directory with the configuration files
  - `accesions.txt`: sample information
  - `config.yaml`: configuration for the pipeline, e.g. resources
- `envs`: directory with the conda environment descriptions

## Misc files

- `Makefile`: various small scripts for running the pipeline
- `clustersubmit.sh`: script for submitting the snakefile to SLURM
- `parse_snakemake_output.py`: small script for viewing snakefile logs
