Now using local tests/stub files for GitHub CI
GallVp committed Apr 30, 2024
1 parent f819e6b commit ae0e6d2
Showing 24 changed files with 245 additions and 98 deletions.
11 changes: 2 additions & 9 deletions .github/workflows/test.yml
@@ -39,25 +39,18 @@ jobs:
steps:
- uses: actions/checkout@v4

- uses: actions/checkout@v4
with:
repository: PlantandFoodResearch/pangene-test
ssh-key: ${{ secrets.PANGENE_TEST_DEPLOY_KEY }}
path: pangene-test

- name: Install Nextflow
uses: nf-core/setup-nextflow@v1
with:
version: "23.04.4"

- name: Run stub-test
run: |
nextflow \
nextflow run \
main.nf \
-profile local,docker \
-resume \
-stub \
-c conf/test_stub.config
-params-file tests/stub/test_stub.json
confirm-pass:
runs-on: ubuntu-latest
23 changes: 15 additions & 8 deletions CHANGELOG.md
@@ -3,7 +3,7 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## 0.2.0+dev - [22-April-2024]
## 0.2.0 - [24-April-2024]

### `Added`

@@ -35,16 +35,23 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
26. Now using TSEBRA to purge models which do not have full intron support from BRAKER hints
27. Added params `eggnogmapper_evalue` and `eggnogmapper_pident`
28. Added `PURGE_NOHIT_BRAKER_MODELS` sub-workflow
29. Removed liftoff models with `valid_ORF=False`
30. Now merging BRAKER and liftoff models before running eggnogmapper
31. Added `GFF_MERGE_CLEANUP` sub-workflow
32. Now using `description` field to store notes and textual annotations in the gff files
33. Now using `mRNA` in place of `transcript` in gff files
34. Now `eggnogmapper_purge_nohits` is set to `false` by default
35. Added `GFF_STORE` sub workflow
29. Now merging BRAKER and liftoff models before running eggnogmapper
30. Added `GFF_MERGE_CLEANUP` sub-workflow
31. Now using `description` field to store notes and textual annotations in the gff files
32. Now using `mRNA` in place of `transcript` in gff files
33. Now `eggnogmapper_purge_nohits` is set to `false` by default
34. Added `GFF_STORE` sub workflow
35. `external_protein_fastas` and `eggnogmapper_db_dir` are not mandatory parameters
36. Added contributors
37. Add a document for the pipeline parameters
38. Updated `pfr_pangene` and `pfr/profile.config`
39. Now using local tests/stub files for GitHub CI

### `Fixed`

1. Removed liftoff models with `valid_ORF=False`
2. Updated license text to include 'Copyright (c) 2024 The New Zealand Institute for Plant and Food Research Limited'

### `Dependencies`

1. NextFlow!>=23.04.4
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) Usman Rashid, Jason Shiller
Copyright (c) 2024 The New Zealand Institute for Plant and Food Research Limited

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
58 changes: 46 additions & 12 deletions README.md
@@ -2,7 +2,7 @@

[![Lint and -stub on Linux/Docker](https://github.com/PlantandFoodResearch/pangene/actions/workflows/test.yml/badge.svg)](https://github.com/PlantandFoodResearch/pangene/actions/workflows/test.yml)

A NextFlow pipeline for pan-genome annotation.
A NextFlow pipeline for pan-genome annotation. It can also be used for annotation of a single genome.

## Pipeline Flowchart

@@ -84,26 +84,60 @@ flowchart TD
style Annotation fill:#00FFFF21,stroke:#00FFFF21
```

## Alpha Release

This release is not fully documented and under alpha testing by the Bioinformatics Team. There are several [outstanding issues](https://github.com/PlantandFoodResearch/pangene/issues) which will be addressed before a general release.

## Plant&Food Users

Configure the pipeline by modifying `nextflow.config` and submit to SLURM for execution.
Download the pipeline to your `/workspace/$USER` folder. Change the parameters defined in the [pfr/params.json](./pfr/params.json) file. Submit the pipeline to SLURM for execution. For a description of the parameters, see [parameters](./docs/parameters.md).

```bash
sbatch ./pangene_pfr
sbatch ./pfr_pangene
```

## Third-party Sources

Some software components of this pipeline have been adopted from following third-party sources:

1. nf-core [MIT](https://github.com/nf-core/modules/blob/master/LICENSE): https://github.com/nf-core/modules
## Credits

plantandfoodresearch/pangene scripts were originally written by Jason Shiller. Usman Rashid wrote the NextFlow pipeline.

We thank the following people for their extensive assistance in the development of this pipeline.

- Cecilia Deng [@CeciliaDeng](https://github.com/CeciliaDeng)
- Charles David [@charlesdavid](https://github.com/charlesdavid)
- Chen Wu [@christinawu2008](https://github.com/christinawu2008)
- Leonardo Salgado [@leorippel](https://github.com/leorippel)
- Ross Crowhurst [@rosscrowhurst](https://github.com/rosscrowhurst)
- Susan Thomson [@cflsjt](https://github.com/cflsjt)
- Ting-Hsuan Chen [@ting-hsuan-chen](https://github.com/ting-hsuan-chen)

The pipeline uses nf-core modules contributed by the following authors.

<a href="https://github.com/drpatelh"><img src="https://github.com/drpatelh.png" width="50" height="50"></a>
<a href="https://github.com/edmundmiller"><img src="https://github.com/edmundmiller.png" width="50" height="50"></a>
<a href="https://github.com/erikrikarddaniel"><img src="https://github.com/erikrikarddaniel.png" width="50" height="50"></a>
<a href="https://github.com/ewels"><img src="https://github.com/ewels.png" width="50" height="50"></a>
<a href="https://github.com/felixkrueger"><img src="https://github.com/felixkrueger.png" width="50" height="50"></a>
<a href="https://github.com/friederikehanssen"><img src="https://github.com/friederikehanssen.png" width="50" height="50"></a>
<a href="https://github.com/gallvp"><img src="https://github.com/gallvp.png" width="50" height="50"></a>
<a href="https://github.com/grst"><img src="https://github.com/grst.png" width="50" height="50"></a>
<a href="https://github.com/jemten"><img src="https://github.com/jemten.png" width="50" height="50"></a>
<a href="https://github.com/jfy133"><img src="https://github.com/jfy133.png" width="50" height="50"></a>
<a href="https://github.com/joseespinosa"><img src="https://github.com/joseespinosa.png" width="50" height="50"></a>
<a href="https://github.com/kevinmenden"><img src="https://github.com/kevinmenden.png" width="50" height="50"></a>
<a href="https://github.com/kherronism"><img src="https://github.com/kherronism.png" width="50" height="50"></a>
<a href="https://github.com/mashehu"><img src="https://github.com/mashehu.png" width="50" height="50"></a>
<a href="https://github.com/matthdsm"><img src="https://github.com/matthdsm.png" width="50" height="50"></a>
<a href="https://github.com/praveenraj2018"><img src="https://github.com/praveenraj2018.png" width="50" height="50"></a>
<a href="https://github.com/robsyme"><img src="https://github.com/robsyme.png" width="50" height="50"></a>
<a href="https://github.com/toniher"><img src="https://github.com/toniher.png" width="50" height="50"></a>
<a href="https://github.com/vagkaratzas"><img src="https://github.com/vagkaratzas.png" width="50" height="50"></a>

## Citations

This pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [MIT license](https://github.com/nf-core/tools/blob/master/LICENSE).

> **The nf-core framework for community-curated bioinformatics pipelines.**
>
> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
>
> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).
2. nf-core/rnaseq [MIT](https://github.com/nf-core/rnaseq/blob/master/LICENSE): https://github.com/nf-core/rnaseq
3. rewarewaannotation [MIT](https://github.com/kherronism/rewarewaannotation/blob/master/LICENSE): https://github.com/kherronism/rewarewaannotation
4. assembly_qc [GPL-3.0](https://github.com/Plant-Food-Research-Open/assembly_qc/blob/main/LICENSE): https://github.com/Plant-Food-Research-Open/assembly_qc
11 changes: 0 additions & 11 deletions conf/test_params.json

This file was deleted.

10 changes: 0 additions & 10 deletions conf/test_stub.config

This file was deleted.

5 changes: 5 additions & 0 deletions docs/contributors.sh
@@ -0,0 +1,5 @@
#!/usr/bin/env bash

module_authors=$(find ./modules -name meta.yml | xargs -I {} grep -A20 'authors:' {} | grep '\- ' | tr -d '[-" ]' | tr '[:upper:]' '[:lower:]')
workflow_authors=$(find ./subworkflows -name meta.yml | xargs -I {} grep -A20 'authors:' {} | grep '\- ' | tr -d '[-" ]' | tr '[:upper:]' '[:lower:]')
echo -e "${module_authors}\n${workflow_authors}" | sort -V | uniq | sed -n 's|@\(.*\)|<a href="https://github.com/\1"><img src="https://github.com/\1.png" width="50" height="50"></a>|p'
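The link-generation step of `docs/contributors.sh` can be tried on a small sample before running it over the whole repository. The following is a minimal sketch, not the script itself: the sample `meta.yml` contents and the temporary file are assumptions made for illustration, `grep -e '- '` stands in for the script's `grep '\- '`, and the character-deletion step is written as `tr -d ' "-'` (hyphen last, so it is taken literally) rather than copied from the script. Like the script, it would also strip hyphens from handles that contain them.

```bash
#!/usr/bin/env bash
# Sketch: turn "@handle" entries from a meta.yml into avatar links.
set -euo pipefail

# Hypothetical sample of a module meta.yml (contents assumed for illustration).
sample_meta=$(mktemp)
cat > "$sample_meta" <<'EOF'
name: example_module
authors:
  - "@GallVp"
  - "@kherronism"
maintainers:
  - "@GallVp"
EOF

# Keep the list items that follow 'authors:', strip spaces, quotes and hyphens,
# then lowercase the handles so duplicates collapse.
handles=$(grep -A20 'authors:' "$sample_meta" \
    | grep -e '- ' \
    | tr -d ' "-' \
    | tr '[:upper:]' '[:lower:]')

# De-duplicate and wrap each handle in the same HTML used in the README.
links=$(echo "$handles" \
    | sort -V \
    | uniq \
    | sed -n 's|@\(.*\)|<a href="https://github.com/\1"><img src="https://github.com/\1.png" width="50" height="50"></a>|p')

echo "$links"
rm -f "$sample_meta"
```

Lowercasing before `uniq` is what lets `@GallVp` and `@gallvp` count as one contributor; the `-A20` window also picks up list items from sections that follow `authors:`, which is harmless here because the handles are de-duplicated.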
68 changes: 68 additions & 0 deletions docs/parameters.md
@@ -0,0 +1,68 @@
# plantandfoodresearch/pangene pipeline parameters

A NextFlow pipeline for pan-genome annotation

## Input/output options

| Parameter | Description | Type | Default | Required | Hidden |
| ------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------- | --------- | -------- | ------ |
| `input` | Target assemblies listed in a CSV sheet <details><summary>Help</summary><small>FASTA and other associated files for target assemblies provided as a CSV sheet</small></details> | `string` | | True | |
| `external_protein_fastas` | External protein fastas listed in a text sheet <details><summary>Help</summary><small>A text file listing FASTA files to provide protein evidence for annotation</small></details> | `string` | | True | |
| `eggnogmapper_db_dir` | Eggnogmapper database directory | `string` | | True | |
| `eggnogmapper_tax_scope` | Eggnogmapper taxonomy scope | `integer` | | True | |
| `fastq` | FASTQ samples listed in a CSV sheet <details><summary>Help</summary><small>FASTQ files for RNASeq samples corresponding to each target assembly provided in a CSV sheet</small></details> | `string` | | | |
| `liftoff_annotations` | Reference annotations listed in a CSV sheet <details><summary>Help</summary><small>FASTA and GFF3 files for reference annotations for liftoff listed in a CSV sheet</small></details> | `string` | | | |
| `outdir` | The output directory where the results will be saved <details><summary>Help</summary><small> Use absolute paths to storage on Cloud infrastructure</small></details> | `string` | ./results | True | |

## Repeat annotation options

| Parameter | Description | Type | Default | Required | Hidden |
| --------------------------- | ------------------------------------------ | --------- | ------------- | -------- | ------ |
| `repeat_annotator` | 'edta' or 'repeatmodeler' | `string` | repeatmodeler | | |
| `save_annotated_te_lib` | Save annotated TE library or not? | `boolean` | | | |
| `edta_is_sensitive` | Use '--sensitive 1' flag with EDTA or not? | `boolean` | | | |
| `repeatmasker_save_outputs` | Save the repeat-masked genome or not? | `boolean` | | | |

## RNASeq pre-processing options

| Parameter | Description | Type | Default | Required | Hidden |
| ------------------------ | ------------------------------------------------------------------ | --------- | ----------------------------------------- | -------- | ------ |
| `skip_fastqc` | Skip FASTQC or not? | `boolean` | | | |
| `skip_fastp` | Skip trimming by FASTP or not? | `boolean` | | | |
| `min_trimmed_reads` | Exclude a sample if its reads after trimming are below this number | `integer` | 10000 | | |
| `extra_fastp_args` | Extra FASTP arguments | `string` | | | |
| `save_trimmed` | Save FASTQ files after trimming or not? | `boolean` | | | |
| `remove_ribo_rna` | Remove Ribosomal RNA or not? | `boolean` | | | |
| `save_non_ribo_reads` | Save FASTQ files after Ribosomal RNA removal or not? | `boolean` | | | |
| `ribo_database_manifest` | Ribosomal RNA fastas listed in a text sheet | `string` | ${projectDir}/assets/rrna-db-defaults.txt | | |

## RNAseq alignment options

| Parameter | Description | Type | Default | Required | Hidden |
| ------------------------ | ------------------------------------------------- | --------- | ------- | -------- | ------ |
| `star_max_intron_length` | Maximum intron length for STAR alignment | `integer` | 16000 | | |
| `star_align_extra_args` | EXTRA arguments for STAR | `string` | | | |
| `star_save_outputs` | Save BAM files from STAR or not? | `boolean` | | | |
| `save_cat_bam` | SAVE a concatenated BAM file per assembly or not? | `boolean` | | | |

## Annotation options

| Parameter | Description | Type | Default | Required | Hidden |
| --------------------------- | --------------------------------------------------------------------------------- | --------- | ------- | -------- | ------ |
| `braker_extra_args` | Extra arguments for BRAKER | `string` | | | |
| `braker_allow_isoforms` | Allow multiple isoforms for gene models | `boolean` | True | | |
| `liftoff_coverage` | Liftoff coverage parameter | `number` | 0.9 | | |
| `liftoff_identity` | Liftoff identity parameter | `number` | 0.9 | | |
| `eggnogmapper_evalue` | Only report alignments below or equal to the e-value threshold | `number` | 1e-05 | | |
| `eggnogmapper_pident` | Only report alignments above or equal to the given percentage of identity (0-100) | `integer` | 35 | | |
| `eggnogmapper_purge_nohits` | Purge transcripts which do not have a hit against eggnog | `boolean` | | | |

## Max job request options

Set the top limit for requested resources for any single job.

| Parameter | Description | Type | Default | Required | Hidden |
| ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------- | ------- | -------- | ------ |
| `max_cpus` | Maximum number of CPUs that can be requested for any single job. <details><summary>Help</summary><small>Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`</small></details> | `integer` | 12 | | True |
| `max_memory` | Maximum amount of memory that can be requested for any single job. <details><summary>Help</summary><small>Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`</small></details> | `string` | 200.GB | | True |
| `max_time` | Maximum amount of time that can be requested for any single job. <details><summary>Help</summary><small>Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. `--max_time '2.h'`</small></details> | `string` | 7.day | | True |
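
To show how the parameters marked as required in the tables above might be supplied, here is a minimal sketch that writes a params file and prints a matching launch command. The parameter names, the `-params-file` option, the `local,docker` profile, the database path and the taxonomy scope `33090` all appear elsewhere in this commit; the file name `my_params.json` and the two sheet paths are placeholders assumed for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: write a params file holding the parameters marked Required.
set -euo pipefail

params_file="my_params.json"   # hypothetical file name

# Parameter names come from docs/parameters.md; the sheet paths are placeholders.
cat > "$params_file" <<'EOF'
{
    "input": "assemblysheet.csv",
    "external_protein_fastas": "external_proteins.txt",
    "eggnogmapper_db_dir": "../../dbs/emapperdb/5.0.2",
    "eggnogmapper_tax_scope": 33090,
    "outdir": "./results"
}
EOF

# Print, rather than run, the launch command so the sketch has no side effects.
run_cmd="nextflow run main.nf -profile local,docker -params-file $params_file"
echo "$run_cmd"
```

Optional parameters such as `liftoff_coverage` or `max_cpus` could be added to the same JSON object when their defaults need overriding.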
4 changes: 2 additions & 2 deletions local_pangene
@@ -12,11 +12,11 @@ F_BOLD="\033[1m"
&& echo 'Executing with -stub' \
|| echo -e "${C_RED}${F_BOLD}Executing without -stub${NO_FORMAT}"

nextflow \
nextflow run \
main.nf \
-profile local,docker \
-resume \
$stub \
-params-file conf/test_params.json \
-params-file pangene-test/test_params.json \
--eggnogmapper_db_dir ../../dbs/emapperdb/5.0.2 \
--eggnogmapper_tax_scope 33090
3 changes: 3 additions & 0 deletions modules/kherronism/braker3/main.nf
@@ -62,9 +62,11 @@ process BRAKER3 {
"""

stub:
def args = task.ext.args ?: ''
prefix = task.ext.prefix ?: "${meta.id}"
def rna_ids = rnaseq_sets_ids ? "--rnaseq_sets_ids=${rnaseq_sets_ids}" : ''
def touch_hints = (rna_ids || bam || proteins || hints) ? "touch ${prefix}/hintsfile.gff" : ''
def touch_gff = args.contains('--gff3') ? "touch ${prefix}/braker.gff3" : ''
"""
mkdir "$prefix"
@@ -74,6 +76,7 @@ process BRAKER3 {
$touch_hints
touch "${prefix}/braker.log"
touch "${prefix}/what-to-cite.txt"
$touch_gff
cat <<-END_VERSIONS > versions.yml
"${task.process}":