Release candidate for 0.6.0 #129

Merged 42 commits on Dec 20, 2024
Commits
c8667c9  Add EXTRACT_CDS feature to GFF_STORE workflow (liamlelievre, Dec 4, 2024)
dbebd7a  Update gff_store.nf (liamlelievre, Dec 4, 2024)
203af9b  Update gff_store.nf (liamlelievre, Dec 4, 2024)
13fe4e9  add EXTRACT_CDNA to gff_store.nf (liamlelievre, Dec 4, 2024)
c7bf40b  add GFF_STORE:EXTRACT_CDNA to modules.config (liamlelievre, Dec 4, 2024)
f0699a6  Update gff_store.nf (liamlelievre, Dec 4, 2024)
563bd13  Add cdna and cds outputs to output.md (liamlelievre, Dec 4, 2024)
79befa5  Add notes about cdna and cds update to CHANGELOG.md (liamlelievre, Dec 4, 2024)
47e2cf3  Added liamlelievre to contributors - README.md (liamlelievre, Dec 4, 2024)
2a118be  Update output.md (liamlelievre, Dec 4, 2024)
216225c  Added v0.6.0 notes to CHANGELOG.md (liamlelievre, Dec 4, 2024)
74cd2b2  removed trailing whitespace gff_store.nf (liamlelievre, Dec 4, 2024)
81871ff  Removed trailing whitespace - modules.config (liamlelievre, Dec 4, 2024)
3f898b0  rename params - modules.config (liamlelievre, Dec 4, 2024)
5af18c8  Rename params - nextflow.config (liamlelievre, Dec 4, 2024)
841ea02  Added code contributors (GallVp, Dec 4, 2024)
e05a469  Run nf-test successfully in minimal and stub (Dec 5, 2024)
d2ff47e  Run nf-test successfully in minimal and stub, renamed attr, updated docs (Dec 5, 2024)
2bcab04  Merge branch 'main' into add-gffread-feature (Dec 5, 2024)
d21d70e  Add attributes option for -F -D to cds and cdna (Dec 5, 2024)
9be84b2  Fixed linting issues (GallVp, Dec 5, 2024)
767239a  Updated snapshot (GallVp, Dec 5, 2024)
e042615  Fixed nextflow-setup version (GallVp, Dec 5, 2024)
43742fb  Fixed indent (GallVp, Dec 5, 2024)
17673f7  Merge pull request #119 from liamlelievre/add-gffread-feature (GallVp, Dec 5, 2024)
fb9a0f4  Fixed an issue where TSEBRA failed because LIFTOFF lifted non-protein… (GallVp, Dec 4, 2024)
87cf1dd  Updated snapshots (GallVp, Dec 5, 2024)
7d1f318  Updated changelog (GallVp, Dec 5, 2024)
e1cceba  Merge pull request #122 from Plant-Food-Research-Open/patch/121 (GallVp, Dec 5, 2024)
b9b7d37  Fixed issues in genepal-report (GallVp, Dec 10, 2024)
496ff62  Merge pull request #126 from Plant-Food-Research-Open/patch/124 (GallVp, Dec 10, 2024)
3114adb  Added parameter filter_genes_by_aa_length (GallVp, Dec 10, 2024)
d694431  Updated snapshots (GallVp, Dec 10, 2024)
0f7784c  Added test to verify that GFFREAD can filter mRNA by CDS length (GallVp, Dec 11, 2024)
f980772  Updated snapshots (GallVp, Dec 11, 2024)
ab3ae37  Updated README and snapshot (GallVp, Dec 11, 2024)
c65ebaa  Added 1 to filter_genes_by_aa_length to exclude stop codon from filte… (GallVp, Dec 15, 2024)
9759882  Merge pull request #127 from Plant-Food-Research-Open/feat/filter_gen… (GallVp, Dec 15, 2024)
fa28176  Fixed post-liftoff merge (GallVp, Dec 16, 2024)
2a0c7a2  Merge pull request #130 from Plant-Food-Research-Open/fix/mergefail (GallVp, Dec 16, 2024)
3459b40  Fixed short intron crash (GallVp, Dec 19, 2024)
d069633  Merge pull request #131 from Plant-Food-Research-Open/fix/short_intron (GallVp, Dec 20, 2024)
16 changes: 8 additions & 8 deletions .github/workflows/branch.yml
@@ -1,15 +1,15 @@
name: nf-core branch protection
# This workflow is triggered on PRs to master branch on the repository
# It fails when someone tries to make a PR against the nf-core `master` branch instead of `dev`
# This workflow is triggered on PRs to main branch on the repository
# It fails when someone tries to make a PR against the Plant-Food-Research-Open `main` branch instead of `dev`
on:
pull_request_target:
branches: [master]
branches: [main]

jobs:
test:
runs-on: ubuntu-latest
steps:
# PRs to the nf-core repo master branch are only ok if coming from the nf-core repo `dev` or any `patch` branches
# PRs to the nf-core repo main branch are only ok if coming from the nf-core repo `dev` or any `patch` branches
- name: Check PRs
if: github.repository == 'Plant-Food-Research-Open/genepal'
run: |
Expand All @@ -22,7 +22,7 @@ jobs:
uses: mshick/add-pr-comment@b8f338c590a895d50bcbfa6c5859251edc8952fc # v2
with:
message: |
## This PR is against the `master` branch :x:
## This PR is against the `main` branch :x:

* Do not close this PR
* Click _Edit_ and change the `base` to `dev`
Expand All @@ -32,9 +32,9 @@ jobs:

Hi @${{ github.event.pull_request.user.login }},

It looks like this pull-request is has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `master` branch.
The `master` branch on nf-core repositories should always contain code from the latest release.
Because of this, PRs to `master` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch.
It looks like this pull-request has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `main` branch.
The `main` branch should always contain code from the latest release.
Because of this, PRs to `main` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch.

You do not need to close this PR, you can change the target branch to `dev` by clicking the _"Edit"_ button at the top of this page.
Note that even after this, the test will continue to show as failing until you push a new commit.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
Expand Up @@ -55,7 +55,7 @@ jobs:
uses: actions/[email protected]

- name: Install Nextflow
uses: nf-core/setup-nextflow@v2
uses: nf-core/setup-nextflow@v2.0.0
with:
version: "${{ matrix.NXF_VER }}"

8 changes: 4 additions & 4 deletions .github/workflows/download_pipeline.yml
Expand Up @@ -2,7 +2,7 @@ name: Test successful pipeline download with 'nf-core pipelines download'

# Run the workflow when:
# - dispatched manually
# - when a PR is opened or reopened to master branch
# - when a PR is opened or reopened to main branch
# - the head branch of the pull request is updated, i.e. if fixes for a release are pushed last minute to dev.
on:
workflow_dispatch:
Expand All @@ -17,10 +17,10 @@ on:
- edited
- synchronize
branches:
- master
- main
pull_request_target:
branches:
- master
- main

env:
NXF_ANSI_LOG: false
Expand All @@ -30,7 +30,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Install Nextflow
uses: nf-core/setup-nextflow@v2
uses: nf-core/setup-nextflow@v2.0.0

- name: Disk space cleanup
uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
2 changes: 1 addition & 1 deletion .github/workflows/linting.yml
Expand Up @@ -34,7 +34,7 @@ jobs:
uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4

- name: Install Nextflow
uses: nf-core/setup-nextflow@v2
uses: nf-core/setup-nextflow@v2.0.0

- uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5
with:
2 changes: 1 addition & 1 deletion .nf-core.yml
Expand Up @@ -30,5 +30,5 @@ template:
outdir: .
skip_features:
- igenomes
version: 0.5.0
version: 0.6.0
update: null
26 changes: 26 additions & 0 deletions CHANGELOG.md
Expand Up @@ -3,6 +3,32 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v0.6.0 - [20-Dec-2024]

### `Added`

1. Added cDNA and CDS outputs to <OUTPUT_DIR>/annotations/<SAMPLE> directory [#118](https://github.com/Plant-Food-Research-Open/genepal/issues/118)
2. Added parameter `add_attrs_to_proteins_cds_fastas`
3. Added parameter `filter_genes_by_aa_length` with default set to `24` which allows removal of genes with ORFs shorter than 24 amino acids [#125](https://github.com/Plant-Food-Research-Open/genepal/issues/125)

### `Fixed`

1. Fixed an issue where TSEBRA failed because LIFTOFF lifted non-protein coding genes [#121](https://github.com/Plant-Food-Research-Open/genepal/issues/121)
2. Switched branch name from `master` to `main` in the GHA CIs
3. Fixed an issue in `genepal_report.Rmd` which caused the pangene matrix plot to fail when the number of clusters exceeded 65536 [#124](https://github.com/Plant-Food-Research-Open/genepal/issues/124)
4. Fixed an issue where `GENEPALREPORT` process failed due to OOM kill signal from SLURM [#123](https://github.com/Plant-Food-Research-Open/genepal/issues/123)
5. Fixed an issue where the GFF merge after liftoff failed when one of the GFF files did not contain any genes
6. Fixed an issue where `gxf_fasta_agat_spaddintrons_spextractsequences` crashed due to short introns [#89](https://github.com/Plant-Food-Research-Open/genepal/issues/89)

### `Dependencies`

1. Nextflow!>=24.04.2
2. [email protected]

### `Deprecated`

1. Removed parameter `add_attrs_to_proteins_fasta`

## v0.5.0 - [21-Nov-2024]

### `Added`
2 changes: 1 addition & 1 deletion CITATION.cff
Expand Up @@ -31,7 +31,7 @@ authors:
- family-names: "Thomson"
given-names: "Susan"
title: "genepal: A Nextflow pipeline for genome and pan-genome annotation"
version: 0.5.0
version: 0.6.0
date-released: 2024-11-21
url: "https://github.com/Plant-Food-Research-Open/genepal"
doi: 10.5281/zenodo.14195006
13 changes: 10 additions & 3 deletions README.md
Expand Up @@ -35,14 +35,16 @@
- Merge multi-reference liftoffs
- Remove liftoff transcripts marked by _valid_ORF=False_
- Remove liftoff genes with any intron shorter than 10 bp
- Remove rRNA and tRNA from liftoff
- Remove rRNA, tRNA and other non-protein coding models from liftoff
- Optionally, allow or remove iso-forms
- Remove BRAKER models from Liftoff loci
- Merge Liftoff and BRAKER models
- Optionally, remove models without any EggNOG-mapper hits
- [EggNOG-mapper](https://github.com/eggnogdb/eggnog-mapper): Add functional annotation to gff
- [GenomeTools](https://github.com/genometools/genometools): GFF format validation
- [GffRead](https://github.com/gpertea/gffread): Extraction of protein sequences
- [GffRead](https://github.com/gpertea/gffread)
- Extraction of protein sequences
- Optionally, remove models with ORFs shorter than `N` amino acids
- [OrthoFinder](https://github.com/davidemms/OrthoFinder): Perform phylogenetic orthology inference across genomes
- [GffCompare](https://github.com/gpertea/gffcompare): Compare and benchmark against an existing annotation
- [BUSCO](https://gitlab.com/ezlab/busco): Completeness statistics for genome and annotation through proteins
Expand Down Expand Up @@ -97,7 +99,7 @@ sbatch ./pfr_genepal

plant-food-research-open/genepal workflows were originally scripted by Jason Shiller ([@jasonshiller](https://github.com/jasonshiller)). Usman Rashid ([@gallvp](https://github.com/gallvp)) wrote the Nextflow pipeline.

We thank the following people for their extensive assistance in the development of this pipeline:
We thank the following people for extensive assistance in the development of the pipeline,

- Cecilia Deng [@CeciliaDeng](https://github.com/CeciliaDeng)
- Charles David [@charlesdavid](https://github.com/charlesdavid)
Expand All @@ -107,6 +109,10 @@ We thank the following people for their extensive assistance in the development
- Susan Thomson [@cflsjt](https://github.com/cflsjt)
- Ting-Hsuan Chen [@ting-hsuan-chen](https://github.com/ting-hsuan-chen)

and for contributions to the codebase,

- Liam Le Lievre [@liamlelievre](https://github.com/liamlelievre)

The pipeline uses nf-core modules contributed by the following authors:

<a href="https://github.com/gallvp"><img src="https://github.com/gallvp.png" width="50" height="50"></a>
Expand Down Expand Up @@ -139,6 +145,7 @@ The pipeline uses nf-core modules contributed by following authors:
<a href="https://github.com/charles-plessy"><img src="https://github.com/charles-plessy.png" width="50" height="50"></a>
<a href="https://github.com/bunop"><img src="https://github.com/bunop.png" width="50" height="50"></a>
<a href="https://github.com/abhi18av"><img src="https://github.com/abhi18av.png" width="50" height="50"></a>
<a href="https://github.com/liamlelievre"><img src="https://github.com/liamlelievre.png" width="50" height="50"></a>

## Contributions and Support

2 changes: 1 addition & 1 deletion assets/multiqc_config.yml
@@ -1,7 +1,7 @@
report_comment: >
This report has been generated by the <a href="https://github.com/plant-food-research-open/genepal" target="_blank">plant-food-research-open/genepal</a>
analysis pipeline. For information about how to interpret these results, please see the
<a href="https://github.com/plant-food-research-open/genepal/blob/0.5.0/docs/usage.md" target="_blank">documentation</a>.
<a href="https://github.com/plant-food-research-open/genepal/blob/0.6.0/docs/usage.md" target="_blank">documentation</a>.

report_section_order:
"plant-food-research-open-genepal-methods-description":
29 changes: 25 additions & 4 deletions bin/genepal_report.Rmd
Expand Up @@ -190,22 +190,43 @@ cat("<br>")


```{r pheatmap, eval=(exists("n0_df") && !is.null(n0_df$heatmap)), results='hide', fig.align='center', fig.cap="Heatmap showing number of proteins present in each orthocluster (clusters where all individuals have 1 copy are excluded). Columns = Orthologue cluster, Row = Individual", fig.width=7, fig.height=7, dpi=150, warning=FALSE}
pheatmap(n0_df$heatmap,

# Max 65536 allowed
# https://github.com/Plant-Food-Research-Open/genepal/issues/124

n_cols <- ncol(n0_df$heatmap)
max_cols_allowed <- min(n_cols, 5000)

# Approach 1: Random selection of columns
# selected_cols <- sample(n_cols, max_cols_allowed)

# Approach 2: First N largest clusters
selected_cols <- order(colSums(n0_df$heatmap), decreasing = TRUE)[seq(1, max_cols_allowed)]

prefix_text <- ""

if ( n_cols != max_cols_allowed ) {
prefix_text <- paste0("Top ", max_cols_allowed, " ")
}

pheatmap(n0_df$heatmap[, selected_cols],
show_colnames = FALSE,
main = "Orthologue clusters containing accessory proteins",
main = paste0(prefix_text, "Orthologue clusters"),
legend = TRUE,
legend_labels = TRUE,
border_color = "white"
)

pheatmap(n0_df$heatmap,
pheatmap(n0_df$heatmap[, selected_cols],
filename = file.path(outputs_folder, "pangene.matrix.heatmap.pdf"),
show_colnames = FALSE,
main = "Orthologue clusters containing accessory proteins",
main = paste0(prefix_text, "Orthologue clusters"),
legend = TRUE,
legend_labels = TRUE,
border_color = "white"
)

write.csv(x = transform_hogs(n0o), file = file.path(outputs_folder, "pangenome.matrix.csv"), row.names = FALSE)
```
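The column cap in the `pheatmap` chunk above can be read as a small selection routine: keep at most 5000 clusters, prefer those with the largest column sums, and prefix the plot title only when some columns were dropped. The Python sketch below mirrors that logic on a plain list-of-rows matrix. The function name `select_heatmap_columns` and the `cap` argument are illustrative assumptions, not part of the pipeline; indices are 0-based here (R's are 1-based), and tie-breaking between equal column sums may differ from R's `order`.

```python
def select_heatmap_columns(matrix, cap=5000):
    """Pick the columns to plot and build the plot title.

    matrix: list of rows (one per individual), each a list of
            per-cluster protein counts. Hypothetical stand-in for
            n0_df$heatmap in the R chunk.
    cap:    maximum number of columns to keep (5000 in the diff).
    """
    n_cols = len(matrix[0]) if matrix else 0
    max_cols_allowed = min(n_cols, cap)

    # Column sums, analogous to colSums() in R.
    col_sums = [sum(row[j] for row in matrix) for j in range(n_cols)]

    # Largest clusters first; keep only the first max_cols_allowed.
    order = sorted(range(n_cols), key=lambda j: col_sums[j], reverse=True)
    selected_cols = order[:max_cols_allowed]

    # Only mention "Top N" when the matrix was actually truncated.
    prefix_text = f"Top {max_cols_allowed} " if n_cols != max_cols_allowed else ""
    title = prefix_text + "Orthologue clusters"
    return selected_cols, title


if __name__ == "__main__":
    counts = [[1, 5, 2], [0, 4, 2]]  # column sums: 1, 9, 4
    print(select_heatmap_columns(counts, cap=2))
    print(select_heatmap_columns(counts))
```

Sorting by column sum rather than sampling at random (the commented-out "Approach 1" in the chunk) keeps the plot deterministic between runs, at the cost of hiding the smaller clusters.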


3 changes: 3 additions & 0 deletions conf/base.config
Expand Up @@ -74,4 +74,7 @@ process {
cpus = { 8 * task.attempt }
time = { 7.days * task.attempt }
}
withName:GENEPALREPORT {
memory = { 20.GB * task.attempt }
}
}
31 changes: 28 additions & 3 deletions conf/modules.config
Expand Up @@ -199,7 +199,7 @@ process { // SUBWORKFLOW: FASTA_LIFTOFF
}

withName: '.*:FASTA_LIFTOFF:GFFREAD_BEFORE_LIFTOFF' {
ext.args = '--no-pseudo --keep-genes'
ext.args = '--no-pseudo --keep-genes -C'
}

withName: '.*:FASTA_LIFTOFF:MERGE_LIFTOFF_ANNOTATIONS' {
Expand All @@ -212,7 +212,7 @@ process { // SUBWORKFLOW: FASTA_LIFTOFF

withName: '.*:FASTA_LIFTOFF:GFFREAD_AFTER_LIFTOFF' {
ext.prefix = { "${meta.id}.liftoff" }
ext.args = '--keep-genes'
ext.args = '--no-pseudo --keep-genes -C'
}

withName: '.*:FASTA_LIFTOFF:GFF_TSEBRA_SPFILTERFEATUREFROMKILLLIST:AGAT_CONVERTSPGFF2GTF' {
Expand Down Expand Up @@ -240,6 +240,10 @@ process { // SUBWORKFLOW: GFF_MERGE_CLEANUP
ext.prefix = { "${meta.id}.liftoff.braker" }
}

withName: '.*:GFF_MERGE_CLEANUP:FILTER_BY_ORF_SIZE' {
ext.args = params.filter_genes_by_aa_length ? "--no-pseudo --keep-genes -C -l ${ ( params.filter_genes_by_aa_length + 1 ) * 3 }" : ''
}

withName: '.*:GFF_MERGE_CLEANUP:GT_GFF3' {
ext.args = '-tidy -retainids -sort'
}
Expand Down Expand Up @@ -286,7 +290,7 @@ process { // SUBWORKFLOW: GFF_STORE
}

withName: '.*:GFF_STORE:EXTRACT_PROTEINS' {
ext.args = params.add_attrs_to_proteins_fasta ? '-F -D -y' : '-y'
ext.args = params.add_attrs_to_proteins_cds_fastas ? '-F -D -y' : '-y'
ext.prefix = { "${meta.id}.pep" }

publishDir = [
Expand All @@ -295,6 +299,27 @@ process { // SUBWORKFLOW: GFF_STORE
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: '.*:GFF_STORE:EXTRACT_CDS' {
ext.args = params.add_attrs_to_proteins_cds_fastas ? '-F -D -x' : '-x'
ext.prefix = { "${meta.id}.cds" }

publishDir = [
path: { "${params.outdir}/annotations/$meta.id" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}
withName: '.*:GFF_STORE:EXTRACT_CDNA' {
ext.args = params.add_attrs_to_proteins_cds_fastas ? '-F -D -w' : '-w'
ext.prefix = { "${meta.id}.cdna" }

publishDir = [
path: { "${params.outdir}/annotations/$meta.id" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}
}

process { // SUBWORKFLOW: FASTA_ORTHOFINDER
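The three `GFF_STORE` extraction processes in this hunk share one pattern: each passes a single GffRead output flag (`-y` for proteins, `-x` for CDS, `-w` for cDNA) and prepends `-F -D` when `add_attrs_to_proteins_cds_fastas` is set. A minimal Python sketch of that mapping, assuming only the flag strings shown in the diff; the names `GFFREAD_OUTPUT_FLAGS` and `extract_args` are hypothetical and do not exist in the pipeline.

```python
# Output type -> GffRead flag, as used in the ext.args lines of the diff.
GFFREAD_OUTPUT_FLAGS = {
    "pep": "-y",   # EXTRACT_PROTEINS
    "cds": "-x",   # EXTRACT_CDS
    "cdna": "-w",  # EXTRACT_CDNA
}


def extract_args(kind, add_attrs):
    """Return the ext.args string for one extraction process.

    kind:      one of "pep", "cds", "cdna" (matches the ext.prefix suffix).
    add_attrs: stand-in for params.add_attrs_to_proteins_cds_fastas.
    """
    flag = GFFREAD_OUTPUT_FLAGS[kind]
    return f"-F -D {flag}" if add_attrs else flag


if __name__ == "__main__":
    for kind in GFFREAD_OUTPUT_FLAGS:
        print(kind, "->", extract_args(kind, add_attrs=True))
```

Because one parameter now controls all three outputs, the FASTA headers stay consistent across `*.pep.fasta`, `*.cds.fasta` and `*.cdna.fasta` for a given run.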
2 changes: 2 additions & 0 deletions docs/output.md
Expand Up @@ -169,6 +169,8 @@ If more than one genome is included in the pipeline, [ORTHOFINDER](https://githu
- `Y/`
- `Y.gt.gff3`: Final annotation file for genome `Y` which contains gene models and their functional annotations
- `Y.pep.fasta`: Protein sequences for the gene models
- `Y.cdna.fasta`: cDNA sequences for the gene models
- `Y.cds.fasta`: Coding sequences for the gene models

</details>

21 changes: 11 additions & 10 deletions docs/parameters.md
Expand Up @@ -59,19 +59,20 @@ A Nextflow pipeline for consensus, phased and pan-genome annotation.

## Post-annotation filtering options

| Parameter | Description | Type | Default | Required | Hidden |
| ----------------------------- | ----------------------------------------------------------------- | --------- | ------- | -------- | ------ |
| `allow_isoforms` | Allow multiple isoforms for gene models | `boolean` | True | | |
| `enforce_full_intron_support` | Require every model to have external evidence for all its introns | `boolean` | True | | |
| `filter_liftoff_by_hints` | Use BRAKER hints to filter Liftoff models | `boolean` | True | | |
| `eggnogmapper_purge_nohits` | Purge transcripts which do not have a hit against eggnog | `boolean` | | | |
| Parameter | Description | Type | Default | Required | Hidden |
| ----------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------- | ------- | -------- | ------ |
| `allow_isoforms` | Allow multiple isoforms for gene models | `boolean` | True | | |
| `enforce_full_intron_support` | Require every model to have external evidence for all its introns | `boolean` | True | | |
| `filter_liftoff_by_hints` | Use BRAKER hints to filter Liftoff models | `boolean` | True | | |
| `eggnogmapper_purge_nohits` | Purge transcripts which do not have a hit against eggnog | `boolean` | | | |
| `filter_genes_by_aa_length` | Filter genes with open reading frames shorter than the specified number of amino acids excluding the stop codon. If set to `null`, this filter step is skipped. | `integer` | 24 | | |
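The "excluding the stop codon" wording corresponds to the `+ 1` in the `FILTER_BY_ORF_SIZE` arguments in `conf/modules.config`: the amino-acid threshold is increased by one codon and multiplied by three to give the nucleotide length passed to GffRead's `-l` option. The Python sketch below reproduces that arithmetic under the assumption that the config expression is the whole story; `orf_filter_args` is a hypothetical helper name, not a pipeline function.

```python
def orf_filter_args(filter_genes_by_aa_length):
    """Build the GffRead argument string used for ORF-size filtering.

    filter_genes_by_aa_length: minimum ORF length in amino acids,
    excluding the stop codon, or None to skip the filter.
    """
    if not filter_genes_by_aa_length:
        # Mirrors the config's ternary: a null (or zero) value yields no args.
        return ""

    # One extra codon for the stop, three nucleotides per codon.
    min_length_nt = (filter_genes_by_aa_length + 1) * 3
    return f"--no-pseudo --keep-genes -C -l {min_length_nt}"


if __name__ == "__main__":
    print(orf_filter_args(24))    # default: (24 + 1) * 3 = 75 -> "... -l 75"
    print(repr(orf_filter_args(None)))
```

With the default of `24`, the threshold handed to GffRead is 75 nucleotides.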

## Annotation output options

| Parameter | Description | Type | Default | Required | Hidden |
| ----------------------------- | ------------------------------------ | --------- | ------- | -------- | ------ |
| `braker_save_outputs` | Save BRAKER files | `boolean` | | | |
| `add_attrs_to_proteins_fasta` | Add gff attributes to proteins fasta | `boolean` | | | |
| Parameter | Description | Type | Default | Required | Hidden |
| ---------------------------------- | --------------------------------------------- | --------- | ------- | -------- | ------ |
| `braker_save_outputs` | Save BRAKER files | `boolean` | | | |
| `add_attrs_to_proteins_cds_fastas` | Add gff attributes to proteins/cDNA/CDS fasta | `boolean` | | | |

## Evaluation options

2 changes: 1 addition & 1 deletion modules.json
Expand Up @@ -111,7 +111,7 @@
},
"gxf_fasta_agat_spaddintrons_spextractsequences": {
"branch": "main",
"git_sha": "7bf6fbca23edc94490ffa6709f52b2f71c6fb130",
"git_sha": "ed4146008dbdcfd4823252b456de32059e2d07f4",
"installed_by": ["subworkflows"]
}
}