genome requirement with --pseudo_aligner salmon and --skip_alignment #688

didillysquat · 2021-08-06T09:37:34Z

Check Documentation

I have checked the following places for your error:
I have checked both of these and looked through the introduction to see which steps might require the genome.

Description of the bug

When running the pipeline with --pseudo_aligner salmon --skip_alignment and providing a valid --transcript_fasta and --salmon_index, but neither --fasta nor --genome, the pipeline refuses to run and asks for a genome file: Genome fasta file not specified with e.g. '--fasta genome.fa' or via a detectable config file.

Steps to reproduce

Steps to reproduce the behaviour:

Command line: nextflow run nf-core/rnaseq --input woltering_samplesheet.csv --pseudo_aligner salmon --skip_alignment --transcript_fasta ../athal_transcriptome/Arabidopsis_thaliana.TAIR10.cdna.all.fa.gz --salmon_index ../athal_transcriptome/Arabidopsis_thaliana.TAIR10.cdna.all.fa.gz.index -profile docker
See error: Genome fasta file not specified with e.g. '--fasta genome.fa' or via a detectable config file.

Expected behaviour

I would expect this specific route of the pipeline to be able to run without access to a genome, since when running quantification with Salmon directly on the command line I need only provide the transcript fasta and the index.
I've asked to skip alignment (which would otherwise require the genome), but which other step in the pipeline requires the genome?

I would hope that the pipeline could run without access to the genome.

Log files

nextflow.log

Have you provided the following extra information/files:

The command used to run the pipeline
The .nextflow.log file

System

Hardware: remote server
Executor: run on command line
OS: Ubuntu
Version 20.04.2 LTS

Nextflow Installation

version 21.04.1

Container engine

Engine: Docker
version: 20.10.5

Additional context


didillysquat · 2021-08-10T09:48:34Z

I see now that it is required for the DESeq2 QC that is performed downstream of the salmon pseudo quantification.

drpatelh · 2021-08-10T10:07:02Z

Hi @didillysquat ! Apologies for the late response. I am on holiday at the moment.

It's actually required to build the decoy sequences for the Salmon index. If you have a genome fasta available I believe it's advisable to build the index with both the genome fasta and transcriptome fasta. I discussed this with @rob-p whilst adding Salmon support here.

Maybe we should also add support for instances where the genome fasta isn't available though as this issue highlights that particular edge case.
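For readers unfamiliar with decoys: as far as I understand the Salmon documentation, a decoy-aware index is built from a "gentrome" (the transcript sequences followed by the genome sequences) plus a text file listing the genome sequence names. Below is a minimal Python sketch of preparing those two inputs. The function names build_gentrome and open_text and all file names are hypothetical; this is not the pipeline's own code, only an illustration of why a genome FASTA is needed for this step.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed FASTA file for reading as text."""
    path = Path(path)
    return gzip.open(path, "rt") if path.suffix == ".gz" else open(path, "rt")


def build_gentrome(transcript_fasta, genome_fasta, gentrome_out, decoys_out):
    """Write transcripts followed by genome sequences and list the decoy names.

    Hypothetical helper: returns the decoy names in the order they were written.
    """
    decoy_names = []
    with open(gentrome_out, "wt") as gentrome:
        # Transcripts are written first; the genome sequences that follow are the decoys.
        for source, is_decoy in ((transcript_fasta, False), (genome_fasta, True)):
            with open_text(source) as handle:
                for line in handle:
                    if is_decoy and line.startswith(">"):
                        # The decoy name is the header up to the first whitespace.
                        decoy_names.append(line[1:].split()[0])
                    # Guard against a missing final newline between the two files.
                    gentrome.write(line if line.endswith("\n") else line + "\n")
    Path(decoys_out).write_text("".join(name + "\n" for name in decoy_names))
    return decoy_names
```

The two output files would then be passed to something like salmon index -t gentrome.fa -d decoys.txt -i salmon_index. Without a genome FASTA there is nothing to derive the decoys from, which is why the pipeline asked for one.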

didillysquat · 2021-08-10T10:19:48Z

Hi @drpatelh,

There is no hurry on this at all so please don't disrupt your holidays on my behalf.

For my particular case I'm using your wonderful pipeline as a quick but clean way to get a set of salmon pseudo quantification files from RNA-seq reads that I can then import into DESeq2.

I'm sure you're far more knowledgeable about this than I am, but I was simply following the guidance of the salmon tutorial, which worked with only an indexed transcriptome fasta (i.e. no genome). For this particular use case, it could perhaps be useful for the pipeline to detect that neither --genome nor --fasta has been provided and limit the output accordingly (i.e. no DESeq QC), while warning that it is doing so (e.g. "no genome provided so skipping XXX").

Having said that, one extremely useful output from your pipeline (after running it providing the --genome information) is the txt2gene.txt file (called 'salmon_txt2gene.txt' in your pipeline) that maps the transcript IDs to the genes and allows the import of the salmon counts to DESeq2 using tximport. If appropriate, it could be useful to provide this in the main salmon output directory.
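To illustrate what the transcript-to-gene mapping is used for, here is a rough Python sketch that sums Salmon's per-transcript NumReads into per-gene totals. It assumes the mapping file is tab-separated with the transcript ID in the first column and the gene ID in the second, and that quant.sf carries Salmon's usual Name and NumReads columns. The helper names load_tx2gene and gene_level_counts are hypothetical. This is a simplification and not a replacement for tximport, which does more than this (for example, it also summarizes abundances and transcript lengths).

```python
import csv
from collections import defaultdict


def load_tx2gene(path):
    """Read a tab-separated transcript-to-gene table into a dict.

    Assumption: column 1 is the transcript ID, column 2 is the gene ID.
    """
    mapping = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) >= 2:
                mapping[row[0]] = row[1]
    return mapping


def gene_level_counts(quant_sf_path, tx2gene):
    """Sum Salmon NumReads per gene; transcripts missing from the mapping are skipped."""
    totals = defaultdict(float)
    with open(quant_sf_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            gene = tx2gene.get(row["Name"])
            if gene is not None:
                totals[gene] += float(row["NumReads"])
    return dict(totals)
```

For real analyses the tximport route described above remains the better choice; the sketch only shows why the mapping file is needed alongside the Salmon output.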

Thanks for your continued efforts!

drpatelh · 2021-10-04T15:23:55Z

Hi @didillysquat ! I was going to have a go at adding this feature for the 3.4 release, but it will take quite a bit of refactoring, so maybe we can do it in 3.5.

I have, however, added the functionality for the pipeline to publish the salmon_tx2gene.txt files in the salmon counts directory here.

didillysquat · 2021-10-05T07:23:07Z

@drpatelh Super! Many thanks for that.

pinin4fjords · 2025-01-22T14:21:48Z

I'm finally addressing this in #1490.

But I've also noted that Salmon indices should generally be built with genomic FASTA decoys, so this isn't actually recommended unless you're sure you know what you're doing.

pinin4fjords · 2025-01-22T16:48:28Z

Addressed by #1490

didillysquat added the bug Something isn't working label Aug 6, 2021

didillysquat closed this as completed Aug 10, 2021

drpatelh reopened this Aug 10, 2021

drpatelh added this to the 3.4 milestone Sep 22, 2021

drpatelh modified the milestones: 3.4, 3.5 Oct 4, 2021

drpatelh added enhancement and removed bug Something isn't working labels Oct 4, 2021

drpatelh modified the milestones: 3.5, 3.6 Dec 13, 2021

drpatelh modified the milestones: 3.6, 3.7 Feb 20, 2022

drpatelh modified the milestones: 3.7, 3.8 Apr 26, 2022

drpatelh modified the milestones: 3.9, 3.10 Sep 25, 2022

drpatelh modified the milestones: 3.10, 3.11 Dec 16, 2022

drpatelh self-assigned this May 7, 2023

drpatelh modified the milestones: 3.12, 3.13 Jun 2, 2023

drpatelh modified the milestones: 3.15.0, 3.16.0 May 29, 2024

pinin4fjords mentioned this issue Aug 20, 2024

Unable to go from FASTQ to Salmon Quantification #1333

Closed

pinin4fjords mentioned this issue Jan 17, 2025

Allow transcriptome-only salmon indexing nf-core/modules#7327

Merged

17 tasks

pinin4fjords mentioned this issue Jan 21, 2025

Make genomic FASTA input optional #1490

Merged

11 tasks

pinin4fjords closed this as completed Jan 22, 2025
