zotu fasta and insect contain more zotus than zotu table #85

cajwalsh · 2024-08-16T02:06:25Z

I noticed recently that the zotu table was missing entries/rows for zOTUs that were classified by insect. They also appear in the fasta file. E.g. the fasta contains 10675 zOTUs but the table only has 10621. Many of these were unclassified/blank or unknown eukaryotes (according to insect), but many were identified to at least order and some to family or to species.

I also noticed that there were zOTUs that had fewer than the minimum abundance specified for vsearch (e.g. 8 is the default but I had many below 8 read abundance across the whole dataset).

This made me suspect that this is coming from after the derep and clustering step of the vsearch process in either the uchime or remap/table creation step.

I don't think this is new as I'm pretty sure I remember looking into this before but forgetting, but also as you mentioned it doesn't seem likely to be something with the pipeline so much as vsearch doing things we don't expect.

cajwalsh · 2024-08-18T19:45:06Z

related to this process as well, I have been meaning to say that I am pretty sure that "all_cpus" for the derep processes is using all cores on the computer rather than all cores given to the run to use. I assume this is not the desired functionality but if so no worries.

mhoban · 2024-08-19T04:57:07Z

related to this process as well, I have been meaning to say that I am pretty sure that "all_cpus" for the derep processes is using all cores on the computer rather than all cores given to the run to use. I assume this is not the desired functionality but if so no worries.

Oh yeah, this is something I've known about and meant to fix but keep forgetting. I'll make a separate issue for it.

ericgarciaresearch · 2024-10-08T22:49:54Z

Hey @mhoban , @cajwalsh ,
I have been running/testing your pipeline with a few datasets along with @cbird808 , and I am also seeing this behavior too. My zotu_table.tsv is missing some of the ZOTUs that are present in the zotu fasta and also the lca_taxonomy.tsv

How you found what is causing this vsearch behavior? or how can we avoid losing these ZOTUs?

Also related, are you still pursuing adding the option to use the DADA2 denoiser? (issue #34)

mhoban · 2024-10-08T23:04:35Z

@ericgarciaresearch @cajwalsh can you provide input files that reproduce this problem? This is the relabeled and merged fasta file that goes into vsearch. There should be a copy sitting in preprocess/merged in your analysis directory

mhoban · 2024-10-09T00:28:43Z

Also related, are you still pursuing adding the option to use the DADA2 denoiser? (issue #34)

It's still on the list, but I haven't made much progress. One of the issues is that DADA2 prefers that you keep your reads un-merged, so it would require some reworking to fit it into the rest of the pipeline.

ericgarciaresearch · 2024-10-11T19:09:22Z

@mhoban I sent the relabeled fasta file to your hawaii email. Please let me know if you cannot access it. Also please don't make that data public.

some of my remaining questions are: Is the LCA taking the list of zotus from the fasta file? are the missing zotus in the zotu table correctly being dropped by vsearch (chimera or other filter)? If yes, should all downstream analyses only consider the zotus in the zotu table?

Let me know what you find out!

mhoban · 2024-10-11T19:15:45Z

BLAST and insect use the zOTUs fasta file as their query source (and LCA uses the BLAST results as input), but the final zOTU table by necessity only contains the zOTUs in the dereplicated table (output/zotus/zotu_table.tsv).

I'll look into this further and let you know.

mhoban · 2024-10-11T19:30:11Z

This issue is very likely due to the fact that zotus.fasta is generated in one step and the zotu table is generated in a subsequent step using a similarity threshold (currently hard-coded to 97%). This is one of a number of aspects of the pipeline that are a holdover from its adaptation from eDNAFlow. I will make that threshold user-definable (which I should have done a long time ago). I suspect that tweaking that value will change which zotus end up in the fasta versus the abundance table.

Here's how vsearch is called (note emphasized --id parameter on the last line):


vsearch --threads ${task.cpus} --derep_fulllength ${relabeled_merged} --sizeout --output "${id}_unique.fasta"
vsearch --threads ${task.cpus} --cluster_unoise "${id}_unique.fasta" --centroids "${id}_centroids.fasta" --minsize ${params.minAbundance} --unoise_alpha ${params.alpha}
vsearch --threads ${task.cpus} --uchime3_denovo "${id}_centroids.fasta" --nonchimeras "${id}_zotus.fasta" --relabel Zotu 
vsearch --threads ${task.cpus} --usearch_global ${relabeled_merged} --db "${id}_zotus.fasta" --id 0.97 --otutabout zotu_table.tsv

mhoban · 2024-10-11T20:07:37Z

The zOTU table generation step of dereplicate currently hard-codes a "fractional identity" of 97%. In vsearch it's passed explicitly, in usearch 0.97 is the default so it isn't specified.

This is a holdover from eDNAFlow and the pipeline should be updated to allow the user to customize this value.

mhoban · 2024-10-11T20:58:18Z

I tried re-running the otu table generation step with --id 0.99 and all zOTUs from the fasta were included in the table.

mhoban · 2024-10-16T07:04:16Z

This looks to be the same issue as #73, so I'm gonna close that one.

ericgarciaresearch · 2024-10-18T16:18:10Z

Cool, I think that answers my questions, Thanks!!!

ericgarciaresearch · 2024-10-25T00:24:08Z

Hi @mhoban , sorry forgot to ask, are you planning to make that fractional identify --id a user-defined variable? I think we might be interested in increasing this threshold for some datasets at the very least

ericgarciaresearch · 2024-10-25T00:25:40Z

ahh nevermind, I see it!

--zotu-identity

ericgarciaresearch · 2024-10-25T00:34:29Z

related question:
can you report the used --alpha and --zotu-identity in /output/zotus/settings.txt?

I have been increasing alpha to 5 for COI datasets (based on https://doi.org/10.1186/s12859-021-04115-6) which has generated more zotus. I have been keeping track in my notes of which run uses which alpha but I have a script that harvests the settings and results stats from the output but I couldn't find where the used alpha is reported in the output. I only see the min seq abundance and denoiser.

mhoban · 2024-10-25T00:59:45Z

can you report the used --alpha and --zotu-identity in /output/zotus/settings.txt?

Of course, I should have done so already. But heads up that there's a (low priority) open issue to change those settings.txt files to yaml format. It may require you to update your script, but it may make things easier in terms of machine-readability.

mhoban · 2024-10-25T02:34:57Z

@ericgarciaresearch this is now done as of 4046795

mhoban mentioned this issue Aug 19, 2024

Make sure processes actually just use the cpus they're allocated #87

Closed

mhoban mentioned this issue Oct 16, 2024

derep_vsearch zOTUs appearing in fasta but not zotu_table #73

Closed

mhoban pushed a commit that referenced this issue Oct 18, 2024

added --zotu-identity parameter to address #85

8bebe25

mhoban mentioned this issue Oct 18, 2024

Add --zotu-identity parameter for zOTU table generation step #112

Merged

mhoban linked a pull request Oct 18, 2024 that will close this issue

Add --zotu-identity parameter for zOTU table generation step #112

Merged

mhoban closed this as completed in #112 Oct 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

zotu fasta and insect contain more zotus than zotu table #85

zotu fasta and insect contain more zotus than zotu table #85

cajwalsh commented Aug 16, 2024

cajwalsh commented Aug 18, 2024

mhoban commented Aug 19, 2024

ericgarciaresearch commented Oct 8, 2024

mhoban commented Oct 8, 2024

mhoban commented Oct 9, 2024

ericgarciaresearch commented Oct 11, 2024

mhoban commented Oct 11, 2024 •

edited

Loading

mhoban commented Oct 11, 2024

mhoban commented Oct 11, 2024

mhoban commented Oct 11, 2024

mhoban commented Oct 16, 2024

ericgarciaresearch commented Oct 18, 2024

ericgarciaresearch commented Oct 25, 2024

ericgarciaresearch commented Oct 25, 2024

ericgarciaresearch commented Oct 25, 2024

mhoban commented Oct 25, 2024

mhoban commented Oct 25, 2024

zotu fasta and insect contain more zotus than zotu table #85

zotu fasta and insect contain more zotus than zotu table #85

Comments

cajwalsh commented Aug 16, 2024

cajwalsh commented Aug 18, 2024

mhoban commented Aug 19, 2024

ericgarciaresearch commented Oct 8, 2024

mhoban commented Oct 8, 2024

mhoban commented Oct 9, 2024

ericgarciaresearch commented Oct 11, 2024

mhoban commented Oct 11, 2024 • edited Loading

mhoban commented Oct 11, 2024

mhoban commented Oct 11, 2024

mhoban commented Oct 11, 2024

mhoban commented Oct 16, 2024

ericgarciaresearch commented Oct 18, 2024

ericgarciaresearch commented Oct 25, 2024

ericgarciaresearch commented Oct 25, 2024

ericgarciaresearch commented Oct 25, 2024

mhoban commented Oct 25, 2024

mhoban commented Oct 25, 2024

mhoban commented Oct 11, 2024 •

edited

Loading