Consensus genome and variant calling from Illumina only #141
Can you make the consensus part of your workflow a separate workflow? It might be interesting to compare its output to the samples that also appear in GISAID. I think we can add this to our current workflow (which is a subworkflow with a parallelized variant of the download step, and then runs the SE and PE workflows: https://usegalaxy.org/u/sars-cov2-bot/w/download-and-sepe-illumina-covid-variation-workflow-imported-from-uploaded-file). Can you describe the ARTIC workflow in a bit more detail? Why minimap, what's the BED file used for, why iVar, etc.? A bit like we do in https://covid19.galaxyproject.org/genomics/4-Variation/#how-do-we-call-variants?
Hi @mvdbeek, the only steps that are not involved in generating the consensus are the final "SnpSift Extract Fields" and "Collapse Collection" steps; everything else goes into computing the consensus. So in the workflow you linked it would.
So it plugs quite naturally into what you are using. I imagine you want to keep evolving the workflow to incorporate new insights into filtering (e.g. some of what is being discussed here).
Commenting separately on the The BED file contains the locations of the ARTIC primers (there are 2 sets in common use: v.1 and v.3). This feeds into I can write up a section for the web page, but thought I should first start the discussion here to iron out issues like the ones you've raised.
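For readers unfamiliar with primer schemes: amplicon pipelines typically use the primer BED file to trim primer-derived bases from reads (iVar's `trim` command, for example, takes a primer BED file), because those bases reflect the primer sequence rather than the sample. The Python sketch below is illustrative only: it parses the first four standard BED columns and looks up which primers cover a given position. The helper names, primer names, and coordinates are hypothetical and are not taken from the real ARTIC scheme files.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Primer:
    chrom: str
    start: int  # 0-based start, inclusive (BED convention)
    end: int    # 0-based end, exclusive (BED convention)
    name: str


def parse_primer_bed(text: str) -> list[Primer]:
    """Parse BED text, keeping only chrom/start/end/name (hypothetical helper)."""
    primers = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        primers.append(Primer(fields[0], int(fields[1]), int(fields[2]), fields[3]))
    return primers


def primers_covering(primers: list[Primer], chrom: str, pos: int) -> list[str]:
    """Names of primers whose interval contains 0-based position `pos`."""
    return [p.name for p in primers if p.chrom == chrom and p.start <= pos < p.end]


# Illustrative coordinates only, NOT the real ARTIC v1/v3 scheme.
bed = (
    "MN908947.3\t100\t124\tscheme_1_LEFT\t1\n"
    "MN908947.3\t480\t504\tscheme_1_RIGHT\t1\n"
    "MN908947.3\t420\t444\tscheme_2_LEFT\t2\n"
)
primers = parse_primer_bed(bed)
print(primers_covering(primers, "MN908947.3", 110))  # ['scheme_1_LEFT']
print(primers_covering(primers, "MN908947.3", 300))  # []
```

A real trimming tool works on aligned reads rather than single positions, but the lookup above is the core idea: any read bases falling inside a primer interval are candidates for removal.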
@pvanheus has just proposed an update (#185) to the ARTIC WF in its dedicated section.
Hi @wm75, I passed the output of the Zip Collection step to the rest of the WF instead of the individual read datasets from fastp, and also put a Flatten Collection step between Qualimap BAMQC and MultiQC so that MultiQC only runs once. My version of the WF is at: https://usegalaxy.org.au/u/simongladman/w/covid-19-variation-analysis-on-artic-pe-data
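Conceptually, flattening turns a nested collection (for example, one sub-collection of Qualimap outputs per sample) into a single flat list, so a downstream tool like MultiQC receives one input list instead of being mapped over each sub-collection. The Python sketch below models collections as nested dicts; it is not Galaxy's implementation, and joining identifiers with `_` is an assumption made for illustration.

```python
from __future__ import annotations


def flatten_collection(collection: dict, sep: str = "_", prefix: str = "") -> dict[str, str]:
    """Flatten a nested {identifier: dataset-or-subcollection} mapping.

    Nested identifiers are joined with `sep`, e.g. {"s1": {"a": x}} -> {"s1_a": x}.
    This is a hypothetical model of a nested collection, not Galaxy code.
    """
    flat: dict[str, str] = {}
    for identifier, element in collection.items():
        key = f"{prefix}{sep}{identifier}" if prefix else identifier
        if isinstance(element, dict):
            # Recurse into sub-collections, carrying the joined identifier.
            flat.update(flatten_collection(element, sep, key))
        else:
            flat[key] = element
    return flat


# Hypothetical per-sample Qualimap outputs (file names are placeholders).
nested = {
    "sample1": {"genome_results": "s1_genome.txt", "coverage": "s1_cov.txt"},
    "sample2": {"genome_results": "s2_genome.txt", "coverage": "s2_cov.txt"},
}
flat = flatten_collection(nested)
# ['sample1_genome_results', 'sample1_coverage', 'sample2_genome_results', 'sample2_coverage']
print(list(flat))
```

Because the flattened result is a single list, a tool that accepts a list as one input would run once over all of it rather than once per sample.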
@Slugger70 I think by passing a list:paired collection to bwa-mem, you may run into the same blocking WF scheduling issue that you're trying to avoid at the fastp step. At least that's my recollection.
Thanks for spotting that MultiQC issue! I'll look into it for EU.
I've updated the EU WF with the Flatten Collection step on the Qualimap output, just as you did for AU. Thanks again @Slugger70!
Hi everyone
I have been working with the National Institute for Communicable Diseases (NICD) on some of their SARS-CoV-2 sequencing. I adapted the "variation" workflow to call variants in Illumina data and also produce an inferred consensus genome. The current "Assembly" workflow assumes that you have both Illumina and Nanopore data for a sample, which is a pretty rare situation. My workflow (with some TODOs in the step annotations) is in a gist. Comments and additions are welcome!
If it is found to be useful, perhaps it can be incorporated into the COVID-19 resources page.
P.S. For those with ARTIC amplicon data, I created a workflow for analysing it, as did Thanh le Viet. I'm pasting them here in case they are useful.
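As a rough illustration of what inferring a consensus genome from Illumina variant calls involves, one common approach applies the called variants to the reference and masks poorly covered positions with `N`. The Python sketch below handles substitutions only (no indels), uses a hypothetical minimum depth of 10, and is not necessarily what the gist workflow does; the function name and inputs are illustrative.

```python
from __future__ import annotations


def build_consensus(reference: str, snvs: dict[int, str], depth: list[int],
                    min_depth: int = 10) -> str:
    """Apply SNVs to a reference and mask low-depth sites with 'N'.

    reference: reference sequence
    snvs: 0-based position -> alternate base (substitutions only)
    depth: per-position read depth, same length as reference
    min_depth: hypothetical threshold below which a site is masked
    """
    if len(depth) != len(reference):
        raise ValueError("depth must have one value per reference position")
    seq = list(reference)
    for pos, alt in snvs.items():
        if len(alt) != 1:
            raise ValueError(f"only single-base substitutions supported, got {alt!r} at {pos}")
        seq[pos] = alt
    # Masking runs after SNV application, so an SNV at a low-depth site
    # also ends up as 'N'.
    for pos, d in enumerate(depth):
        if d < min_depth:
            seq[pos] = "N"
    return "".join(seq)


ref = "ACGTACGT"
depth = [20, 20, 20, 20, 5, 20, 20, 20]
print(build_consensus(ref, {2: "T"}, depth))  # ACTTNCGT
```

Masking rather than keeping the reference base at low-coverage sites avoids reporting reference alleles where the data cannot support a call; real pipelines also need to handle indels and allele-frequency thresholds, which this sketch omits.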