Missing Indels #49
Comments
What is your protocol?
For these mappings, it's simply Given the small size of the genome, I just eyeballed possible variants and saw this. Then I realised you don't call indels in your variant analysis. This is the content I've developed for a Master/Major in Bioinformatics Genomics course:
Perhaps you are already calling variants. However, they are missing from these outputs: https://github.com/galaxyproject/SARS-CoV-2/tree/master/4-Variation#outputs
The current workflow fails to call indels because it does not first use lofreq indelqual to add indel qualities to the mapped reads. Without those, lofreq call skips indels even when the @nekrut, are you interested in having indels called, and do you want to update the workflow accordingly yourself?
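The two-step order described in this comment (add indel qualities, then call variants with indel calling enabled) could be sketched roughly as below. This is only an illustration, not the workflow's actual implementation: the file names are hypothetical, and the `--dindel` option assumes Illumina data (lofreq indelqual also offers a uniform mode).

```python
import shlex


def indel_calling_commands(ref_fasta, in_bam, indelqual_bam, out_vcf):
    """Build the two lofreq command lines as argument lists.

    All file names are hypothetical placeholders supplied by the caller.
    """
    # Step 1: add indel qualities to the mapped reads.
    # --dindel is an assumption that fits Illumina data.
    add_indel_quals = [
        "lofreq", "indelqual", "--dindel",
        "-f", ref_fasta, "-o", indelqual_bam, in_bam,
    ]
    # Step 2: call variants on the BAM that now carries indel qualities,
    # with indel calling switched on explicitly.
    call_variants = [
        "lofreq", "call", "--call-indels",
        "-f", ref_fasta, "-o", out_vcf, indelqual_bam,
    ]
    return [add_indel_quals, call_variants]


if __name__ == "__main__":
    commands = indel_calling_commands(
        "reference.fasta", "mapped.bam", "mapped.indelqual.bam", "variants.vcf"
    )
    for cmd in commands:
        # Print only; pass each list to subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```

The key point is that the second command reads the output of the first, not the original BAM.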
Hi! I was just stopping by to say this, and you already did! Indels should be called, as we have observed a bunch of them in COVID-19 data. Moreover, they are indels that conserve the open reading frame, only deleting with aa, which makes each of them most probably a functional variant. Also, I think you don't filter host reads in the variant pipeline, but I'm not sure if you do it in the assembly. We have observed that bwa mem is too sensitive and soft-clips a lot of reads against the human genome; bowtie2 is a much better option for this type of data, especially if you are using amplicon data. Thanks for the repo!
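As a rough illustration of what "conserving the open reading frame" means for an indel in a coding region: the length change must be a non-zero multiple of three, so whole codons are gained or lost. The helper below is a hypothetical sketch that takes VCF-style REF and ALT allele strings; it ignores everything else that affects function (position within the gene, stop codons, and so on).

```python
def conserves_reading_frame(ref_allele, alt_allele):
    """Return True if an indel changes the sequence length by a multiple of 3.

    ref_allele and alt_allele are VCF-style allele strings (an assumption
    made for this sketch). Substitutions of equal length return False
    because they are not indels.
    """
    length_change = len(alt_allele) - len(ref_allele)
    return length_change != 0 and length_change % 3 == 0


if __name__ == "__main__":
    # Three-base deletion: whole codon removed, frame preserved.
    print(conserves_reading_frame("ATTTG", "AT"))
    # One-base deletion: frameshift.
    print(conserves_reading_frame("AT", "A"))
```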
@saramonzon do you have evidence for host reads (human) aligning to the SARS genome? You can reduce soft clipping by changing the end-bonus settings, or as you say, using glocal alignment via
Hi @tseemann, yes, using bwa with default parameters we obtain a considerable percentage of reads mapping to both human and SARS-CoV-2, depending on the sample. We have fixed it as you say using bowtie2, but using
Updates lofreq version and adds indelquals. xref galaxyproject#49
My own analysis of Illumina (SRR11140750, bottom track) and nanopore (SRR11140751, top track) data from the same swab sample shows your variant analysis doesn't include indels.
You probably should include --call-indels in your call to lofreq call.