Adapting the script for mtDNA #16
Comments
Yeah... it's probably best that we filter/format our VCF files to work with your plotting tool. Here is a plot from filtering our VCFs down to entries that have AF=1 or AF=0.5, removing all other cases.
@hoelzer What is the relevance of these calls: |
I could also check if all are zero and if not use the highest/lowest AF.
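A minimal sketch of that idea, not virHEAT's actual parser: the hypothetical helper `pick_af` reduces a comma-separated AF value to a single number. It assumes that "lowest" means the lowest non-zero value and that an all-zero entry should be skipped (returned as `None`); both choices are assumptions made for illustration.

```python
def pick_af(af_field, use="max"):
    """Reduce an AF value such as '0,0,0.5' to a single float.

    Hypothetical helper for illustration only.
    Returns None when every listed frequency is zero.
    """
    values = [float(v) for v in af_field.split(",")]
    nonzero = [v for v in values if v > 0]
    if not nonzero:
        # All alleles have AF=0: nothing meaningful to plot.
        return None
    # Assumption: 'lowest' refers to the lowest non-zero frequency.
    return max(nonzero) if use == "max" else min(nonzero)


print(pick_af("0,0,0.5"))
print(pick_af("0,0,0"))
```

Note that collapsing to one value still drops the other alternative alleles at that position.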
Unfortunately there are different combinations of such AF strings with commas 🙄 I think they show up when you have mixed variant calls, e.g. a G in the ref but then an A or a T in the reads with varying frequency. My impression was that they are not so meaningful for our analysis, but I'm not sure in general. I'm also not sure whether this is a freebayes-specific way of writing such calls in the VCF file.
Ah I see. But I could implement this in the VCF parsing... Let me check how straightforward that would be.
Hmm, actually I think you will then miss the ones that have several variants called at a single position, or not?
Also not sure, but at least it is not typical for most of the variant callers I have used.
Yeah exactly, we would miss those. Probably you would have to split such an AF entry with multiple alternative alleles into multiple rows in the VCF file. There is also another thing, related to freebayes and the assumed ploidy: we only have 0/0.5/1 values for AF. However, one could calculate more precise allele frequencies from the AD field, which holds the raw read counts for each allele. I asked Ayaka to adapt a little script I wrote, which does a simple filtering for AF fields with only 1 or 0.5 and no commas, so that it also replaces the AF values with more precise AD-based ones. I am mentioning that because when you have something like AF=0,0,0.5 you will also have a more complex AD field 🙄
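The two steps mentioned here (splitting a multi-allelic entry into one row per alternative allele, and deriving the frequency from AD) could look roughly like the sketch below. It is not the script referred to above. The function name `split_multiallelic` and the tuple layout are hypothetical, and it assumes AD lists the reference count first, followed by one count per ALT allele in the same order as the ALT column.

```python
def split_multiallelic(pos, ref, alts, ad):
    """Split one record into per-allele rows with AD-based frequencies.

    Hypothetical helper for illustration only.
    alts: ALT column, e.g. 'A,T'
    ad:   AD value, e.g. '10,30,60' (assumed: ref count first, then alts)
    """
    alt_list = alts.split(",")
    counts = [int(c) for c in ad.split(",")]
    if len(counts) != len(alt_list) + 1:
        raise ValueError("AD must hold one count for REF plus one per ALT")
    total = sum(counts)
    rows = []
    for alt, count in zip(alt_list, counts[1:]):
        # Frequency = reads supporting this allele / all counted reads.
        af = count / total if total else 0.0
        rows.append((pos, ref, alt, af))
    return rows


for row in split_multiallelic(100, "G", "A,T", "10,30,60"):
    print(row)
```

Each returned row can then be treated like an ordinary single-allele call, so positions with several variants are not lost.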
@hoelzer Can you give it a try (see PR). Basically, I now attempt a second comma-separated split for the |
Works as described in #17.
Hey @jonas-fuchs we'd like to adapt the script for plotting variants in mtDNAs. We have the VCF files (from freebayes) and are now trying to produce a heatmap. Ayaka is supporting this effort.
We tried
but then get
I guess there is a problem with the format of the VCF and how virHEAT is parsing it?
Here is an example VCF:
IHIT11.mtDNA.vcf.zip
Are you using the AF flag for getting the allele frequency? I think this might be our problem because we have strings like AF=0,0,0; for weird combinations of variant calls.