Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GeneticVariation.VCF.Reader does not like output from the gatk tool: funcotator #34

Open
jackmo375 opened this issue Oct 10, 2022 · 1 comment

Comments

@jackmo375
Copy link

I have had trouble when reading files with GeneticVariation.VCF.Reader that were output form the gatk tool: funcotator. When I called the function on the vcf file it threw an error. I have tried to describe this error and give a possible cause of the error below. It is a minor issue, and I resolved it by deleting a single line in my input vcf file before calling GeneticVariation.VCF.Reader which solved the problem but i wanted to draw the developers to the issue as it would be better to run it without my hack of manually deleting a line in the input vcf file :)

Expected Behavior

I took a vcf file, v4.2, that was output from funcotator and passed it to the function GeneticVariation.VCF.Reader.

Current Behavior

here is the current behavior from a live Julia session:

julia> inputvcfIO = open("./NA18486.annotated.vcf", "r")
IOStream(<file ./NA18486.annotated.vcf>)

julia> GeneticVariation.VCF.Reader(inputvcfIO)
ERROR: GeneticVariation.VCF.Reader file format error on line 17
Stacktrace:
 [1] error(::String, ::Int64)
   @ Base ./error.jl:44
 [2] _readheader!(reader::GeneticVariation.VCF.Reader, state::BioCore.Ragel.State{BufferedStreams.BufferedInputStream{IOStream}})
   @ GeneticVariation.VCF ~/.julia/packages/BioCore/YBJvb/src/ReaderHelper.jl:106
 [3] readheader!(reader::GeneticVariation.VCF.Reader)
   @ GeneticVariation.VCF ~/.julia/packages/BioCore/YBJvb/src/ReaderHelper.jl:80
 [4] Reader
   @ ~/.julia/packages/GeneticVariation/r8DAL/src/vcf/reader.jl:15 [inlined]
 [5] GeneticVariation.VCF.Reader(input::IOStream)
   @ GeneticVariation.VCF ~/.julia/packages/GeneticVariation/r8DAL/src/vcf/reader.jl:28
 [6] top-level scope
   @ REPL[6]:1

Possible Solution / Implementation

So I went to line 17 of the vcf file as the error log above advises, and found the following line:

##Funcotator Version=4.2.6.1 | Gencode 34 CANONICAL | ACMGLMMLof 1 | ACMG_recommendation SF_v2.0 | ClinVar_VCF 20180429_hg38 | LMMKnown 20180618

Once I deleted this line, I re-ran the code:

julia> inputvcfIO = open("./NA18486.annotated.edited.vcf", "r")
IOStream(<file ./NA18486.annotated.edited.vcf>)

julia> GeneticVariation.VCF.Reader(inputvcfIO)
GeneticVariation.VCF.Reader(BioCore.Ragel.State{BufferedStreams.BufferedInputStream{IOStream}}(BufferedStreams.BufferedInputStream{IOStream}(<256.0 KiB buffer, 28% filled, data immobilized>), 1, 3414, false), GeneticVariation.VCF.Header:
  metainfo tags: fileformat ALT FILTER FORMAT GATKCommandLine INFO contig source
     sample IDs: NA18486)

And as you can see the problem is resolved.

Here is my issue: I cannot see what is wrong with that line ##Funcotator Version=4.2.6... Why is it not accepted as valid input?

Note, it is possible that other gatk tools insert lines like this in their vcf output files so the problem may occur in other places. (I acknowledge this may actually be an issue to raise with the gatk developers instead?)

This bug certainly affects the usability of the code. I was initially using the VIVA command line program for analyzing vcf files and received this error. so I had to go looking at the Julia source code of dependencies of the VIVA program to resolve the issue.

@janxkoci
Copy link

janxkoci commented Jan 12, 2025

To me, that line looks invalid. GATK was known to produce VCFs that break standards, so someone should check if this issue was not caused by GATK instead (by looking at the standard and what's allowed). Since it's a line containing just package version, it should be easy to fix its format to be standard-compliant.

PS: looking at some of my VCF files - unquoted spaces are allowed, so probably the pipe character (|) is the problem. I can try making some fake VCF for testing later. Also - do bcftools handle such file?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants