GFF error when using a mtDNA annotation file #18

Closed
hoelzer opened this issue Nov 12, 2024 · 2 comments · Fixed by #17
Labels
bug Something isn't working

Comments

@hoelzer

hoelzer commented Nov 12, 2024

Hey, me again : )

I am trying to use a GFF annotation from MITOS2 for a mtDNA genome:

CM022781.1.gff.zip

But I get:

virheat 2024-11-07-vcf-results/vcf-filtered/ results-filtered -g CM022781.1.gff -r CM022781.1

Traceback (most recent call last):
  File "/Users/martin/miniconda3/envs/virheat/bin/virheat", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/martin/miniconda3/envs/virheat/lib/python3.12/site-packages/virheat/command.py", line 192, in main
    gff3_info = data_prep.parse_gff3(args.gff3_path, args.reference)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/martin/miniconda3/envs/virheat/lib/python3.12/site-packages/virheat/scripts/data_prep.py", line 310, in parse_gff3
    gff3_dict[gff_values[2]][attribute_id][identifier] = val.replace("\n", "")
    ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
KeyError: 'transcript_trnF(gaa)'

So I guess this is a GFF file format you were not expecting ;)

Any easy way to get that visualized w/ virHEAT?

(and sorry for repurposing your tool for mtDNA... but it would be very neat also for such an application!)

@jonas-fuchs
Owner

jonas-fuchs commented Nov 13, 2024

@hoelzer Thanks! This is enhancing the tool a bit :).

So basically, for the nested dictionary I set the final keys using the ID= identifier. The GFF format itself was not the problem; in your file the exons only have a parent and no ID of their own. I have checked quite a few GFFs and had never seen features without a unique identifier. I have now introduced a simple check for whether ID= is present in each line.

I also checked NCBI a bit and it seems that exons can have IDs e.g.:

WJXW01000006.1 Genbank exon 32023 33421 . + . ID=exon-gnl

So I think this simple check is ok for now. Probably depends on who annotates these genomes.
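
To make the idea concrete, here is a minimal sketch of such a check. It is not the actual virHEAT code: the function name `parse_gff3_sketch`, the stored `start`/`stop` keys, and the two example feature lines are made up for illustration, and it assumes that a feature without its own ID= attribute is simply skipped.

```python
def parse_gff3_sketch(lines):
    """Collect GFF3 features into {feature_type: {ID: {attribute: value}}}.

    Hypothetical helper for illustration only, not virHEAT's parse_gff3.
    """
    gff3_dict = {}
    for line in lines:
        # skip blank lines and comment/directive lines
        if not line.strip() or line.startswith("#"):
            continue
        gff_values = line.rstrip("\n").split("\t")
        # a GFF3 feature line has exactly nine tab-separated columns
        if len(gff_values) != 9:
            continue
        # column 9 holds ';'-separated key=value attributes
        attributes = dict(
            field.split("=", 1)
            for field in gff_values[8].split(";")
            if "=" in field
        )
        # the simple check: without ID= there is no unique key to nest under
        if "ID" not in attributes:
            continue
        feature = gff3_dict.setdefault(gff_values[2], {}).setdefault(
            attributes["ID"], {}
        )
        feature["start"] = int(gff_values[3])
        feature["stop"] = int(gff_values[4])
        feature.update({k: v for k, v in attributes.items() if k != "ID"})
    return gff3_dict


# Made-up example lines: a gene with an ID and an exon that only has a Parent
example = [
    "##gff-version 3\n",
    "CM022781.1\tmitos\tgene\t1\t70\t.\t+\t.\tID=gene_trnF;Name=trnF\n",
    "CM022781.1\tmitos\texon\t1\t70\t.\t+\t.\tParent=transcript_trnF(gaa)\n",
]
result = parse_gff3_sketch(example)
print(result)
```

With this input the exon line is ignored instead of raising a KeyError, and only the gene ends up in the dictionary. Whether skipping is the right behaviour (rather than, say, falling back to the Parent value) is a design choice that depends on what should be plotted.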

Can you again have a look? It should work fine now. Same branch as the last issue :)

@jonas-fuchs jonas-fuchs added the bug Something isn't working label Nov 13, 2024
@hoelzer
Author

hoelzer commented Nov 13, 2024

Yes thanks, works as reported in the PR!

virheat 2024-11-07-vcf-results/vcf-filtered/ results-filtered -g CM022781.1.gff -r CM022781.1

virHEAT_plot.pdf

Yeah, the tool I used is MITOS2, which is a specialized annotation tool for mtDNA. And it seems to produce slightly unusual GFF output.
