Fix GAF loading for RNAcentral annotations #1255
Comments
This is a little trickier than I thought. The URS IDs aren't unique, so some annotation rows from the GOA GAF file will map to multiple genes. Before I go ahead with implementing it, is it OK to do that?
I think it's the other way around: one pombe gene will have multiple URS IDs. If one URS ID really maps to multiple genes, that sounds wrong and we should look into it!
There are two genes in Chado with 2 URS IDs, but those have 2 transcripts and each transcript has its own URS ID. There are quite a few URS IDs that are attached to more than one gene. Some examples:
https://rnacentral.org/rna/URS0000314D2B/4896
URS0000415965 | SPATRNAALA.01
URS00002BA4D5 | SPATRNATRP.01
OK, that's a bit weird. We should report that too.
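For anyone following along, here is a minimal sketch of what "one annotation row maps to multiple genes" could look like in a loader. It is not the actual PomBase loading code: the function names, the dictionary-based row format, and the placeholder gene and URS identifiers are all assumptions made for illustration.

```python
from collections import defaultdict


def build_urs_index(gene_urs_pairs):
    """Map each URS ID to the list of gene IDs that carry it.

    gene_urs_pairs is a hypothetical iterable of (gene_id, urs_id)
    tuples, e.g. the result of a query against the gene database.
    """
    index = defaultdict(list)
    for gene_id, urs_id in gene_urs_pairs:
        if gene_id not in index[urs_id]:
            index[urs_id].append(gene_id)
    return index


def expand_annotation(row, index):
    """Return one copy of the annotation row per gene sharing its URS ID.

    row is a hypothetical dict with a "urs_id" key. An unknown URS ID
    yields an empty list, so the caller can log it as a failed load.
    """
    return [dict(row, gene=gene_id) for gene_id in index.get(row["urs_id"], [])]


# Placeholder identifiers, not real genes or URS IDs.
pairs = [("geneA", "URS_X"), ("geneB", "URS_X"), ("geneC", "URS_Y")]
index = build_urs_index(pairs)
rows = expand_annotation({"urs_id": "URS_X", "term": "GO:0000001"}, index)
```

With this approach a row whose URS ID is shared by two genes produces two annotations, and a row whose URS ID has no gene produces none.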
That's fixed for the next load. Once it's done I'll compare with the previous load to see how many extra annotations we have. I suspect it won't be many.
168 :-) On Monday I'll double check to make sure we're getting all the possible annotations.
I think it's OK. There are only 630 pombe RNAcentral annotations (I miscounted earlier because I included the japonicus annotations). There are 160 or so URS IDs that don't have corresponding pombe genes, which causes 334 of the 630 annotations to fail to load. Some of the remaining annotations are filtered because of:
And then some annotations are filtered because you have a more specific annotation already.
It looks like they group identical genes into one entry:
These errors should be fixable. There are about 1000 RNAcentral annotations that are failing because of this:
https://curation.pombase.org/dumps/builds/pombase-build-2025-01-23/logs/log.2025-01-22-23-42-44.gaf-load-output
Probably we just need to remove the _4896 from IDs like URS000030D4E9_4896 and look up in Chado using the base ID URS000030D4E9.
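The ID normalisation suggested in the issue description could be sketched roughly as follows. This is only an illustration, not the actual loader: the function names are made up, the "starts with URS" check is an assumption about how to recognise RNAcentral IDs, and a plain dictionary stands in for the Chado lookup.

```python
import re

# Assumption: the taxon suffix is an underscore followed by digits at the
# end of the ID, as in URS000030D4E9_4896.
TAXON_SUFFIX = re.compile(r"_\d+$")


def base_urs_id(identifier):
    """Strip a trailing _<taxon id> from an RNAcentral ID.

    Identifiers that don't start with "URS" are returned unchanged so
    that other kinds of ID in the same column are left alone.
    """
    if not identifier.startswith("URS"):
        return identifier
    return TAXON_SUFFIX.sub("", identifier)


def lookup_genes(identifier, urs_to_genes):
    """Look up genes by base URS ID.

    urs_to_genes is a hypothetical dict standing in for the database
    query; a missing ID returns an empty list.
    """
    return urs_to_genes.get(base_urs_id(identifier), [])


stripped = base_urs_id("URS000030D4E9_4896")
```

An ID that has no suffix passes through unchanged, so the same lookup works for both forms.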