Issue in PomBase GPI causes issues with NEO pipeline (duplicate entries) #4820

Closed
kltm opened this issue Nov 8, 2023 · 12 comments

@kltm
Member

kltm commented Nov 8, 2023

In the file https://www.pombase.org/data/annotations/Gene_ontology/pombase.gpi.gz, lines like

PR:000060241 tht1 nuclear membrane protein involved in karyogamy Tht1 SO:0001217 NCBITaxon:4896 PomBase:SPAC13C5.03 PomBase:SPAC13C5.03.1.pep UniProtKB:Q09684 go-annotation-summary=nuclear membrane protein involved in karyogamy Tht1
PR:000060241 akr1 palmitoyltransferase Akr1 SO:0001217 NCBITaxon:4896 PomBase:SPAC2F7.10 PomBase:SPAC2F7.10.1.pep UniProtKB:Q09701 go-annotation-summary=palmitoyltransferase Akr1

generate OBO stanzas like

[Term]
id: PR:000060241
name: tht1 Spom
synonym: "tht1" BROAD []
synonym: "000060241" RELATED []
synonym: "SPAC13C5.03.1.pep" RELATED []
xref: PomBase:SPAC13C5.03.1.pep
is_a: CHEBI:33695 ! information biomacromolecule
relationship: in_taxon NCBITaxon:4896
property_value: https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/MacromolecularMachine
property_value: https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/GeneProduct
relationship: has_gene_template PomBase:SPAC13C5.03

[...]

[Term]
id: PR:000060241
name: akr1 Spom
synonym: "akr1" BROAD []
synonym: "000060241" RELATED []
synonym: "SPAC2F7.10.1.pep" RELATED []
xref: PomBase:SPAC2F7.10.1.pep
is_a: CHEBI:33695 ! information biomacromolecule
relationship: in_taxon NCBITaxon:4896
property_value: https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/MacromolecularMachine
property_value: https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/GeneProduct
relationship: has_gene_template PomBase:SPAC2F7.10

Naturally, having duplicate entries causes parsing and resolution issues.

The following identifiers have multiple entries in the GPI:

      2 PR:000060241
      3 PR:000060258
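For anyone wanting to reproduce a count like the one above, here is a minimal sketch. It assumes the GPI is tab-separated, that the identifier is the first column (as in the example rows), and that header or comment lines start with `!`. The function name `duplicate_gpi_ids` is made up for illustration and is not part of the NEO pipeline.

```python
from collections import Counter
from typing import Dict, Iterable


def duplicate_gpi_ids(lines: Iterable[str]) -> Dict[str, int]:
    """Return {identifier: row count} for identifiers that appear on more than one row."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and "!"-prefixed header/comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        # Assumption: the identifier is the first tab-separated column.
        counts[line.split("\t", 1)[0]] += 1
    return {obj_id: n for obj_id, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Tiny in-memory sample with placeholder names, not real PomBase rows.
    sample = [
        "!gpi-version: 2.0\n",
        "PR:000060241\tgeneA\n",
        "PR:000060241\tgeneB\n",
        "PomBase:SPAC13C5.03\tgeneA\n",
    ]
    print(duplicate_gpi_ids(sample))
```

To run it against the real file, a text-mode handle from `gzip.open(path, "rt")` can be passed in directly, since the function only needs an iterable of lines.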

Tagging @ValWood @kimrutherford @pgaudet @vanaukenk

@kltm changed the title from "Issue in PomBase GPI causes issues with NEO pipeline (name tags)" to "Issue in PomBase GPI causes issues with NEO pipeline (duplicate entries)" on Nov 8, 2023
@kltm
Member Author

kltm commented Nov 8, 2023

Noting that this will block the NEO pipeline until we either have a fixed upstream version or clean up a version ourselves and point to that in the metadata instead.

@ValWood
Contributor

ValWood commented Nov 8, 2023

Oh shit, I have this in my inbox. I didn't think it was critical because I had assumed it was referring to different PRO IDs for the same gene, but this is basically an annotation error.

Screenshot 2023-11-08 at 22 05 01

@ValWood
Contributor

ValWood commented Nov 8, 2023

I will fix it tomorrow, I have run out of energy today. But it is impressive that both pipelines pick it up ;)

@kltm
Member Author

kltm commented Nov 8, 2023

No worries! I just wanted to make sure this didn't accidentally fly under the radar.

@ValWood
Contributor

ValWood commented Nov 13, 2023

Hi @nataled
I may need your help here.

https://proconsortium.org/app/entry/PR_000060241/
refers to
Palmitoylase=("ark1"; PR:000060242;
but this looks incorrect. Should it be
PR:000060240

I'm not sure that this fixes the issue, because there might also be some confusion between akr1 (palmitoyltransferase) and ark1 (protein kinase), either at my end or yours. I will dig deeper.

v

@ValWood
Contributor

ValWood commented Nov 13, 2023

The annotation issue was that we had applied the modification to akr1
Screenshot 2023-11-13 at 15 53 14

I deleted this. We only use the modified forms with the entities, not their substrates.

@nataled

nataled commented Nov 13, 2023

@ValWood is there anything still needed on my end, or does your action fix the issue?

@ValWood
Contributor

ValWood commented Nov 13, 2023

PRO is all correct except for Palmitoylase=("ark1"; PR:000060242;
which I now realise refers to the palmitoylase that modifies tht1. The name should be akr1, not ark1, here in the comment:

Comment | Requested by=PomBase. Palmitoylase=("ark1"; PR:000060242; Cys-65/Cys-78). Evidence=(ECO:0001091, based on PMID:36650056).

Let me know if you want me to transfer this to the original ticket.

@ValWood ValWood closed this as completed Nov 13, 2023
@kimrutherford

Hi All. The PomBase GPI file should be updated with Val's fix by 4am GMT tonight.

@nataled

nataled commented Nov 13, 2023

@ValWood already fixed in local copy of PRO, but for completeness I'll add a note to the original ticket submitted to PRO.

@kltm
Member Author

kltm commented Nov 13, 2023

> Hi All. The PomBase GPI file should be updated with Val's fix by 4am GMT tonight.

Cheers! This should get "automatically" tested this Wednesday.

@kltm
Member Author

kltm commented Nov 15, 2023

The test cleared and we are building again. Thank you!
