Change the quick search to find obsolete genes on the merges and deletions page #1054

Open
kimrutherford opened this issue Jan 12, 2023 · 7 comments

@kimrutherford
Member

We should think about loading the deleted genes with the is_obsolete flag set to true. A lot of downstream code will need to change to check the flag so that obsolete genes aren't included in the query builder results or in counts/stats.

We're considering this so that we can eventually include the deleted/obsolete genes in the quick search and provide basic "obsolete gene" pages with any historical information that might be useful.

Loading these genes into Chado might not be the best solution. It might be better for the web site code to read a separate (TSV?) file with this information and only load the current genes into Chado, as we do at the moment. This needs a bit of discussion and thought.

@kimrutherford
Member Author

A good first step would be to discuss what should be shown on the pages for the obsolete genes. Then it will be easier to decide if we should use Chado for this.

For comparison, FlyBase do store obsolete genes in Chado.

@ValWood
Member

ValWood commented Jan 16, 2023

name, location, reason for obsoletion
anything else?
@manulera

@manulera

I wonder whether, in many cases, those identifiers should just be treated as synonyms (two examples below). What would be a case in which an obsoletion should be treated differently from a synonym?

SPNCRNA.55

This is how SPAC6B12.18 looked before SPNCRNA.55 was added as an obsolete_name:

FT   CDS             2416385..2416606
[...]
FT                   /systematic_id="SPAC6B12.18"
[...]
FT                   /synonym="prl55"
FT                   /synonym="SPNCRNA.55"

When SPNCRNA.55 was originally added (rev 20040915), it was a non-coding RNA:

FT   misc_RNA        2417135..2417606
FT                   /gene="prl55"
FT                   /psu_db_xref="EMBL:AB084867;"
FT                   /note="PMID: 12597277"
FT                   /systematic_id="SPNCRNA.55"
FT                   /primary_name="prl55"
FT                   /product="non-coding RNA"

Then SPAC6B12.18 was added in rev 20060219 as a CDS, and the two features coexisted until they were merged as synonyms:

FT   misc_RNA        2416235..2416706
FT                   /gene="prl55"
FT                   /db_xref="EMBL:AB084867"
FT                   /db_xref="PMID:12597277"
FT                   /systematic_id="SPNCRNA.55"
FT                   /primary_name="prl55"
FT                   /product="non-coding RNA (predicted)"
FT                   /controlled_curation="term=non-coding RNA;
FT                   qualifier=predicted; db_xref=PMID:12597277; date=20050412"
FT                   /controlled_curation="term=poly(A)-bearing RNA;
FT                   qualifier=predicted; db_xref=PMID:12597277; date=20050412"
FT                   /controlled_curation="term=no detectable long open reading
FT                   frame; qualifier=predicted; db_xref=PMID:12597277;
FT                   date=20050412"
FT   CDS             2416385..2416606
FT                   /product="dubious"
FT                   /gene="SPAC6B12.18"
FT                   /controlled_curation="term=longest ORF in prl55 (73AA),
FT                   possibly be protein coding; date=20060721"
FT                   /colour=6
FT                   /fasta_file="fasta/chromosome1.contig.seq.00996.out"

SPAC11D3.12c

In this case we don't even have two separate features in our record. SPAC11D3.12c is currently an obsolete_name of SPAC11D3.11c. In our first revision (SPAC11D3.11c), it was used like this:

FT   CDS             complement(join(137589..138452,138452..139483,139527..139551,139596..139609))
FT                   /colour=4
FT                   /gene="SPAC11D3.11c"
FT                   /gene="SPAC11D3.12c"
FT                   /label=SPAC11D3.11c
FT                   /note="SPAC11D3.11c, len:644,
FT                   SIMILARITY:Schizosaccharomyces pombe, O74915, putative
FT                   transcriptional activator, zinc finger containing., (684
FT                   aa), fasta scores: opt: 1914, E():0, (46.9% identity in
FT                   616 aa)"
FT                   /product="putative transcriptional activator, zinc finger
FT                   containing."
FT                   /pseudo
FT                   /fasta_file="fasta/c11D3.tab.seq.00054.out"

@ValWood
Member

ValWood commented Jan 16, 2023

Historically, if a CDS was replaced by an ncRNA or vice versa, we kept the original names as synonyms. This was mainly because we would 'lose' them otherwise. If we have a method to obsolete IDs and allow people to trace their history, we can review this practice (i.e. SPNCRNA.55 would not be a synonym of SPAC6B12.18; it would just become an obsoleted feature that was 'replaced by SPAC6B12.18'). We can discuss this tomorrow...

kimrutherford added a commit to pombase/pombase-chado-json that referenced this issue Jan 17, 2023
@ValWood
Member

ValWood commented Jan 17, 2023

@manulera update with today's decisions

@manulera

manulera commented Jan 17, 2023

We decided that in most cases these obsoletions come either from merges or from removed features that never had associated annotations.

We decided that having dedicated pages for each of them is not worth it, but that they should be findable in the quick search, which should redirect to a page listing the merges or deletions.
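A minimal sketch of the agreed quick-search behaviour, assuming the obsolete IDs are kept outside Chado in a separate TSV file (one of the options raised earlier in this thread). The TSV columns (`obsolete_id`, `status`, `replaced_by`), the `merged`/`deleted` status values, the page paths, and both function names are hypothetical and are not taken from the PomBase code base.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional

# Hypothetical paths for the merges and deletions list pages.
MERGES_PAGE = "/status/merged-genes"
DELETIONS_PAGE = "/status/deleted-genes"


@dataclass
class ObsoleteGene:
    obsolete_id: str
    status: str                 # assumed values: "merged" or "deleted"
    replaced_by: Optional[str]  # current gene for merges, None otherwise


def load_obsolete_genes(tsv_text: str) -> dict[str, ObsoleteGene]:
    """Index a hypothetical obsolete-gene TSV by upper-cased systematic ID."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    index: dict[str, ObsoleteGene] = {}
    for row in reader:
        gene = ObsoleteGene(
            obsolete_id=row["obsolete_id"],
            status=row["status"],
            # An empty column means there is no replacement gene.
            replaced_by=row["replaced_by"] or None,
        )
        index[gene.obsolete_id.upper()] = gene
    return index


def quick_search_redirect(query: str, index: dict[str, ObsoleteGene]) -> Optional[str]:
    """Return the list page an obsolete ID should redirect to, or None."""
    gene = index.get(query.strip().upper())
    if gene is None:
        return None  # not obsolete: fall through to the normal quick search
    # Anything not marked "merged" is treated as a deletion.
    page = MERGES_PAGE if gene.status == "merged" else DELETIONS_PAGE
    # The fragment lets the list page jump to the matching row (assumed).
    return f"{page}#{gene.obsolete_id}"
```

Keying the index on upper-cased IDs keeps the lookup case-insensitive; in practice this check would sit in front of the existing quick search rather than replace it.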

@kimrutherford kimrutherford changed the title Maybe load obsolete genes into Chado to allow for "obsolete gene" pages and quick searching Change the quick search to find obsolete genes on the merges and deletions page Feb 14, 2023
@kimrutherford
Member Author

I've changed the title to match our plan.
