Change the quick search to find obsolete genes on the merges and deletions page #1054

kimrutherford · 2023-01-12T05:32:37Z

We should think about loading the deleted genes with the is_obsolete flag set to true. A lot of downstream code would then need to check the flag so that obsolete genes aren't included in the query builder results or in counts/stats.
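As a rough illustration of the kind of check downstream code would need, here is a minimal sketch. The `Gene` class and the `current_genes` helper are hypothetical and are not taken from the PomBase code; only the `is_obsolete` flag comes from the discussion above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical stand-in for a gene record loaded from Chado."""
    uniquename: str
    is_obsolete: bool = False


def current_genes(genes):
    """Return only the genes whose is_obsolete flag is not set."""
    return [g for g in genes if not g.is_obsolete]


# Illustrative records only; identifiers are the ones discussed in this issue.
genes = [
    Gene("SPAC6B12.18"),
    Gene("SPNCRNA.55", is_obsolete=True),
    Gene("SPAC11D3.11c"),
]

# Counts/stats and query results would be built from the filtered list.
gene_count = len(current_genes(genes))
```

Every place that currently iterates over all genes would need an equivalent filter, which is the cost being weighed here.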

We're considering this so that we can eventually include the deleted/obsolete genes in the quick search and provide basic "obsolete gene" pages with any historical information that might be useful.

Loading these genes into Chado might not be the best solution. It might be better for the web site code to read a separate (TSV?) file with this information and only load the current genes into Chado, as we do at the moment. This needs a bit of discussion and thought.
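To make the TSV alternative concrete, here is a minimal sketch of reading such a file. The column names (`systematic_id`, `replaced_by`) and the `read_obsolete_genes` function are assumptions for illustration; no file format has been agreed in this thread.

```python
import csv
import io

# Hypothetical layout: this thread does not define the file or its columns.
EXAMPLE_TSV = (
    "systematic_id\treplaced_by\n"
    "SPNCRNA.55\tSPAC6B12.18\n"
    "SPAC11D3.12c\tSPAC11D3.11c\n"
)


def read_obsolete_genes(handle):
    """Map each obsolete systematic ID to its row of the TSV file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["systematic_id"]: row for row in reader}


# An in-memory file is used here; a real file handle would work the same way.
obsolete = read_obsolete_genes(io.StringIO(EXAMPLE_TSV))
```

With this approach Chado would keep only current genes, and nothing downstream would need an is_obsolete check.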

kimrutherford · 2023-01-16T01:30:32Z

A good first step would be to discuss what should be shown on the pages for the obsolete genes. Then it will be easier to decide if we should use Chado for this.

For comparison, FlyBase do store obsolete genes in Chado.

ValWood · 2023-01-16T08:55:37Z

name, location, reason for obsoletion
anything else?
@manulera

manulera · 2023-01-16T10:13:58Z

I wonder whether, in many cases, those identifiers should simply be treated as synonyms; two examples are below. In what case would an obsoletion need to be treated differently from a synonym?

SPNCRNA.55

This is how SPAC6B12.18 looked before SPNCRNA.55 was added as an obsolete_name:

FT   CDS             2416385..2416606
[...]
FT                   /systematic_id="SPAC6B12.18"
[...]
FT                   /synonym="prl55"
FT                   /synonym="SPNCRNA.55"

When SPNCRNA.55 was originally added (rev 20040915), it was a non-coding RNA:

FT   misc_RNA        2417135..2417606
FT                   /gene="prl55"
FT                   /psu_db_xref="EMBL:AB084867;"
FT                   /note="PMID: 12597277"
FT                   /systematic_id="SPNCRNA.55"
FT                   /primary_name="prl55"
FT                   /product="non-coding RNA"

Then SPAC6B12.18 was added on rev 20060219 as a CDS, and both co-existed until added as synonyms:

FT   misc_RNA        2416235..2416706
FT                   /gene="prl55"
FT                   /db_xref="EMBL:AB084867"
FT                   /db_xref="PMID:12597277"
FT                   /systematic_id="SPNCRNA.55"
FT                   /primary_name="prl55"
FT                   /product="non-coding RNA (predicted)"
FT                   /controlled_curation="term=non-coding RNA;
FT                   qualifier=predicted; db_xref=PMID:12597277; date=20050412"
FT                   /controlled_curation="term=poly(A)-bearing RNA;
FT                   qualifier=predicted; db_xref=PMID:12597277; date=20050412"
FT                   /controlled_curation="term=no detectable long open reading
FT                   frame; qualifier=predicted; db_xref=PMID:12597277;
FT                   date=20050412"
FT   CDS             2416385..2416606
FT                   /product="dubious"
FT                   /gene="SPAC6B12.18"
FT                   /controlled_curation="term=longest ORF in prl55 (73AA),
FT                   possibly be protein coding; date=20060721"
FT                   /colour=6
FT                   /fasta_file="fasta/chromosome1.contig.seq.00996.out"

SPAC11D3.12c

In this one we don't even have two separate features in our record. Currently it is an obsolete_name of SPAC11D3.11c. In our first revision (SPAC11D3.11c), it was used like this:

FT   CDS             complement(join(137589..138452,138452..139483,139527..139551,139596..139609))
FT                   /colour=4
FT                   /gene="SPAC11D3.11c"
FT                   /gene="SPAC11D3.12c"
FT                   /label=SPAC11D3.11c
FT                   /note="SPAC11D3.11c, len:644,
FT                   SIMILARITY:Schizosaccharomyces pombe, O74915, putative
FT                   transcriptional activator, zinc finger containing., (684
FT                   aa), fasta scores: opt: 1914, E():0, (46.9% identity in
FT                   616 aa)"
FT                   /product="putative transcriptional activator, zinc finger
FT                   containing."
FT                   /pseudo
FT                   /fasta_file="fasta/c11D3.tab.seq.00054.out"

ValWood · 2023-01-16T15:29:24Z

Historically, if a CDS was replaced by an ncRNA or vice versa, we kept the original names as synonyms. This was mainly because we would 'lose' them otherwise. If we have a method to obsolete IDs and allow people to trace their history, we can review this practice (i.e. SPNCRNA.55 would not be a synonym of SPAC6B12.18; it would just become an obsoleted feature that was 'replaced by' SPAC6B12.18). We can discuss this tomorrow...

ValWood · 2023-01-17T11:14:57Z

@manulera update with today's decisions

manulera · 2023-01-17T15:41:06Z

We decided that, in most cases, these obsoletions come either from merges or from removed features that never had associated annotations.

We decided that having dedicated pages for each of them is not worth it, but that they should be findable in the quick search and redirect to a page with the list of merges or deletions.
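The agreed behaviour could look roughly like the sketch below. The URL paths, the two ID sets, and the `quick_search_target` function are all hypothetical and only illustrate the lookup order; they are not the actual web site routes or code.

```python
# Illustrative data only; in practice these would come from the loaded genes
# and from whatever source lists the merged/deleted identifiers.
CURRENT_GENES = {"SPAC6B12.18", "SPAC11D3.11c"}
OBSOLETE_IDS = {"SPNCRNA.55", "SPAC11D3.12c"}

# Hypothetical route for the page listing merges and deletions.
MERGES_AND_DELETIONS_PATH = "/merges-and-deletions"


def quick_search_target(query):
    """Return the path a quick search should lead to, or None if no match."""
    term = query.strip()
    if term in CURRENT_GENES:
        # Hypothetical gene page route.
        return f"/gene/{term}"
    if term in OBSOLETE_IDS:
        # Obsolete IDs get no page of their own; send the user to the list.
        return MERGES_AND_DELETIONS_PATH
    return None
```

For example, `quick_search_target("SPNCRNA.55")` would return the merges and deletions path rather than a gene page.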

kimrutherford · 2023-02-14T08:02:05Z

I've changed the title to match our plan.

kimrutherford added the discuss label Jan 12, 2023

kimrutherford self-assigned this Jan 12, 2023

kimrutherford added a commit to pombase/pombase-chado-json that referenced this issue Jan 17, 2023

Query is_obsolete from Chado

f474203

Not used for anything yet Refs pombase/pombase-chado#1054

kimrutherford changed the title ~~Maybe load obsolete genes into Chado to allow for "obsolete gene" pages and quick searching~~ Change the quick search to find obsolete genes on the merges and deletions page Feb 14, 2023

ValWood added standardization QC medium priority labels Nov 29, 2023
