[CZID-9457] - update PHAGE_FAMILIES_NAMES #352

Merged on May 6, 2024 (2 commits)

phoenixAja (Contributor) commented on May 3, 2024

We're missing a fair number of phage families, which resulted in entries in taxon_lineages not being categorized as phage (see this ticket).

Karyna and I created a more comprehensive list from the supplementary materials in this paper (2022 taxonomy update of the ICTV bacterial viruses subcommittee), along with the entries in the Wikipedia article for bacteriophage.

To fix this in the most recent taxon lineages table, I'm planning on running the following update queries in MySQL and ES so we don't have to rebuild the taxon_lineages tables to capture these updates:

UPDATE taxon_lineages
SET is_phage = 1
WHERE family_name IN (
    'Ackermannviridae',
    'Aggregaviridae',
    'Ahpuchviridae',
    'Aliceevansviridae',
    'Ampullaviridae',
    'Anaerodiviridae',
    'Andrewesvirinae',
    'Aoguangviridae',
    'Arenbergviridae',
    'Armatusviridae',
    'Arquatrovirinae',
    'Assiduviridae',
    'Atkinsviridae',
    'Autographiviridae',
    'Autolykiviridae',
    'Azeredovirinae',
    'Bclasvirinae',
    'Beephvirinae',
    'Bicaudaviridae',
    'Blumeviridae',
    'Boydwoodruffvirinae',
    'Bronfenbrennervirinae',
    'Casjensviridae',
    'Ceeclamvirinae',
    'Chaseviridae',
    'Chebruvirinae',
    'Chimalliviridae',
    'Clavaviridae',
    'Clermontviridae',
    'Corticoviridae',
    'Crevaviridae',
    'Cystoviridae',
    'Dclasvirinae',
    'Deejayvirinae',
    'Demerecviridae',
    'Dolichocephalovirinae',
    'Drexlerviridae',
    'Druskaviridae',
    'Duinviridae',
    'Duneviridae',
    'Eekayvirinae',
    'Ekchuahviridae',
    'Eucampyvirinae',
    'Fervensviridae',
    'Fiersviridae',
    'Finnlakeviridae',
    'Forsetiviridae',
    'Fredfastierviridae',
    'Fuselloviridae',
    'Gclasvirinae',
    'Globuloviridae',
    'Gochnauervirinae',
    'Gorgonvirinae',
    'Graaviviridae',
    'Gracegardnervirinae',
    'Grimontviridae',
    'Guelinviridae',
    'Guenliviridae',
    'Guernseyvirinae',
    'Gutmannvirinae',
    'Guttaviridae',
    'Hafunaviridae',
    'Haloferuviridae',
    'Halomagnusviridae',
    'Halspiviridae',
    'Helgolandviridae',
    'Hendrixvirinae',
    'Herelleviridae',
    'Inoviridae',
    'Intestiviridae',
    'Kairosviridae',
    'Kantovirinae',
    'Kleczkowskaviridae',
    'Konodaiviridae',
    'Kyanoviridae',
    'Langleyhallvirinae',
    'Leisingerviridae',
    'Leviviridae',
    'Lipothrixviridae',
    'Lutetiaviridae',
    'Madisaviridae',
    'Madridviridae',
    'Matshushitaviridae',
    'Matsushitaviridae',
    'Mccleskeyvirinae',
    'Mesyanzhinovviridae',
    'Microviridae',
    'Molycolviridae',
    'Myoviridae',
    'Naomviridae',
    'Nclasvirinae',
    'Nymbaxtervirinae',
    'Orlajensenviridae',
    'Ounavirinae',
    'Pachyviridae',
    'Paulinoviridae',
    'Pclasvirinae',
    'Peduoviridae',
    'Pervagoviridae',
    'Pigerviridae',
    'Plasmaviridae',
    'Plectroviridae',
    'Pleolipoviridae',
    'Podoviridae',
    'Pootjesviridae',
    'Portogloboviridae',
    'Pungoviridae',
    'Pyrstoviridae',
    'Queuovirinae',
    'Rountreeviridae',
    'Rudiviridae',
    'Ruthgordonvirinae',
    'Saffermanviridae',
    'Salasmaviridae',
    'Saparoviridae',
    'Schitoviridae',
    'Sepvirinae',
    'Shortaselviridae',
    'Simuloviridae',
    'Siphoviridae',
    'Skryabinvirinae',
    'Soleiviridae',
    'Solspiviridae',
    'Speroviridae',
    'Sphaerolipoviridae',
    'Spiraviridae',
    'Stanwilliamsviridae',
    'Steigviridae',
    'Steitzviridae',
    'Stephanstirmvirinae',
    'Straboviridae',
    'Suolaviridae',
    'Suoliviridae',
    'Tectiviridae',
    'Thaspiviridae',
    'Toyamaviridae',
    'Trabyvirinae',
    'Tristromaviridae',
    'Turriviridae',
    'Tybeckvirinae',
    'Umezonoviridae',
    'Ungulaviridae',
    'Vequintavirinae',
    'Verdandiviridae',
    'Vertoviridae',
    'Vilmaviridae',
    'Weiservirinae',
    'Winoviridae',
    'Yangangviridae',
    'Yanlukaviridae',
    'Zierdtviridae',
    'Zobellviridae'
);

POST /taxon_lineages_alias/_update_by_query
{
  "script": {
    "source": "ctx._source.is_phage = true",
    "lang": "painless"
  },
  "query": {
    "terms": {
      "family_name.keyword": [
        "Ackermannviridae",
        "Aggregaviridae",
        "Ahpuchviridae",
        "Aliceevansviridae",
        "Ampullaviridae",
        "Anaerodiviridae",
        "Andrewesvirinae",
        "Aoguangviridae",
        "Arenbergviridae",
        "Armatusviridae",
        "Arquatrovirinae",
        "Assiduviridae",
        "Atkinsviridae",
        "Autographiviridae",
        "Autolykiviridae",
        "Azeredovirinae",
        "Bclasvirinae",
        "Beephvirinae",
        "Bicaudaviridae",
        "Blumeviridae",
        "Boydwoodruffvirinae",
        "Bronfenbrennervirinae",
        "Casjensviridae",
        "Ceeclamvirinae",
        "Chaseviridae",
        "Chebruvirinae",
        "Chimalliviridae",
        "Clavaviridae",
        "Clermontviridae",
        "Corticoviridae",
        "Crevaviridae",
        "Cystoviridae",
        "Dclasvirinae",
        "Deejayvirinae",
        "Demerecviridae",
        "Dolichocephalovirinae",
        "Drexlerviridae",
        "Druskaviridae",
        "Duinviridae",
        "Duneviridae",
        "Eekayvirinae",
        "Ekchuahviridae",
        "Eucampyvirinae",
        "Fervensviridae",
        "Fiersviridae",
        "Finnlakeviridae",
        "Forsetiviridae",
        "Fredfastierviridae",
        "Fuselloviridae",
        "Gclasvirinae",
        "Globuloviridae",
        "Gochnauervirinae",
        "Gorgonvirinae",
        "Graaviviridae",
        "Gracegardnervirinae",
        "Grimontviridae",
        "Guelinviridae",
        "Guenliviridae",
        "Guernseyvirinae",
        "Gutmannvirinae",
        "Guttaviridae",
        "Hafunaviridae",
        "Haloferuviridae",
        "Halomagnusviridae",
        "Halspiviridae",
        "Helgolandviridae",
        "Hendrixvirinae",
        "Herelleviridae",
        "Inoviridae",
        "Intestiviridae",
        "Kairosviridae",
        "Kantovirinae",
        "Kleczkowskaviridae",
        "Konodaiviridae",
        "Kyanoviridae",
        "Langleyhallvirinae",
        "Leisingerviridae",
        "Leviviridae",
        "Lipothrixviridae",
        "Lutetiaviridae",
        "Madisaviridae",
        "Madridviridae",
        "Matshushitaviridae",
        "Matsushitaviridae",
        "Mccleskeyvirinae",
        "Mesyanzhinovviridae",
        "Microviridae",
        "Molycolviridae",
        "Myoviridae",
        "Naomviridae",
        "Nclasvirinae",
        "Nymbaxtervirinae",
        "Orlajensenviridae",
        "Ounavirinae",
        "Pachyviridae",
        "Paulinoviridae",
        "Pclasvirinae",
        "Peduoviridae",
        "Pervagoviridae",
        "Pigerviridae",
        "Plasmaviridae",
        "Plectroviridae",
        "Pleolipoviridae",
        "Podoviridae",
        "Pootjesviridae",
        "Portogloboviridae",
        "Pungoviridae",
        "Pyrstoviridae",
        "Queuovirinae",
        "Rountreeviridae",
        "Rudiviridae",
        "Ruthgordonvirinae",
        "Saffermanviridae",
        "Salasmaviridae",
        "Saparoviridae",
        "Schitoviridae",
        "Sepvirinae",
        "Shortaselviridae",
        "Simuloviridae",
        "Siphoviridae",
        "Skryabinvirinae",
        "Soleiviridae",
        "Solspiviridae",
        "Speroviridae",
        "Sphaerolipoviridae",
        "Spiraviridae",
        "Stanwilliamsviridae",
        "Steigviridae",
        "Steitzviridae",
        "Stephanstirmvirinae",
        "Straboviridae",
        "Suolaviridae",
        "Suoliviridae",
        "Tectiviridae",
        "Thaspiviridae",
        "Toyamaviridae",
        "Trabyvirinae",
        "Tristromaviridae",
        "Turriviridae",
        "Tybeckvirinae",
        "Umezonoviridae",
        "Ungulaviridae",
        "Vequintavirinae",
        "Verdandiviridae",
        "Vertoviridae",
        "Vilmaviridae",
        "Weiservirinae",
        "Winoviridae",
        "Yangangviridae",
        "Yanlukaviridae",
        "Zierdtviridae",
        "Zobellviridae"
      ]
    }
  }
}
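
Since the same family list has to go into both the MySQL statement and the Elasticsearch request, one way to keep the two in sync is to generate both from a single list. The following is only a sketch under assumptions, not the approach used in this PR: the function names `build_mysql_update` and `build_es_update_body` are hypothetical, the list is shortened to three entries for illustration, and the `%s` placeholders assume a Python DB-API MySQL driver that uses that parameter style.

```python
import json

# Shortened, illustrative subset; the real list is the full set of names above.
PHAGE_FAMILIES_NAMES = ["Ackermannviridae", "Microviridae", "Zobellviridae"]


def build_mysql_update(names):
    """Return (sql, params) for a parameterized UPDATE on taxon_lineages.

    Hypothetical helper: assumes a DB-API driver with '%s' placeholders.
    """
    placeholders = ", ".join(["%s"] * len(names))
    sql = (
        "UPDATE taxon_lineages SET is_phage = 1 "
        f"WHERE family_name IN ({placeholders})"
    )
    return sql, list(names)


def build_es_update_body(names):
    """Return the request body for POST /taxon_lineages_alias/_update_by_query.

    Hypothetical helper: mirrors the script and terms query shown above.
    """
    return {
        "script": {"source": "ctx._source.is_phage = true", "lang": "painless"},
        "query": {"terms": {"family_name.keyword": list(names)}},
    }


if __name__ == "__main__":
    # De-duplicate and sort so both queries see exactly the same names.
    names = sorted(set(PHAGE_FAMILIES_NAMES))
    sql, params = build_mysql_update(names)
    print(sql)
    print(params)
    # json.dumps guarantees valid JSON (double quotes, no trailing comma).
    print(json.dumps(build_es_update_body(names), indent=2))
```

Serializing the request body with `json.dumps` avoids hand-editing quotes and commas in a list this long, and passing the names as parameters avoids building the SQL `IN` list by string concatenation.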


phoenixAja requested a review from a team on May 6, 2024, 16:25
phoenixAja changed the title from "update PHAGE_FAMILIES_NAMES" to "[CZID-9457] - update PHAGE_FAMILIES_NAMES" on May 6, 2024
@ninabernick ninabernick merged commit 4ccee1a into main May 6, 2024
14 checks passed
@ninabernick ninabernick deleted the phoenix/update-phage-families branch May 6, 2024 16:33
ninabernick (Contributor) commented:

It stinks that the flow for updating phage categorization and the pathogen list has to involve so much manual curation, but I don't really see any alternatives. Something to ponder for the partner improvements, although I'm not sure we can make progress on it in a short timeframe.
