It would be useful to add the NCBI taxid of the last non-"dropped" taxon to the collapsed taxonomy table. There was some attempt to do this already, but it turns out to be tricky. We can't just join by name, since names may be repeated across (super)kingdoms and/or for things like subgenera, and we don't necessarily have the higher levels to use as join criteria (unless we want to go row-by-row, which would be wicked slow).
I suppose we could keep only the levels within the "dumb kids playing catch, etc." hierarchy (domain through species, excluding things like subgenera) and just use domain and/or kingdom as higher-level join criteria in addition to the lowest non-dropped name itself.
This can be done using the nodes.dmp file from the NCBI taxonomy dump. We'll join in nodes.dmp and filter it to only the kingdom:species ranks.
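A rough sketch of that filter-then-join idea, in plain Python. This is not the pipeline's actual code: the function names, the choice to include `superkingdom` in the kept ranks, and the toy rows with invented taxids are all assumptions for illustration. It relies only on the documented `.dmp` layout (fields separated by tab-pipe-tab; rank in the third column of nodes.dmp; name and name class in the second and fourth columns of names.dmp).

```python
# Ranks kept for the join. Including "superkingdom" is an assumption, made so
# that the domain level is also available as a higher-level join key.
CANONICAL_RANKS = {
    "superkingdom", "kingdom", "phylum", "class",
    "order", "family", "genus", "species",
}


def split_dmp(line):
    """Split one .dmp row (fields separated by tab, pipe, tab) into stripped fields."""
    return [field.strip() for field in line.rstrip("\n").split("|")]


def read_ranks(nodes_lines, keep=CANONICAL_RANKS):
    """Map taxid -> rank for nodes.dmp rows whose rank is in `keep`."""
    ranks = {}
    for line in nodes_lines:
        fields = split_dmp(line)
        taxid, rank = int(fields[0]), fields[2]
        if rank in keep:
            ranks[taxid] = rank
    return ranks


def build_lookup(names_lines, ranks):
    """Map (scientific name, rank) -> list of taxids, restricted to kept ranks.

    A list is used because a name can still repeat within one rank
    (for example across kingdoms), so any leftover ambiguity stays visible.
    """
    lookup = {}
    for line in names_lines:
        fields = split_dmp(line)
        taxid, name, name_class = int(fields[0]), fields[1], fields[3]
        if name_class == "scientific name" and taxid in ranks:
            lookup.setdefault((name, ranks[taxid]), []).append(taxid)
    return lookup


# Toy rows with invented taxids (not real NCBI records): a genus and a
# subgenus that share the name "Foo", plus one species.
NODES = [
    "1\t|\t1\t|\tno rank\t|\n",
    "10\t|\t1\t|\tgenus\t|\n",
    "11\t|\t10\t|\tsubgenus\t|\n",
    "12\t|\t11\t|\tspecies\t|\n",
]
NAMES = [
    "10\t|\tFoo\t|\t\t|\tscientific name\t|\n",
    "11\t|\tFoo\t|\tFoo <subgenus>\t|\tscientific name\t|\n",
    "12\t|\tFoo bar\t|\t\t|\tscientific name\t|\n",
    "12\t|\tF. bar\t|\t\t|\tsynonym\t|\n",
]

lookup = build_lookup(NAMES, read_ranks(NODES))
```

With the subgenus row filtered out, `("Foo", "genus")` resolves to a single taxid. Any key that still maps to more than one taxid would need domain and/or kingdom as an extra join criterion.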
This requires the get_lineage process to also unzip nodes.dmp, which inspires me to reuse the get_model code for downloading arbitrary stuff from a URL, followed by a get_zip process to extract specific files from a zip archive in parallel.
(or something like this)
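For the extraction step, something along these lines might work. It is a minimal sketch using Python's standard `zipfile` module. The helper name `extract_members` is hypothetical and is not the actual get_zip process, and it handles one archive at a time, leaving any parallelism to whatever runs it.

```python
import zipfile
from pathlib import Path


def extract_members(archive, members, dest):
    """Extract only the named members from a zip archive into `dest`.

    Returns the extracted paths; raises if a requested member is absent.
    """
    dest = Path(dest)
    with zipfile.ZipFile(archive) as zf:
        available = set(zf.namelist())
        missing = [m for m in members if m not in available]
        if missing:
            raise FileNotFoundError(f"not in archive: {missing}")
        # ZipFile.extract returns the path of the file it created.
        return [Path(zf.extract(m, path=dest)) for m in members]
```

Calling `extract_members(archive_path, ["nodes.dmp"], out_dir)` would pull just that one file and leave the rest of the archive untouched.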