Add taxid of last non-"dropped" taxon to collapsed taxonomy #83

Open
mhoban opened this issue Aug 15, 2024 · 3 comments

Comments

@mhoban
Owner

mhoban commented Aug 15, 2024

It would be useful to have the NCBI taxid of the last non-"dropped" taxon added to the collapsed taxonomy table. There was some attempt to do this already, but it turns out to be tricky. We can't just join by name, since names may be repeated across (super)kingdoms and/or for things like subgenera and we don't necessarily have the higher levels to use as join criteria (unless we want to go row-by-row, which would be wicked slow).

@mhoban
Owner Author

mhoban commented Aug 15, 2024

I suppose we could keep only the ranks in the main domain, kingdom, phylum, class, order, family, genus, species hierarchy (the "dumb kids playing catch" mnemonic), excluding things like subgenera, and then use domain and/or kingdom as higher-level join criteria in addition to the lowest non-dropped name itself.
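A minimal sketch of that idea, assuming a reference table that already carries each taxon's kingdom: filter to the main ranks, then join on `(kingdom, name)` instead of name alone. The table, the taxids, and the helper names `build_name_index`/`resolve_taxid` are hypothetical. `Morus` is used because it is a real homonym (a plant genus and a bird genus), so a name-only join would return two hits.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Main ranks only; subgenus, tribe, etc. are filtered out.
# "superkingdom" is included because NCBI has used it for the top level.
MAIN_RANKS = {"domain", "superkingdom", "kingdom", "phylum", "class",
              "order", "family", "genus", "species"}

# Hypothetical reference rows: (taxid, name, rank, kingdom). Taxids are made up.
REFERENCE = [
    (101, "Morus", "genus", "Viridiplantae"),  # mulberries
    (202, "Morus", "genus", "Metazoa"),        # gannets
    (303, "Aus", "subgenus", "Metazoa"),       # dropped by the rank filter
]


def build_name_index(rows) -> Dict[Tuple[str, str], List[int]]:
    """Map (kingdom, name) -> list of taxids, keeping only main ranks."""
    index: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for taxid, name, rank, kingdom in rows:
        if rank in MAIN_RANKS:
            index[(kingdom, name)].append(taxid)
    return index


def resolve_taxid(index, kingdom: str, name: str) -> Optional[int]:
    """Return the taxid only when the (kingdom, name) key is unambiguous."""
    hits = index.get((kingdom, name), [])
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    index = build_name_index(REFERENCE)
    print(resolve_taxid(index, "Metazoa", "Morus"))  # 202
    print(resolve_taxid(index, "Metazoa", "Aus"))    # None
```

Returning `None` on multiple hits (rather than picking one) leaves genuinely ambiguous names without a taxid instead of silently assigning the wrong one.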

@mhoban
Owner Author

mhoban commented Aug 15, 2024

See #80 for the offending code section

@mhoban
Owner Author

mhoban commented Aug 15, 2024

This can be done using the nodes.dmp file from the NCBI taxonomy dump. We'll join in nodes.dmp and filter it to only the kingdom-through-species ranks.
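A rough sketch of how nodes.dmp and names.dmp could be combined, written in Python for illustration rather than as the pipeline's actual code. It relies on the dump's row format (fields separated by `\t|\t`, rows ending in `\t|`), on nodes.dmp starting with tax_id, parent tax_id, rank, and on names.dmp columns tax_id, name_txt, unique name, name class. Each kept taxon is keyed by the scientific name of its kingdom-rank ancestor plus its own scientific name. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def parse_dmp(lines: Iterable[str]) -> Iterable[List[str]]:
    """Split NCBI .dmp rows: fields separated by '\t|\t', rows end with '\t|'."""
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\t|"):
            line = line[:-2]
        yield line.split("\t|\t")


def load_nodes(lines) -> Dict[int, Tuple[int, str]]:
    """tax_id -> (parent tax_id, rank), from nodes.dmp."""
    return {int(f[0]): (int(f[1]), f[2]) for f in parse_dmp(lines)}


def load_scientific_names(lines) -> Dict[int, str]:
    """tax_id -> scientific name, from names.dmp (other name classes skipped)."""
    return {int(f[0]): f[1] for f in parse_dmp(lines) if f[3] == "scientific name"}


def ancestor_at_rank(nodes, taxid: int, rank: str) -> Optional[int]:
    """Walk parent links until a node of `rank` is found, or the root is hit."""
    while True:
        parent, node_rank = nodes[taxid]
        if node_rank == rank:
            return taxid
        if parent == taxid:  # the root (taxid 1) is its own parent
            return None
        taxid = parent


def build_index(nodes, names, keep_ranks,
                group_rank: str = "kingdom") -> Dict[Tuple[Optional[str], str], List[int]]:
    """Map (kingdom name, taxon name) -> taxids for taxa at the kept ranks."""
    index: Dict[Tuple[Optional[str], str], List[int]] = {}
    for taxid, (_, rank) in nodes.items():
        if rank not in keep_ranks or taxid not in names:
            continue
        group = ancestor_at_rank(nodes, taxid, group_rank)
        key = (names.get(group) if group is not None else None, names[taxid])
        index.setdefault(key, []).append(taxid)
    return index
```

With real files this would be called as `build_index(load_nodes(open("nodes.dmp")), load_scientific_names(open("names.dmp")), keep_ranks)`. Walking to the root separately for every node is simple but slow on the full dump, so caching each node's kingdom as it is found would be a worthwhile refinement.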

This requires the get_lineage process to also unzip nodes.dmp, which inspires me to reuse the get_model code for downloading arbitrary files from a URL, and then add a get_zip process to extract specific files from a zip archive in parallel (or something like this).
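A minimal Python sketch of the "extract specific files from a zip archive in parallel" step, for illustration only; in the pipeline this would presumably be a get_zip process rather than Python threads. Each worker opens its own `zipfile.ZipFile` handle so no archive object is shared between threads. The function names and the archive name in the usage line are assumptions.

```python
import zipfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, Iterable


def extract_one(archive: str, member: str, dest: str) -> str:
    """Extract a single member; each call opens its own archive handle."""
    with zipfile.ZipFile(archive) as zf:
        return zf.extract(member, path=dest)


def extract_members(archive: str, members: Iterable[str], dest: str,
                    max_workers: int = 4) -> Dict[str, str]:
    """Extract only the named members, in parallel; returns member -> path."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    members = list(members)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        paths = list(pool.map(lambda m: extract_one(archive, m, dest), members))
    return dict(zip(members, paths))
```

Usage might look like `extract_members("taxdmp.zip", ["nodes.dmp", "names.dmp"], "taxonomy")`, assuming the downloaded dump is a zip containing those two files.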

Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant