tax_glom add sample names #75

Open
g-antonello opened this issue Jun 16, 2022 · 2 comments

g-antonello commented Jun 16, 2022

Dear Mike,
Thank you for this cool add-on; it makes things so much faster. I have two suggestions:
(1) In the tax_glom function, you might add an option, which I would suggest as the default, that renames the aggregated taxa to their name at the aggregation rank instead of keeping the OTU/ASV identifier. Otherwise, I always have to do something like:

phy_genus <- tax_glom(physeq, "Genus")
taxa_names(phy_genus) <- tax_table(phy_genus)[,"Genus"]

I think it would simply mean adding this line to your function.

(2) I guess you thought about this already, but I was wondering if you could have a chat with joey711 to replace their functions with yours and improve the speed of phyloseq, because I think that most people use that package and don't know about this alternative.

mikemc (Owner) commented Jun 17, 2022

Hi @g-antonello, thanks for your comment. There is an important complication: the renaming cannot always be done, because in certain taxonomies, such as NCBI and Greengenes, one name is repeated for multiple taxa; e.g. there are multiple 'Clostridium' genera in these taxonomies. I would need to think about how to handle this; I could imagine an option that tells tax_glom to try to rename to the genus if possible and, if not, to create a unique identifier, e.g. "Family>Genus".

Realistically, I don't expect to address this myself soon, as I don't have time for active speedyseq development right now. I also think the following is easy enough and has the benefit of making the renaming explicit:

phy_genus <- physeq %>%
  tax_glom("Genus") %>%
  mutate_tax_table(.otu = Genus)

I would consider a pull request (with a good explanation and tests for the complicating factor noted above).
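For anyone considering such a pull request, the fallback described above (use the genus name when it is unique, otherwise build a "Family>Genus" identifier) can be sketched in base R. This is only an illustration under stated assumptions: `make_unique_rank_names` is a hypothetical helper, not part of speedyseq or phyloseq, it works on a plain character matrix rather than a phyloseq object, and the toy family assignments are illustrative.

```r
# Hypothetical helper (not part of speedyseq or phyloseq): build unique taxon
# names from a taxonomy matrix, preferring the bare name at `rank`.
make_unique_rank_names <- function(tax, rank = "Genus", parent = "Family",
                                   sep = ">") {
  tax <- as.matrix(tax)
  # Assumption: both ranks are fully assigned; missing values are not handled.
  if (anyNA(tax[, c(parent, rank)])) {
    stop("Missing values in the '", parent, "' or '", rank, "' column.")
  }
  nms <- unname(tax[, rank])
  # Names that occur on more than one row are ambiguous.
  ambiguous <- nms %in% nms[duplicated(nms)]
  # Qualify only the ambiguous names with their parent rank.
  nms[ambiguous] <- paste(tax[ambiguous, parent], nms[ambiguous], sep = sep)
  # Assumption: parent + rank is enough to disambiguate; fail loudly otherwise.
  if (anyDuplicated(nms) > 0) {
    stop("Names are still duplicated; a higher rank may be needed.")
  }
  nms
}

# Toy taxonomy with two distinct 'Clostridium' genera (illustrative values).
tax <- rbind(
  otu1 = c(Family = "Clostridiaceae", Genus = "Clostridium"),
  otu2 = c(Family = "Lachnospiraceae", Genus = "Clostridium"),
  otu3 = c(Family = "Bacteroidaceae", Genus = "Bacteroides")
)
new_names <- make_unique_rank_names(tax)
print(new_names)
```

Only the two ambiguous rows get the family prefix; the unambiguous genus keeps its bare name. Applying this to a real object would still require extracting the taxonomy table as a matrix and assigning the result back as taxa names, and that step is not shown here.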

> (2) I guess you thought about this already, but I was wondering if you could have a chat with joey711 to replace their functions with yours and improve the speed of phyloseq, because I think that most people use that package and don't know about this alternative.

I agree it would be good to incorporate the speed improvements directly into phyloseq so that everyone would benefit by default, but I posted about these some time ago without a response. My impression is that phyloseq is not under active maintenance and development anymore, so I don't expect this to happen anytime soon.

g-antonello (Author) commented

Dear Mike,
Thank you for your suggestions, I will think about suggesting a tweak in a more systematic way.
Giacomo
