create function to check that the taxa listed by tnrs_match_names are in the TOL #31
Comments
Oh, seems I stumbled on this independently (OpenTreeOfLife/opentree#777).
I had put this for fossils originally, but the case you described in the issue is probably even more widespread/useful to document. Maybe we ought to add this to the "how to use rotl?" FAQ vignette for now.
Sounds like a good idea -- and happily I can use the work I'm doing now, including the workaround, as the example.
Do you have a test case for the fossil taxa, @fmichonneau? I think the new TNRS "flag" column might deal with this?
Unfortunately, it doesn't seem like it. Taking the example you had initially reported to OTL:

> tol_induced_subtree(unlist(ott_id(tnrs_match_names(c("Anas", "Gallus", "Anolis", "Geospiza")))))
Error: HTTP failure: 400
The following OTT ids were not found: [765185, 5295932].

but there is no indication in the taxonomy that these nodes might be missing from the tree (nothing in flags indicates it might be missing):

> taxonomy_taxon_info(765185)
$`765185`
$`765185`$is_suppressed
[1] FALSE
$`765185`$tax_sources
$`765185`$tax_sources[[1]]
[1] "ncbi:8835"
$`765185`$tax_sources[[2]]
[1] "worms:148788"
$`765185`$tax_sources[[3]]
[1] "gbif:2498056"
$`765185`$tax_sources[[4]]
[1] "irmng:1105530"
$`765185`$unique_name
[1] "Anas"
$`765185`$synonyms
$`765185`$synonyms[[1]]
[1] "Anus"
$`765185`$synonyms[[2]]
[1] "Anassus"
$`765185`$synonyms[[3]]
[1] "Spatula"
$`765185`$synonyms[[4]]
[1] "Aras"
$`765185`$name
[1] "Anas"
$`765185`$flags
list()
$`765185`$ott_id
[1] 765185
$`765185`$rank
[1] "genus"
attr(,"class")
[1] "taxon_info"

and there is no very useful information from the tol endpoint either:

> tol_node_info(765185)
Error: HTTP failure: 400
Could not find any synthetic tree nodes corresponding to the OTT id provided (765185).
If those taxa are not in the tree, it is because they are not monophyletic in the tree. OT used to return something like "invalid_ids" or "valid_but_not_in_tree" (not those names exactly, but you get the point), but not anymore (because the tree server no longer contains the entire taxonomy, and so cannot distinguish invalid ids from valid-but-not-monophyletic ids).
Would it be worth hacking something on our side then? When OTT ids are not in the tree, we could check whether they are in the taxonomy and give a more informative error message.
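A minimal sketch of the check suggested here, assuming that rotl's taxonomy_taxon_info() raises an error for an id that is absent from the taxonomy. The helper name classify_missing_ids() and its lookup argument are hypothetical; lookup exists only so the logic can be tried without network access.

```r
# Hypothetical helper: for OTT ids that the tree service rejected, check
# whether each one at least exists in the taxonomy.
# `lookup` defaults to rotl::taxonomy_taxon_info and is ASSUMED to raise an
# error when the id is unknown to the taxonomy.
classify_missing_ids <- function(ids, lookup = rotl::taxonomy_taxon_info) {
  in_taxonomy <- vapply(ids, function(id) {
    tryCatch({
      lookup(id)
      TRUE
    }, error = function(e) FALSE)
  }, logical(1))
  data.frame(
    ott_id = ids,
    status = ifelse(in_taxonomy,
                    "in taxonomy, not in synthetic tree",
                    "not found in taxonomy"),
    stringsAsFactors = FALSE
  )
}
```

Note that any failure of the lookup, including a network problem, is reported as "not found in taxonomy", so the result is only a hint.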
Sounds good for now. I imagine OT will fix this, but no time soon.
Hmm... looking at this a little more, I think it would be too hackish for us to do. Let's leave it as it is, and point to the relevant section of the vignette if needed.
Hi, I came across this error message after passing a list of 189 plant families to the tol_induced_subtree() call at the end of the following snippet:

families <- c("Asteraceae", "Poaceae", "Rosaceae", "Fabaceae",
"Salicaceae", "Lamiaceae", "Betulaceae", "Apiaceae",
"Brassicaceae", "Fagaceae", "Cyperaceae", "Pinaceae",
"Ranunculaceae", "Ericaceae", "Caprifoliaceae", "Plantaginaceae",
"Caryophyllaceae", "Polygonaceae", "Boraginaceae", "Rubiaceae",
"Sapindaceae", "Malvaceae", "Scrophulariaceae", "Cactaceae",
"Amaranthaceae", "Oleaceae", "Euphorbiaceae", "Ulmaceae",
"Cupressaceae", "Juncaceae", "Campanulaceae", "Urticaceae",
"Geraniaceae", "Solanaceae", "Grossulariaceae", "Adoxaceae",
"Onagraceae", "Hypericaceae", "Orobanchaceae", "Rhamnaceae",
"Primulaceae", "Crassulaceae", "Cornaceae", "Cistaceae",
"Vitaceae", "Asparagaceae", "Violaceae", "Iridaceae",
"Papaveraceae", "Equisetaceae", "Gentianaceae", "Typhaceae",
"Amaryllidaceae", "Bromeliaceae", "Anacardiaceae", "Dennstaedtiaceae",
"Dryopteridaceae", "Lythraceae", "Elaeagnaceae", "Apocynaceae",
"Convolvulaceae", "Berberidaceae", "Celastraceae", "Orchidaceae",
"Resedaceae", "Cucurbitaceae", "Araliaceae", "Balsaminaceae",
"Cannabaceae", "Rutaceae", "Araceae", "Araucariaceae",
"Santalaceae", "Linaceae", "Platanaceae", "Saxifragaceae",
"Juglandaceae", "Liliaceae", "Haloragaceae", "Tamaricaceae",
"Athyriaceae", "Moraceae", "Taxaceae", "Arecaceae", "Aspleniaceae",
"Lauraceae", "Melanthiaceae", "Plumbaginaceae", "Tropaeolaceae",
"Alismataceae", "Buxaceae", "Hydrocharitaceae", "Zamiaceae",
"Menyanthaceae", "Aquifoliaceae", "Hydrangeaceae", "Myricaceae",
"Polypodiaceae", "Polytrichaceae", "Juncaginaceae", "Nymphaeaceae",
"Polemoniaceae", "Potamogetonaceae", "Sphagnaceae", "Tectariaceae",
"Verbenaceae", "Aizoaceae", "Cystopteridaceae", "Theaceae",
"Asphodelaceae", "Ephedraceae", "Myrtaceae", "Onocleaceae",
"Pteridaceae", "Thymelaeaceae", "Brachytheciaceae", "Capparaceae",
"Ceratophyllaceae", "Cleomaceae", "Cycadaceae", "Oxalidaceae",
"Acanthaceae", "Amblystegiaceae", "Hylocomiaceae", "Loranthaceae",
"Mniaceae", "Zygophyllaceae", "Bignoniaceae", "Blechnaceae",
"Butomaceae", "Dicranaceae", "Magnoliaceae", "Paeoniaceae",
"Piperaceae", "Polygalaceae", "Portulacaceae", "Strelitziaceae",
"Acoraceae", "Basellaceae", "Bryaceae", "Burseraceae",
"Commelinaceae", "Droseraceae", "Ebenaceae", "Lentibulariaceae",
"Musaceae", "Nephrolepidaceae", "Passifloraceae", "Plagiotheciaceae",
"Pontederiaceae", "Pottiaceae", "Ricciaceae", "Salviniaceae",
"Staphyleaceae", "Thelypteridaceae", "Zingiberaceae",
"Altingiaceae", "Anemiaceae", "Annonaceae", "Aristolochiaceae",
"Begoniaceae", "Cannaceae", "Climaciaceae", "Colchicaceae",
"Ditrichaceae", "Elatinaceae", "Gleicheniaceae", "Goodeniaceae",
"Grimmiaceae", "Hamamelidaceae", "Hedwigiaceae", "Heliconiaceae",
"Hypnaceae", "Loasaceae", "Malpighiaceae", "Marchantiaceae",
"Martyniaceae", "Nyctaginaceae", "Pedaliaceae", "Phrymaceae",
"Phytolaccaceae", "Pittosporaceae", "Proteaceae", "Ruppiaceae",
"Sapotaceae", "Schisandraceae", "Sciadopityaceae", "Styracaceae",
"Thuidiaceae")
resolved_names <- tnrs_match_names(families, context_name = "Land plants")
head(resolved_names)
#>   search_string unique_name approximate_match ott_id is_synonym flags
#> 1    asteraceae  Asteraceae             FALSE  46248      FALSE
#> 2       poaceae     Poaceae             FALSE 508090      FALSE
#> 3      rosaceae    Rosaceae             FALSE 208036      FALSE
#> 4      fabaceae    Fabaceae             FALSE 560323      FALSE
#> 5    salicaceae  Salicaceae             FALSE 530183      FALSE
#> 6     lamiaceae   Lamiaceae             FALSE 544714      FALSE
#>   number_matches
#> 1              1
#> 2              1
#> 3              1
#> 4              1
#> 5              1
#> 6              1
tr <- tol_induced_subtree(ott_ids = ott_id(resolved_names))
#> Error: HTTP failure: 400
#> The following OTT ids were not found: [147029, 473827, 23373, 17704, 601168, 873718, 614459, 367508, 461417, 79118, 99242, 405426, 427298, 195706, 195710, 548799, 5302233, 734781, 947452, 853767, 195711, 737324, 981715, 734790, 216633, 460575, 13254]. BadIdsExceptionopentree.plugins.BadIdsExceptionlist("opentree.plugins.tree_of_life_v3.doInducedSubtree(tree_of_life_v3.java:516)", "opentree.plugins.tree_of_life_v3.induced_subtree(tree_of_life_v3.java:400)", "java.lang.reflect.Method.invoke(Method.java:498)", "org.neo4j.server.plugins.PluginMethod.invoke(PluginMethod.java:57)", "org.neo4j.server.plugins.PluginManager.invoke(PluginManager.java:168)", "org.neo4j.server.rest.web.ExtensionService.invokeGraphDatabaseExtension(ExtensionService.java:300)", "org.neo4j.server.rest.web.ExtensionService.invokeGraphDatabaseExtension(ExtensionService.java:122)",
#> "java.lang.reflect.Method.invoke(Method.java:498)", "org.neo4j.server.rest.security.SecurityFilter.doFilter(SecurityFilter.java:112)")
Hi @paternogbc. This is an Open Tree issue, not a rotl one. Would you prefer that such taxa be skipped in the query, so that the returned tree contains as many of the query taxa as possible?
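A rough sketch of what skipping such taxa could look like on the user side, assuming the error message keeps the "not found: [...]" format shown above and that ids is a plain numeric vector (for example from unlist(ott_id(...))). Both function names and the query argument are hypothetical and not part of rotl.

```r
# Extract the ids listed after "not found: [...]" in an error message.
parse_missing_ids <- function(msg) {
  bracket <- regmatches(msg, regexpr("not found: \\[[0-9, ]+\\]", msg))
  if (length(bracket) == 0) return(numeric(0))
  as.numeric(regmatches(bracket, gregexpr("[0-9]+", bracket))[[1]])
}

# Hypothetical wrapper: run `query` (by default rotl::tol_induced_subtree)
# and, if it fails with a "not found" message, retry once without those ids.
induced_subtree_skip_missing <- function(ids, query = rotl::tol_induced_subtree) {
  tryCatch(
    query(ott_ids = ids),
    error = function(e) {
      absent <- parse_missing_ids(conditionMessage(e))
      if (length(absent) == 0) stop(e)  # unrelated failure: re-raise it
      warning("Dropping OTT ids not in the synthetic tree: ",
              paste(absent, collapse = ", "), call. = FALSE)
      query(ott_ids = setdiff(ids, absent))
    }
  )
}
```

The retry uses setdiff(), which drops any names attached to ids, so the dropped ids would need to be mapped back to taxon names separately.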
Hi @josephwb, thanks for your reply.
@josephwb correct me if I'm wrong, but I think last time I looked into it, it was not possible to check a priori whether an ott_id was present in the synthetic tree, and so it's not possible to warn the user until it fails. On the rotl side, I guess we could wrap the call with
@fmichonneau Individual ott_ids can be queried using node_info. This could be very slow and tedious, but doable. However, it looks like this may be fixed soon. Probably better to have it fixed at Open Tree than hack something together here.
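For illustration, the per-id check could be sketched as below. It relies on the behaviour shown earlier in this thread, where tol_node_info() fails with an HTTP 400 error for an id with no node in the synthetic tree. The helper name ids_in_tree() and its lookup argument are hypothetical.

```r
# Hypothetical helper: TRUE for each OTT id that has a node in the synthetic
# tree. `lookup` defaults to rotl::tol_node_info; it is a parameter only so
# the logic can be tested offline.
ids_in_tree <- function(ids, lookup = rotl::tol_node_info) {
  vapply(ids, function(id) {
    tryCatch({
      lookup(ott_id = id)
      TRUE
    }, error = function(e) FALSE)
  }, logical(1))
}
```

This makes one request per id, which is the slow and tedious part, and it treats any error (including a network failure) as "not in tree".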