create function to check that the taxa listed by tnrs_match_names are in the TOL #31

Open
fmichonneau opened this issue Jun 20, 2015 · 14 comments

Comments

@fmichonneau
Member

No description provided.

@dwinter
Member

dwinter commented Oct 1, 2015

Oh, seems I stumbled on this independently (OpenTreeOfLife/opentree#777)

@fmichonneau
Member Author

I had originally opened this for fossils, but the case you described in the issue is probably even more widespread and more useful to document. Maybe we should add this to the "how to use rotl?" FAQ vignette for now.

@dwinter
Member

dwinter commented Oct 1, 2015

Sounds like a good idea -- and happily I can use the work I'm doing now, including the workaround, as the example.

@dwinter
Member

dwinter commented Apr 19, 2016

Do you have a test case for the fossil taxa @fmichonneau ? I think the new TNRS "flag" column might deal with this?

@fmichonneau
Member Author

Unfortunately, it doesn't seem like it. Taking the example you initially reported to OTL:

tol_induced_subtree(unlist(ott_id(tnrs_match_names(c("Anas", "Gallus", "Anolis", "Geospiza")))))
Error: HTTP failure: 400
The following OTT ids were not found: [765185, 5295932]. 

but nothing in the taxonomy, including the flags, indicates that these nodes might be missing from the tree:

> taxonomy_taxon_info(765185)
$`765185`
$`765185`$is_suppressed
[1] FALSE

$`765185`$tax_sources
$`765185`$tax_sources[[1]]
[1] "ncbi:8835"

$`765185`$tax_sources[[2]]
[1] "worms:148788"

$`765185`$tax_sources[[3]]
[1] "gbif:2498056"

$`765185`$tax_sources[[4]]
[1] "irmng:1105530"


$`765185`$unique_name
[1] "Anas"

$`765185`$synonyms
$`765185`$synonyms[[1]]
[1] "Anus"

$`765185`$synonyms[[2]]
[1] "Anassus"

$`765185`$synonyms[[3]]
[1] "Spatula"

$`765185`$synonyms[[4]]
[1] "Aras"


$`765185`$name
[1] "Anas"

$`765185`$flags
list()

$`765185`$ott_id
[1] 765185

$`765185`$rank
[1] "genus"


attr(,"class")
[1] "taxon_info"

and the tol endpoint does not return very useful information either:

> tol_node_info(765185)
Error: HTTP failure: 400
Could not find any synthetic tree nodes corresponding to the OTT id provided (765185).

@josephwb
Contributor

josephwb commented Apr 19, 2016

If those taxa are not in the tree, it is because they are not monophyletic in the tree. OT used to return something like "invalid_ids" or "valid_but_not_in_tree" (not those names exactly, but you get the point), but not anymore (because the tree server no longer contains the entire taxonomy, and so cannot distinguish invalid-ids from valid-but-not-monophyletic ids).

@fmichonneau
Member Author

Would it be worth hacking something on our side then? We could check whether the ott ids are in the taxonomy when they are not in the tree to give a more informative error message
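
A rough sketch of what such a check could look like, assuming `taxonomy_taxon_info()` raises an error for an OTT id that is not in the taxonomy (that behaviour is an assumption, not something confirmed in this thread). `describe_missing_ott_ids()` is a hypothetical helper name, not part of rotl:

```r
# Hypothetical helper (not part of rotl): for OTT ids the tree server
# rejected, report whether each one is at least known to the taxonomy.
# `taxon_info` is injectable so the logic can be tried without network access.
describe_missing_ott_ids <- function(missing_ids,
                                     taxon_info = rotl::taxonomy_taxon_info) {
  vapply(missing_ids, function(id) {
    # ASSUMPTION: an unknown id makes the taxonomy call fail with an error
    res <- tryCatch(taxon_info(id), error = function(e) NULL)
    if (is.null(res)) {
      sprintf("OTT id %d was not found in the taxonomy", as.integer(id))
    } else {
      sprintf("OTT id %d is in the taxonomy but not in the synthetic tree",
              as.integer(id))
    }
  }, character(1))
}
```

Note that this treats any failure of the taxonomy call (including a network problem) as "not found", so the messages are only a hint.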

@josephwb
Contributor

Sounds good for now. I imagine OT will fix this, but no time soon.

@fmichonneau
Member Author

Hmm... looking at this a little more, I think it would be too hackish for us to do. Let's leave as it is, and point to the relevant section of the vignette if needed.

@paternogbc

Hi,
Thanks a lot for developing this very nice R package ;)

I came across this error message after passing a list of 189 plant families to tnrs_match_names().
(reproducible example below, sorry if it is too long).
No warnings from tnrs_match_names()

families <- c("Asteraceae", "Poaceae", "Rosaceae", "Fabaceae", 
    "Salicaceae", "Lamiaceae", "Betulaceae", "Apiaceae", 
    "Brassicaceae", "Fagaceae", "Cyperaceae", "Pinaceae", 
    "Ranunculaceae", "Ericaceae", "Caprifoliaceae", "Plantaginaceae", 
    "Caryophyllaceae", "Polygonaceae", "Boraginaceae", "Rubiaceae", 
    "Sapindaceae", "Malvaceae", "Scrophulariaceae", "Cactaceae", 
    "Amaranthaceae", "Oleaceae", "Euphorbiaceae", "Ulmaceae", 
    "Cupressaceae", "Juncaceae", "Campanulaceae", "Urticaceae", 
    "Geraniaceae", "Solanaceae", "Grossulariaceae", "Adoxaceae", 
    "Onagraceae", "Hypericaceae", "Orobanchaceae", "Rhamnaceae", 
    "Primulaceae", "Crassulaceae", "Cornaceae", "Cistaceae", 
    "Vitaceae", "Asparagaceae", "Violaceae", "Iridaceae", 
    "Papaveraceae", "Equisetaceae", "Gentianaceae", "Typhaceae", 
    "Amaryllidaceae", "Bromeliaceae", "Anacardiaceae", "Dennstaedtiaceae", 
    "Dryopteridaceae", "Lythraceae", "Elaeagnaceae", "Apocynaceae", 
    "Convolvulaceae", "Berberidaceae", "Celastraceae", "Orchidaceae", 
    "Resedaceae", "Cucurbitaceae", "Araliaceae", "Balsaminaceae", 
    "Cannabaceae", "Rutaceae", "Araceae", "Araucariaceae", 
    "Santalaceae", "Linaceae", "Platanaceae", "Saxifragaceae", 
    "Juglandaceae", "Liliaceae", "Haloragaceae", "Tamaricaceae", 
    "Athyriaceae", "Moraceae", "Taxaceae", "Arecaceae", "Aspleniaceae", 
    "Lauraceae", "Melanthiaceae", "Plumbaginaceae", "Tropaeolaceae", 
    "Alismataceae", "Buxaceae", "Hydrocharitaceae", "Zamiaceae", 
    "Menyanthaceae", "Aquifoliaceae", "Hydrangeaceae", "Myricaceae", 
    "Polypodiaceae", "Polytrichaceae", "Juncaginaceae", "Nymphaeaceae", 
    "Polemoniaceae", "Potamogetonaceae", "Sphagnaceae", "Tectariaceae", 
    "Verbenaceae", "Aizoaceae", "Cystopteridaceae", "Theaceae", 
    "Asphodelaceae", "Ephedraceae", "Myrtaceae", "Onocleaceae", 
    "Pteridaceae", "Thymelaeaceae", "Brachytheciaceae", "Capparaceae", 
    "Ceratophyllaceae", "Cleomaceae", "Cycadaceae", "Oxalidaceae", 
    "Acanthaceae", "Amblystegiaceae", "Hylocomiaceae", "Loranthaceae", 
    "Mniaceae", "Zygophyllaceae", "Bignoniaceae", "Blechnaceae", 
    "Butomaceae", "Dicranaceae", "Magnoliaceae", "Paeoniaceae", 
    "Piperaceae", "Polygalaceae", "Portulacaceae", "Strelitziaceae", 
    "Acoraceae", "Basellaceae", "Bryaceae", "Burseraceae", 
    "Commelinaceae", "Droseraceae", "Ebenaceae", "Lentibulariaceae", 
    "Musaceae", "Nephrolepidaceae", "Passifloraceae", "Plagiotheciaceae", 
    "Pontederiaceae", "Pottiaceae", "Ricciaceae", "Salviniaceae", 
    "Staphyleaceae", "Thelypteridaceae", "Zingiberaceae", 
    "Altingiaceae", "Anemiaceae", "Annonaceae", "Aristolochiaceae", 
    "Begoniaceae", "Cannaceae", "Climaciaceae", "Colchicaceae", 
    "Ditrichaceae", "Elatinaceae", "Gleicheniaceae", "Goodeniaceae", 
    "Grimmiaceae", "Hamamelidaceae", "Hedwigiaceae", "Heliconiaceae", 
    "Hypnaceae", "Loasaceae", "Malpighiaceae", "Marchantiaceae", 
    "Martyniaceae", "Nyctaginaceae", "Pedaliaceae", "Phrymaceae", 
    "Phytolaccaceae", "Pittosporaceae", "Proteaceae", "Ruppiaceae", 
    "Sapotaceae", "Schisandraceae", "Sciadopityaceae", "Styracaceae", 
    "Thuidiaceae")
resolved_names <- tnrs_match_names(families, context_name = "Land plants")
head(resolved_names)
#>   search_string unique_name approximate_match ott_id is_synonym flags
#> 1    asteraceae  Asteraceae             FALSE  46248      FALSE      
#> 2       poaceae     Poaceae             FALSE 508090      FALSE      
#> 3      rosaceae    Rosaceae             FALSE 208036      FALSE      
#> 4      fabaceae    Fabaceae             FALSE 560323      FALSE      
#> 5    salicaceae  Salicaceae             FALSE 530183      FALSE      
#> 6     lamiaceae   Lamiaceae             FALSE 544714      FALSE      
#>   number_matches
#> 1              1
#> 2              1
#> 3              1
#> 4              1
#> 5              1
#> 6              1
tr <- tol_induced_subtree(ott_ids = ott_id(resolved_names))
#> Error: HTTP failure: 400
#> The following OTT ids were not found: [147029, 473827, 23373, 17704, 601168, 873718, 614459, 367508, 461417, 79118, 99242, 405426, 427298, 195706, 195710, 548799, 5302233, 734781, 947452, 853767, 195711, 737324, 981715, 734790, 216633, 460575, 13254]. BadIdsExceptionopentree.plugins.BadIdsExceptionlist("opentree.plugins.tree_of_life_v3.doInducedSubtree(tree_of_life_v3.java:516)", "opentree.plugins.tree_of_life_v3.induced_subtree(tree_of_life_v3.java:400)", "java.lang.reflect.Method.invoke(Method.java:498)", "org.neo4j.server.plugins.PluginMethod.invoke(PluginMethod.java:57)", "org.neo4j.server.plugins.PluginManager.invoke(PluginManager.java:168)", "org.neo4j.server.rest.web.ExtensionService.invokeGraphDatabaseExtension(ExtensionService.java:300)", "org.neo4j.server.rest.web.ExtensionService.invokeGraphDatabaseExtension(ExtensionService.java:122)", 
#>     "java.lang.reflect.Method.invoke(Method.java:498)", "org.neo4j.server.rest.security.SecurityFilter.doFilter(SecurityFilter.java:112)")

@josephwb
Contributor

josephwb commented Jul 28, 2016

Hi @paternogbc.

This is an Open Tree issue, not a rotl issue. As mentioned above and on the linked page, if any ott_id is not matched, an error is returned. An ott_id is not matched either because 1) it is invalid or 2) it is not monophyletic in the synthetic tree (i.e. the tree does not pass through the taxon, so an induced tree cannot be returned).

Would you prefer that such taxa be skipped in the query, so that the returned tree contains as many of the query taxa as possible?

@paternogbc

Hi @josephwb,

thanks for your reply.
Yes, I would say that skipping 'invalid' taxa and printing a more specific warning about which taxa were dropped, and why, would be very useful. A short note in the documentation explaining the issue might also help.

@fmichonneau
Member Author

@josephwb correct me if I'm wrong, but I think last time I looked into it, it was not possible to check a priori whether an ott_id was present in the synthetic tree, and so it's not possible to warn the user until it fails.

On the rotl side, I guess we could wrap the call with try(), and if it fails retrieve the missing ott_ids from the error message, remove them from the query, and ask for the tree without them. A little clunky but maybe less surprising to users.
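
Something along these lines, as a minimal sketch only. Both function names are hypothetical (not part of rotl), and the parsing assumes the error message keeps the "The following OTT ids were not found: [...]" format shown earlier in this thread:

```r
# Hypothetical helper: pull the OTT ids out of a message such as
# "The following OTT ids were not found: [765185, 5295932]."
extract_missing_ott_ids <- function(msg) {
  # First bracketed, comma-separated list of integers in the message
  m <- regmatches(msg, regexpr("\\[[0-9, ]+\\]", msg))
  if (length(m) == 0) return(integer(0))
  as.integer(strsplit(gsub("[^0-9,]", "", m), ",")[[1]])
}

# Hypothetical wrapper: try the query, and on failure drop the ids named in
# the error message and ask for the tree again without them.
# `fetch_tree` is injectable so the logic can be tried without network access.
induced_subtree_skipping_missing <- function(ott_ids,
                                             fetch_tree = rotl::tol_induced_subtree) {
  ids <- unlist(ott_ids)
  res <- tryCatch(fetch_tree(ott_ids = ids), error = function(e) e)
  if (!inherits(res, "error")) return(res)

  missing <- extract_missing_ott_ids(conditionMessage(res))
  if (length(missing) == 0) stop(res)  # unrelated failure: re-raise as is

  kept <- setdiff(ids, missing)
  if (length(kept) == 0) stop("None of the OTT ids were found in the tree.")
  warning("Dropped OTT ids not found in the tree: ",
          paste(missing, collapse = ", "))
  fetch_tree(ott_ids = kept)
}
```

With the example above, this would be called as `tr <- induced_subtree_skipping_missing(ott_id(resolved_names))`. If the server ever changes the wording of the error, the parsing step silently finds nothing and the original error is re-raised.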

@josephwb
Contributor

@fmichonneau Individual ott_ids can be queried using node_info. This could be very slow and tedious, but doable.

However, it looks like this may be fixed soon. Probably better to have it fixed at Open Tree than hack something together here.
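
For reference, the per-id check mentioned above could be sketched as follows. `ott_ids_in_tree()` is a hypothetical name, and the sketch assumes `tol_node_info()` fails with an error (as in the HTTP 400 shown earlier in the thread) when an id has no node in the synthetic tree:

```r
# Hypothetical helper (not part of rotl): query each OTT id individually and
# record whether the tree server knows about it. One request per id, so this
# is slow for long lists.
# `node_info` is injectable so the logic can be tried without network access.
ott_ids_in_tree <- function(ott_ids, node_info = rotl::tol_node_info) {
  ids <- unlist(ott_ids)
  vapply(ids, function(id) {
    # ASSUMPTION: any error means "no synthetic tree node for this id";
    # a network failure would be misreported as a missing taxon.
    res <- tryCatch(node_info(id), error = function(e) NULL)
    !is.null(res)
  }, logical(1))
}
```

The resulting logical vector can then be used to filter the ids before calling `tol_induced_subtree()`.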
