
jarioksa/taxize

 
 


taxize

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

taxize allows users to search many taxonomic data sources for species names (scientific and common) and download upstream and downstream taxonomic hierarchy information, among other things.

The taxize book => https://taxize.dev

Functions in the package that work with a specific API follow the format service_whatitdoes: a data-source prefix and an action, separated by an underscore. For example, gnr_resolve uses the Global Names Resolver API to resolve species names. General functions that don't target a specific API, such as classification, don't carry a data-source prefix.
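As a rough illustration of the naming convention, here is a small base-R helper that pulls the data-source prefix out of a function name. The helper api_prefix() and the known_prefixes vector are hypothetical (the prefixes are copied from the table below) and are not part of taxize. Names such as get_uid contain an underscore but no data-source prefix, so the helper checks the candidate prefix against the known list.

```r
# Hypothetical helper, not part of taxize: return the data-source prefix
# of a function name that follows the service_whatitdoes pattern, else NA.
known_prefixes <- c("eol", "tnrs", "itis", "gnr", "gni", "iucn", "tp", "tpl",
                    "ncbi", "vascan", "ipni", "bold", "nbn", "fg", "eubon",
                    "ion", "tol", "worms", "natserv", "wiki", "pow")

api_prefix <- function(fun_names) {
  # Everything before the first underscore is the candidate prefix
  prefix <- sub("_.*$", "", fun_names)
  has_underscore <- grepl("_", fun_names, fixed = TRUE)
  # Only keep candidates that name a known data source
  ifelse(has_underscore & prefix %in% known_prefixes, prefix, NA_character_)
}

api_prefix(c("gnr_resolve", "tpl_get", "classification", "get_uid"))
# returns c("gnr", "tpl", NA, NA)
```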

You need API keys for Tropicos, IUCN, and NatureServe.

Currently implemented in taxize

Source                                         Function prefix  API docs  API key
Encyclopedia of Life                           eol              link      none
Taxonomic Name Resolution Service              tnrs             none      none
Integrated Taxonomic Information System        itis             link      none
Global Names Resolver                          gnr              link      none
Global Names Index                             gni              link      none
IUCN Red List                                  iucn             link      link
Tropicos                                       tp               link      link
theplantlist.org                               tpl              **        none
National Center for Biotechnology Information  ncbi             none      none
CANADENSYS Vascan name search API              vascan           link      none
International Plant Names Index (IPNI)         ipni             none      none
Barcode of Life Data Systems (BOLD)            bold             link      none
National Biodiversity Network (UK)             nbn              link      none
Index Fungorum                                 fg               none      none
EU BON                                         eubon            link      none
Index of Names (ION)                           ion              link      none
Open Tree of Life (TOL)                        tol              link      none
World Register of Marine Species (WoRMS)       worms            link      none
NatureServe                                    natserv          link      link
Wikipedia                                      wiki             link      none
Kew's Plants of the World                      pow              none      none

**: There are none! We suggest using the TPL and TPLck functions in the Taxonstand package. We provide two functions to get bulk data: tpl_families and tpl_get.

***: There are none! The function scrapes the web directly.

May be in taxize in the future...

See the datasources tag in the issue tracker


Installation

Stable version from CRAN

install.packages("taxize")

Development version from GitHub

Windows users should install Rtools first.

install.packages("remotes")
remotes::install_github("ropensci/taxize")
library('taxize')

Get unique taxonomic identifier from NCBI

A lot of taxize revolves around taxonomic identifiers. Because, as you know, names can be a mess (misspelled, synonyms, etc.), it's better to first get an identifier that a particular data source knows about; then we can move on to acquiring more fun taxonomic data.

uids <- get_uid(c("Chironomus riparius", "Chaetopteryx"))
#> ══  2 queries  ═══════════════
#> ✔  Found:  Chironomus+riparius
#> ✔  Found:  Chaetopteryx
#> ══  Results  ═════════════════
#> 
#> ● Total: 2 
#> ● Found: 2 
#> ● Not Found: 0

Retrieve classifications

A classification is the hierarchy above a taxon: think of a species, then all the taxonomic ranks above it, such as genus, family, order, class, and kingdom.

out <- classification(uids)
lapply(out, head)
#> $`315576`
#>                 name         rank     id
#> 1 cellular organisms      no rank 131567
#> 2          Eukaryota superkingdom   2759
#> 3       Opisthokonta      no rank  33154
#> 4            Metazoa      kingdom  33208
#> 5          Eumetazoa      no rank   6072
#> 6          Bilateria      no rank  33213
#> 
#> $`492549`
#>                 name         rank     id
#> 1 cellular organisms      no rank 131567
#> 2          Eukaryota superkingdom   2759
#> 3       Opisthokonta      no rank  33154
#> 4            Metazoa      kingdom  33208
#> 5          Eumetazoa      no rank   6072
#> 6          Bilateria      no rank  33213
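classification() returns a named list with one data frame per ID, each with name, rank and id columns, as printed above. A minimal sketch of pulling out a single rank follows; the data frame is typed in by hand from the first six rows shown above, and pick_rank() is a hypothetical helper, not a taxize function.

```r
# First six rows of the classification printed above, typed in by hand
lineage <- data.frame(
  name = c("cellular organisms", "Eukaryota", "Opisthokonta",
           "Metazoa", "Eumetazoa", "Bilateria"),
  rank = c("no rank", "superkingdom", "no rank",
           "kingdom", "no rank", "no rank"),
  id   = c(131567, 2759, 33154, 33208, 6072, 33213),
  stringsAsFactors = FALSE
)

# Hypothetical helper: name of the taxon at a given rank, or NA if absent
pick_rank <- function(df, rank) {
  hit <- df$name[df$rank == rank]
  if (length(hit) == 0) NA_character_ else hit[length(hit)]
}

pick_rank(lineage, "kingdom")  # returns "Metazoa"
pick_rank(lineage, "genus")    # returns NA (not among these six rows)
```

On the full result the same helper could be applied across the list, for example sapply(out, pick_rank, rank = "genus"), assuming each element keeps the name and rank columns shown above.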

Immediate children

Get the immediate children of Salmo. Salmo is a genus, so its children are mostly species.

children("Salmo", db = 'ncbi')
#> $Salmo
#>    childtaxa_id                   childtaxa_name childtaxa_rank
#> 1       2705433                     Salmo ghigii        species
#> 2       2304090                  Salmo abanticus        species
#> 3       2126688              Salmo ciscaucasicus        species
#> 4       1509524  Salmo marmoratus x Salmo trutta        species
#> 5       1484545 Salmo cf. cenerinus BOLD:AAB3872        species
#> 6       1483130               Salmo zrmanjaensis        species
#> 7       1483129               Salmo visovacensis        species
#> 8       1483128                Salmo rhodanensis        species
#> 9       1483127                 Salmo pellegrini        species
#> 10      1483126                     Salmo opimus        species
#> 11      1483125                Salmo macedonicus        species
#> 12      1483124                Salmo lourosensis        species
#> 13      1483123                   Salmo labecula        species
#> 14      1483122                  Salmo farioides        species
#> 15      1483121                      Salmo chilo        species
#> 16      1483120                     Salmo cettii        species
#> 17      1483119                  Salmo cenerinus        species
#> 18      1483118                   Salmo aphelios        species
#> 19      1483117                    Salmo akairos        species
#> 20      1201173               Salmo peristericus        species
#> 21      1035833                   Salmo ischchan        species
#> 22       700588                     Salmo labrax        species
#> 23       602068                    Salmo caspius     subspecies
#> 24       237411              Salmo obtusirostris        species
#> 25       235141              Salmo platycephalus        species
#> 26       234793                    Salmo letnica        species
#> 27        62065                  Salmo ohridanus        species
#> 28        33518                 Salmo marmoratus        species
#> 29        33516                    Salmo fibreni        species
#> 30        33515                     Salmo carpio        species
#> 31         8032                     Salmo trutta        species
#> 32         8030                      Salmo salar        species
#> 
#> attr(,"class")
#> [1] "children"
#> attr(,"db")
#> [1] "ncbi"
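As the output shows, NCBI also lists a subspecies, a hybrid, and a "cf." name among the children of Salmo. If you only want formally named species, a minimal base-R filter on the returned data frame could look like the sketch below. The rows are typed in from the output above, and the regular expression is an assumption that happens to catch the hybrid and "cf." names in this example; it is not a general rule for open nomenclature.

```r
# A few of the rows printed above, typed in by hand
kids <- data.frame(
  childtaxa_id   = c(1509524, 1484545, 602068, 8032, 8030),
  childtaxa_name = c("Salmo marmoratus x Salmo trutta",
                     "Salmo cf. cenerinus BOLD:AAB3872",
                     "Salmo caspius", "Salmo trutta", "Salmo salar"),
  childtaxa_rank = c("species", "species", "subspecies", "species", "species"),
  stringsAsFactors = FALSE
)

# Flag hybrids (" x ") and open-nomenclature names (" cf. ")
informal <- grepl(" x | cf\\. ", kids$childtaxa_name)

# Keep formally named species only
species_only <- kids[kids$childtaxa_rank == "species" & !informal, ]
species_only$childtaxa_name  # returns c("Salmo trutta", "Salmo salar")
```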

Downstream children to a rank

Get all species in the genus Apis

downstream(as.tsn(154395), db = 'itis', downto = 'species', messages = FALSE)
#> $`154395`
#>      tsn parentname parenttsn rankname          taxonname rankid
#> 1 154396       Apis    154395  species     Apis mellifera    220
#> 2 763550       Apis    154395  species Apis andreniformis    220
#> 3 763551       Apis    154395  species        Apis cerana    220
#> 4 763552       Apis    154395  species       Apis dorsata    220
#> 5 763553       Apis    154395  species        Apis florea    220
#> 6 763554       Apis    154395  species Apis koschevnikovi    220
#> 7 763555       Apis    154395  species   Apis nigrocincta    220
#> 
#> attr(,"class")
#> [1] "downstream"
#> attr(,"db")
#> [1] "itis"
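Conceptually, a downstream query walks down the hierarchy until it reaches the requested rank. The recursive sketch below shows that idea on a hypothetical toy parent/child table (the family Apidae with a few genera and species); downstream_sketch() is not a taxize function, and taxize itself works against the remote data source rather than a local table.

```r
# Hypothetical toy hierarchy: one row per taxon, with its rank and parent
tree <- data.frame(
  name   = c("Apidae", "Apis", "Bombus",
             "Apis mellifera", "Apis cerana", "Bombus terrestris"),
  rank   = c("family", "genus", "genus", "species", "species", "species"),
  parent = c(NA, "Apidae", "Apidae", "Apis", "Apis", "Bombus"),
  stringsAsFactors = FALSE
)

# Collect every descendant of `taxon` at rank `downto`,
# recursing through any intermediate ranks
downstream_sketch <- function(tree, taxon, downto) {
  kids <- tree[tree$parent %in% taxon, ]
  if (nrow(kids) == 0) return(character(0))
  hits   <- kids$name[kids$rank == downto]
  deeper <- kids$name[kids$rank != downto]
  below  <- unlist(lapply(deeper, function(k) downstream_sketch(tree, k, downto)))
  c(hits, below)
}

downstream_sketch(tree, "Apis", "species")
# returns c("Apis mellifera", "Apis cerana")
downstream_sketch(tree, "Apidae", "species")
# returns c("Apis mellifera", "Apis cerana", "Bombus terrestris")
```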

Upstream taxa

Get all genera up from the species Pinus contorta: this includes the species' own genus and the other genera in the same family.

upstream("Pinus contorta", db = 'itis', upto = 'Genus', messages = FALSE)
#> ══  1 queries  ═══════════════
#> ✔  Found:  Pinus contorta
#> ══  Results  ═════════════════
#> 
#> ● Total: 1 
#> ● Found: 1 
#> ● Not Found: 0
#> $`Pinus contorta`
#>      tsn parentname parenttsn rankname   taxonname rankid
#> 1  18031   Pinaceae     18030    genus       Abies    180
#> 2  18033   Pinaceae     18030    genus       Picea    180
#> 3  18035   Pinaceae     18030    genus       Pinus    180
#> 4 183396   Pinaceae     18030    genus       Tsuga    180
#> 5 183405   Pinaceae     18030    genus      Cedrus    180
#> 6 183409   Pinaceae     18030    genus       Larix    180
#> 7 183418   Pinaceae     18030    genus Pseudotsuga    180
#> 8 822529   Pinaceae     18030    genus  Keteleeria    180
#> 9 822530   Pinaceae     18030    genus Pseudolarix    180
#> 
#> attr(,"class")
#> [1] "upstream"
#> attr(,"db")
#> [1] "itis"
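The upstream logic can be sketched the same way: climb from the species to the requested rank, step one level higher, and return every taxon at that rank under it. This is a hedged illustration on a hypothetical slice of Pinaceae, not taxize's implementation; upstream_sketch() assumes the taxon is in the table and that the requested rank lies on its path.

```r
# Hypothetical toy hierarchy (a small slice of Pinaceae)
tree <- data.frame(
  name   = c("Pinaceae", "Pinus", "Abies", "Picea", "Pinus contorta"),
  rank   = c("family", "genus", "genus", "genus", "species"),
  parent = c(NA, "Pinaceae", "Pinaceae", "Pinaceae", "Pinus"),
  stringsAsFactors = FALSE
)

upstream_sketch <- function(tree, taxon, upto) {
  # Climb parent links until we reach the requested rank (e.g. the genus)
  current <- taxon
  while (tree$rank[tree$name == current] != upto) {
    current <- tree$parent[tree$name == current]
  }
  # Step one level higher, then return every child of that taxon at `upto`,
  # i.e. the genus itself plus its co-genera
  above <- tree$parent[tree$name == current]
  tree$name[tree$parent %in% above & tree$rank == upto]
}

upstream_sketch(tree, "Pinus contorta", "genus")
# returns c("Pinus", "Abies", "Picea")
```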

Get synonyms

synonyms("Acer drummondii", db="itis")
#> ══  1 queries  ═══════════════
#> ✔  Found:  Acer drummondii
#> ══  Results  ═════════════════
#> 
#> ● Total: 1 
#> ● Found: 1 
#> ● Not Found: 0
#> $`Acer drummondii`
#>   sub_tsn                    acc_name acc_tsn                    acc_author
#> 1  183671 Acer rubrum var. drummondii  526853 (Hook. & Arn. ex Nutt.) Sarg.
#> 2  183671 Acer rubrum var. drummondii  526853 (Hook. & Arn. ex Nutt.) Sarg.
#> 3  183671 Acer rubrum var. drummondii  526853 (Hook. & Arn. ex Nutt.) Sarg.
#>                          syn_author                    syn_name syn_tsn
#> 1 (Hook. & Arn. ex Nutt.) E. Murray Acer rubrum ssp. drummondii   28730
#> 2             Hook. & Arn. ex Nutt.             Acer drummondii  183671
#> 3     (Hook. & Arn. ex Nutt.) Small          Rufacer drummondii  183672
#> 
#> attr(,"class")
#> [1] "synonyms"
#> attr(,"db")
#> [1] "itis"
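Note that the queried name itself turns up as a synonym: ITIS treats Acer drummondii as a synonym of the accepted name Acer rubrum var. drummondii, which is repeated on every row. A base-R sketch for tidying that result, with the columns typed in by hand from the output above:

```r
# Columns from the synonyms() output above, typed in by hand
syn <- data.frame(
  acc_name = rep("Acer rubrum var. drummondii", 3),
  syn_name = c("Acer rubrum ssp. drummondii", "Acer drummondii",
               "Rufacer drummondii"),
  syn_tsn  = c(28730, 183671, 183672),
  stringsAsFactors = FALSE
)
query <- "Acer drummondii"

# The accepted name is repeated on every row, so collapse it
accepted <- unique(syn$acc_name)
# returns "Acer rubrum var. drummondii"

# The other synonyms, leaving out the name we searched for
other_synonyms <- setdiff(syn$syn_name, query)
# returns c("Acer rubrum ssp. drummondii", "Rufacer drummondii")
```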

Get taxonomic IDs from many sources

get_ids(names="Salvelinus fontinalis", db = c('itis', 'ncbi'), messages = FALSE)
#> ══  db: itis ═════════════════
#> ══  1 queries  ═══════════════
#> ✔  Found:  Salvelinus fontinalis
#> ══  Results  ═════════════════
#> 
#> ● Total: 1 
#> ● Found: 1 
#> ● Not Found: 0
#> ══  db: ncbi ═════════════════
#> ══  1 queries  ═══════════════
#> ✔  Found:  Salvelinus+fontinalis
#> ══  Results  ═════════════════
#> 
#> ● Total: 1 
#> ● Found: 1 
#> ● Not Found: 0
#> $itis
#> Salvelinus fontinalis 
#>              "162003" 
#> attr(,"class")
#> [1] "tsn"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "https://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=162003"
#> 
#> $ncbi
#> Salvelinus fontinalis 
#>                "8038" 
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "https://www.ncbi.nlm.nih.gov/taxonomy/8038"
#> 
#> attr(,"class")
#> [1] "ids"
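The result is a list with one element per database, each a named vector of IDs. If you want a single table, a sketch like the one below flattens it to one row per database/name pair. The ids list here is a hypothetical stand-in typed in from the output above; the real object also carries classes and attributes such as match and uri, so you may want to unclass() its elements first.

```r
# Hypothetical stand-in for the get_ids() result above: one named
# character vector of IDs per database
ids <- list(
  itis = c("Salvelinus fontinalis" = "162003"),
  ncbi = c("Salvelinus fontinalis" = "8038")
)

# One row per database/name pair
ids_table <- data.frame(
  db   = rep(names(ids), lengths(ids)),
  name = unlist(lapply(ids, names), use.names = FALSE),
  id   = unlist(ids, use.names = FALSE),
  stringsAsFactors = FALSE
)
ids_table
```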

You can limit the rows considered when getting IDs with any get_*() function by using the rows parameter

get_ids(names="Poa annua", db = "gbif", rows=1)
#> ══  db: gbif ═════════════════
#> ══  1 queries  ═══════════════
#> ✔  Found:  Poa annua
#> ══  Results  ═════════════════
#> 
#> ● Total: 1 
#> ● Found: 1 
#> ● Not Found: 0
#> $gbif
#> Poa annua 
#> "2704179" 
#> attr(,"class")
#> [1] "gbifid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] TRUE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "https://www.gbif.org/species/2704179"
#> 
#> attr(,"class")
#> [1] "ids"

Furthermore, if that's your jam, you can get back all IDs with the get_*_() functions (the get_*() functions with an additional underscore at the end of the function name)

get_ids_(c("Chironomus riparius", "Pinus contorta"), db = 'nbn', rows=1:3)
#> ══  db: nbn ══════════════════
#> $nbn
#> $nbn$`Chironomus riparius`
#>               guid      scientificName    rank taxonomicStatus
#> 1 NBNSYS0000027573 Chironomus riparius species        accepted
#> 2 NBNSYS0000023573    Quedius riparius species        accepted
#> 3 NBNSYS0000007169   Elaphrus riparius species        accepted
#> 
#> $nbn$`Pinus contorta`
#>               guid                scientificName    rank taxonomicStatus
#> 1 NBNSYS0000004786                Pinus contorta species        accepted
#> 2 NHMSYS0000494848  Pinus contorta var. contorta variety        accepted
#> 3 NHMSYS0000494858 Pinus contorta var. murrayana variety        accepted
#> 
#> 
#> attr(,"class")
#> [1] "ids"
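Because the get_*_() functions return every candidate, the rows can include names that only partly match the query (here, other genera with the epithet riparius). A minimal base-R sketch for keeping only exact, accepted matches, using the Chironomus riparius rows typed in from the output above:

```r
# Candidates for "Chironomus riparius" as printed above, typed in by hand
cands <- data.frame(
  guid            = c("NBNSYS0000027573", "NBNSYS0000023573", "NBNSYS0000007169"),
  scientificName  = c("Chironomus riparius", "Quedius riparius", "Elaphrus riparius"),
  rank            = c("species", "species", "species"),
  taxonomicStatus = c("accepted", "accepted", "accepted"),
  stringsAsFactors = FALSE
)
query <- "Chironomus riparius"

# Keep only exact, accepted matches to the name we searched for
exact <- cands[cands$scientificName == query &
               cands$taxonomicStatus == "accepted", ]
exact$guid  # returns "NBNSYS0000027573"
```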

Common names from scientific names

sci2comm('Helianthus annuus', db = 'itis')
#> ══  1 queries  ═══════════════
#> ✔  Found:  Helianthus annuus
#> ══  Results  ═════════════════
#> 
#> ● Total: 1 
#> ● Found: 1 
#> ● Not Found: 0
#> $`Helianthus annuus`
#> [1] "common sunflower" "sunflower"        "wild sunflower"   "annual sunflower"

Scientific names from common names

comm2sci("black bear", db = "itis")
#> $`black bear`
#> [1] "Ursus americanus luteolus"   "Ursus americanus"           
#> [3] "Ursus americanus"            "Ursus americanus americanus"
#> [5] "Chiropotes satanas"          "Ursus thibetanus"           
#> [7] "Ursus thibetanus"
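The result contains repeats and mixes species with subspecies. A small base-R sketch that drops the repeats and keeps only two-word (binomial) names, using the vector typed in from the output above:

```r
# Names returned for "black bear" above, typed in by hand
hits <- c("Ursus americanus luteolus", "Ursus americanus", "Ursus americanus",
          "Ursus americanus americanus", "Chiropotes satanas",
          "Ursus thibetanus", "Ursus thibetanus")

# Drop repeats, then keep only binomials (exactly two words)
uniq <- unique(hits)
binomials <- uniq[lengths(strsplit(uniq, " ")) == 2]
binomials
# returns c("Ursus americanus", "Chiropotes satanas", "Ursus thibetanus")
```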

Lowest common rank among taxa

spp <- c("Sus scrofa", "Homo sapiens", "Nycticebus coucang")
lowest_common(spp, db = "ncbi")
#> ══  3 queries  ═══════════════
#> ✔  Found:  Sus+scrofa
#> ✔  Found:  Homo+sapiens
#> ✔  Found:  Nycticebus+coucang
#> ══  Results  ═════════════════
#> 
#> ● Total: 3 
#> ● Found: 3 
#> ● Not Found: 0
#> ══  3 queries  ═══════════════
#> ✔  Found:  Sus+scrofa
#> ✔  Found:  Homo+sapiens
#> ✔  Found:  Nycticebus+coucang
#> ══  Results  ═════════════════
#> 
#> ● Total: 3 
#> ● Found: 3 
#> ● Not Found: 0
#>             name        rank      id
#> 21 Boreoeutheria below-class 1437010
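lowest_common() reports the deepest taxon shared by all the queried lineages, Boreoeutheria here. The underlying idea can be sketched in base R by intersecting root-to-tip lineages and taking the last shared name. This is a hedged sketch of the concept, not taxize's implementation: the lineages below are hypothetical and heavily abbreviated, and because it compares names it assumes names are unique within a lineage (comparing IDs would be more robust).

```r
# Hypothetical, heavily abbreviated lineages (root -> tip);
# real NCBI classifications have many more levels
lineages <- list(
  "Sus scrofa"         = c("Mammalia", "Theria", "Eutheria", "Boreoeutheria",
                           "Laurasiatheria", "Artiodactyla"),
  "Homo sapiens"       = c("Mammalia", "Theria", "Eutheria", "Boreoeutheria",
                           "Euarchontoglires", "Primates"),
  "Nycticebus coucang" = c("Mammalia", "Theria", "Eutheria", "Boreoeutheria",
                           "Euarchontoglires", "Primates", "Strepsirrhini")
)

lowest_common_sketch <- function(lineages) {
  # Names present in every lineage; intersect() keeps the order of its
  # first argument, so the root-to-tip order is preserved
  shared <- Reduce(intersect, lineages)
  # The deepest shared name is the last one
  shared[length(shared)]
}

lowest_common_sketch(lineages)  # returns "Boreoeutheria"
```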

Coerce codes to taxonomic id classes

numeric to uid

as.uid(315567)
#> [1] "315567"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "https://www.ncbi.nlm.nih.gov/taxonomy/315567"

list to uid

as.uid(list("315567", "3339", "9696"))
#> [1] "315567" "3339"   "9696"  
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found" "found" "found"
#> attr(,"multiple_matches")
#> [1] FALSE FALSE FALSE
#> attr(,"pattern_match")
#> [1] FALSE FALSE FALSE
#> attr(,"uri")
#> [1] "https://www.ncbi.nlm.nih.gov/taxonomy/315567"
#> [2] "https://www.ncbi.nlm.nih.gov/taxonomy/3339"  
#> [3] "https://www.ncbi.nlm.nih.gov/taxonomy/9696"
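The ID classes are character vectors with a class and a few attributes attached. To make that structure concrete, here is a hypothetical constructor, make_uid_sketch(), that mimics what as.uid() prints above. It is for illustration only: unlike a real lookup it does no validation and simply marks every ID as found, so use the real as.uid() in practice.

```r
# Hypothetical constructor, for illustration only: builds a character vector
# carrying the same kinds of attributes that as.uid() prints above.
# It performs no validation and marks every ID as "found".
make_uid_sketch <- function(x) {
  x <- as.character(unlist(x))
  n <- length(x)
  structure(
    x,
    class            = "uid",
    match            = rep("found", n),
    multiple_matches = rep(FALSE, n),
    pattern_match    = rep(FALSE, n),
    uri              = paste0("https://www.ncbi.nlm.nih.gov/taxonomy/", x)
  )
}

u <- make_uid_sketch(list("315567", "3339", "9696"))
attr(u, "uri")[2]  # returns "https://www.ncbi.nlm.nih.gov/taxonomy/3339"
unclass(u)         # the bare IDs, with the other attributes still attached
```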

Coerce taxonomic id classes to a data.frame

out <- as.uid(c(315567, 3339, 9696))
(res <- data.frame(out))
#>      ids class match multiple_matches pattern_match
#> 1 315567   uid found            FALSE         FALSE
#> 2   3339   uid found            FALSE         FALSE
#> 3   9696   uid found            FALSE         FALSE
#>                                            uri
#> 1 https://www.ncbi.nlm.nih.gov/taxonomy/315567
#> 2   https://www.ncbi.nlm.nih.gov/taxonomy/3339
#> 3   https://www.ncbi.nlm.nih.gov/taxonomy/9696


Contributing

See our CONTRIBUTING document.

Road map

Check out our milestones to see what we plan to get done for each version.

Meta

  • Please report any issues or bugs.
  • License: MIT
  • Get citation information for taxize in R by running citation(package = 'taxize')
  • Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
