`taxize` allows users to search over many taxonomic data sources for species names (scientific and common) and to download upstream and downstream taxonomic hierarchical information, among other things.
The taxize book => https://taxize.dev
The functions in the package that work with a specific API have a prefix and suffix separated by an underscore. They follow the format `service_whatitdoes`. For example, `gnr_resolve` uses the Global Names Resolver API to resolve species names. General functions in the package that don't hit a specific API don't have two words separated by an underscore, e.g., `classification`.
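As a rough illustration of the naming convention, the base-R sketch below splits a function name at its first underscore. The helper `split_fun_name()` is hypothetical and is not part of taxize.

```r
# Hypothetical helper (not part of taxize): split a function name
# of the form service_whatitdoes into its two parts.
split_fun_name <- function(fun) {
  if (!grepl("_", fun, fixed = TRUE)) {
    # General functions such as classification() carry no service prefix
    return(list(service = NA_character_, action = fun))
  }
  list(
    service = sub("_.*$", "", fun),    # text before the first underscore
    action  = sub("^[^_]*_", "", fun)  # text after the first underscore
  )
}

split_fun_name("gnr_resolve")     # service "gnr", action "resolve"
split_fun_name("classification")  # service NA, action "classification"
```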
You need API keys for Tropicos, IUCN, and NatureServe.
Source | Function prefix | API Docs | API key |
---|---|---|---|
Encyclopedia of Life | eol | link | none |
Taxonomic Name Resolution Service | tnrs | none | none |
Integrated Taxonomic Information System | itis | link | none |
Global Names Resolver | gnr | link | none |
Global Names Index | gni | link | none |
IUCN Red List | iucn | link | link |
Tropicos | tp | link | link |
Theplantlist dot org | tpl | ** | none |
National Center for Biotechnology Information | ncbi | none | none |
CANADENSYS Vascan name search API | vascan | link | none |
International Plant Names Index (IPNI) | ipni | none | none |
Barcode of Life Data Systems (BOLD) | bold | link | none |
National Biodiversity Network (UK) | nbn | link | none |
Index Fungorum | fg | none | none |
EU BON | eubon | link | none |
Index of Names (ION) | ion | link | none |
Open Tree of Life (TOL) | tol | link | none |
World Register of Marine Species (WoRMS) | worms | link | none |
NatureServe | natserv | link | link |
Wikipedia | wiki | link | none |
Kew's Plants of the World | pow | none | none |
**: There are none! We suggest using the `TPL` and `TPLck` functions in the `Taxonstand` package. We provide two functions to get bulk data: `tpl_families` and `tpl_get`.
***: There are none! The function scrapes the web directly.
See the datasources tag in the issue tracker
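For the sources that need an API key, it can help to check whether a key is available before making requests. The sketch below uses only base R. The environment variable names `TROPICOS_KEY` and `IUCN_REDLIST_KEY` are assumptions, and no name is given here for NatureServe; check the taxize documentation for the names your installed version reads. The helper `keys_present()` is hypothetical.

```r
# Assumed environment variable names; verify them against the
# documentation of your installed taxize version.
key_vars <- c(tropicos = "TROPICOS_KEY", iucn = "IUCN_REDLIST_KEY")

# Hypothetical helper: TRUE for each variable set to a non-empty string
keys_present <- function(vars) {
  vapply(vars, function(v) nzchar(Sys.getenv(v)), logical(1))
}

keys_present(key_vars)
```

The result is a named logical vector, so `keys_present(key_vars)[["iucn"]]` can guard a call to an IUCN function.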
install.packages("taxize")
For the development version from GitHub, Windows users should install Rtools first.
install.packages("remotes")
remotes::install_github("ropensci/taxize")
library('taxize')
A lot of `taxize` revolves around taxonomic identifiers. Because names can be messy (misspellings, synonyms, etc.), it's better to first get an identifier that a particular data source knows about; from there we can move on to acquiring more taxonomic data.
uids <- get_uid(c("Chironomus riparius", "Chaetopteryx"))
#> ══ 2 queries ═══════════════
#> ✔ Found: Chironomus+riparius
#> ✔ Found: Chaetopteryx
#> ══ Results ═════════════════
#>
#> ● Total: 2
#> ● Found: 2
#> ● Not Found: 0
Classifications - think of a species, then all the taxonomic ranks up from that species, like genus, family, order, class, kingdom.
out <- classification(uids)
lapply(out, head)
#> $`315576`
#> name rank id
#> 1 cellular organisms no rank 131567
#> 2 Eukaryota superkingdom 2759
#> 3 Opisthokonta no rank 33154
#> 4 Metazoa kingdom 33208
#> 5 Eumetazoa no rank 6072
#> 6 Bilateria no rank 33213
#>
#> $`492549`
#> name rank id
#> 1 cellular organisms no rank 131567
#> 2 Eukaryota superkingdom 2759
#> 3 Opisthokonta no rank 33154
#> 4 Metazoa kingdom 33208
#> 5 Eumetazoa no rank 6072
#> 6 Bilateria no rank 33213
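Each element of the result is a data frame with `name`, `rank`, and `id` columns, so a single rank can be pulled out with ordinary subsetting. The sketch below runs offline on a mock data frame copied from the six rows printed above. The helper `name_at_rank()` is hypothetical, not a taxize function.

```r
# Mock of one element of the classification() result, copied from the
# first six rows printed above.
cls <- data.frame(
  name = c("cellular organisms", "Eukaryota", "Opisthokonta",
           "Metazoa", "Eumetazoa", "Bilateria"),
  rank = c("no rank", "superkingdom", "no rank",
           "kingdom", "no rank", "no rank"),
  id   = c("131567", "2759", "33154", "33208", "6072", "33213"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the taxon name(s) at a given rank
name_at_rank <- function(x, rank) x$name[x$rank == rank]

name_at_rank(cls, "kingdom")       # "Metazoa"
name_at_rank(cls, "superkingdom")  # "Eukaryota"
```

On a real result, something like `lapply(out, name_at_rank, rank = "kingdom")` should apply the same lookup to every taxon, assuming every element is a data frame (a failed lookup may not be).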
Get immediate children of Salmo. In this case, Salmo is a genus, so this gives species within the genus.
children("Salmo", db = 'ncbi')
#> $Salmo
#> childtaxa_id childtaxa_name childtaxa_rank
#> 1 2705433 Salmo ghigii species
#> 2 2304090 Salmo abanticus species
#> 3 2126688 Salmo ciscaucasicus species
#> 4 1509524 Salmo marmoratus x Salmo trutta species
#> 5 1484545 Salmo cf. cenerinus BOLD:AAB3872 species
#> 6 1483130 Salmo zrmanjaensis species
#> 7 1483129 Salmo visovacensis species
#> 8 1483128 Salmo rhodanensis species
#> 9 1483127 Salmo pellegrini species
#> 10 1483126 Salmo opimus species
#> 11 1483125 Salmo macedonicus species
#> 12 1483124 Salmo lourosensis species
#> 13 1483123 Salmo labecula species
#> 14 1483122 Salmo farioides species
#> 15 1483121 Salmo chilo species
#> 16 1483120 Salmo cettii species
#> 17 1483119 Salmo cenerinus species
#> 18 1483118 Salmo aphelios species
#> 19 1483117 Salmo akairos species
#> 20 1201173 Salmo peristericus species
#> 21 1035833 Salmo ischchan species
#> 22 700588 Salmo labrax species
#> 23 602068 Salmo caspius subspecies
#> 24 237411 Salmo obtusirostris species
#> 25 235141 Salmo platycephalus species
#> 26 234793 Salmo letnica species
#> 27 62065 Salmo ohridanus species
#> 28 33518 Salmo marmoratus species
#> 29 33516 Salmo fibreni species
#> 30 33515 Salmo carpio species
#> 31 8032 Salmo trutta species
#> 32 8030 Salmo salar species
#>
#> attr(,"class")
#> [1] "children"
#> attr(,"db")
#> [1] "ncbi"
Get all species in the genus Apis.
downstream(as.tsn(154395), db = 'itis', downto = 'species')
#> $`154395`
#> tsn parentname parenttsn rankname taxonname rankid
#> 1 154396 Apis 154395 species Apis mellifera 220
#> 2 763550 Apis 154395 species Apis andreniformis 220
#> 3 763551 Apis 154395 species Apis cerana 220
#> 4 763552 Apis 154395 species Apis dorsata 220
#> 5 763553 Apis 154395 species Apis florea 220
#> 6 763554 Apis 154395 species Apis koschevnikovi 220
#> 7 763555 Apis 154395 species Apis nigrocincta 220
#>
#> attr(,"class")
#> [1] "downstream"
#> attr(,"db")
#> [1] "itis"
Get all genera up from the species Pinus contorta (this includes the genus of the species, and its co-genera within the same family).
upstream("Pinus contorta", db = 'itis', upto = 'Genus')
#> ══ 1 queries ═══════════════
#> ✔ Found: Pinus contorta
#> ══ Results ═════════════════
#>
#> ● Total: 1
#> ● Found: 1
#> ● Not Found: 0
#> $`Pinus contorta`
#> tsn parentname parenttsn rankname taxonname rankid
#> 1 18031 Pinaceae 18030 genus Abies 180
#> 2 18033 Pinaceae 18030 genus Picea 180
#> 3 18035 Pinaceae 18030 genus Pinus 180
#> 4 183396 Pinaceae 18030 genus Tsuga 180
#> 5 183405 Pinaceae 18030 genus Cedrus 180
#> 6 183409 Pinaceae 18030 genus Larix 180
#> 7 183418 Pinaceae 18030 genus Pseudotsuga 180
#> 8 822529 Pinaceae 18030 genus Keteleeria 180
#> 9 822530 Pinaceae 18030 genus Pseudolarix 180
#>
#> attr(,"class")
#> [1] "upstream"
#> attr(,"db")
#> [1] "itis"
synonyms("Acer drummondii", db="itis")
#> ══ 1 queries ═══════════════
#> ✔ Found: Acer drummondii
#> ══ Results ═════════════════
#>
#> ● Total: 1
#> ● Found: 1
#> ● Not Found: 0
#> $`Acer drummondii`
#> sub_tsn acc_name acc_tsn acc_author
#> 1 183671 Acer rubrum var. drummondii 526853 (Hook. & Arn. ex Nutt.) Sarg.
#> 2 183671 Acer rubrum var. drummondii 526853 (Hook. & Arn. ex Nutt.) Sarg.
#> 3 183671 Acer rubrum var. drummondii 526853 (Hook. & Arn. ex Nutt.) Sarg.
#> syn_author syn_name syn_tsn
#> 1 (Hook. & Arn. ex Nutt.) E. Murray Acer rubrum ssp. drummondii 28730
#> 2 Hook. & Arn. ex Nutt. Acer drummondii 183671
#> 3 (Hook. & Arn. ex Nutt.) Small Rufacer drummondii 183672
#>
#> attr(,"class")
#> [1] "synonyms"
#> attr(,"db")
#> [1] "itis"
get_ids(names="Salvelinus fontinalis", db = c('itis', 'ncbi'))
#> ══ db: itis ═════════════════
#> ══ 1 queries ═══════════════
#> ✔ Found: Salvelinus fontinalis
#> ══ Results ═════════════════
#>
#> ● Total: 1
#> ● Found: 1
#> ● Not Found: 0
#> ══ db: ncbi ═════════════════
#> ══ 1 queries ═══════════════
#> ✔ Found: Salvelinus+fontinalis
#> ══ Results ═════════════════
#>
#> ● Total: 1
#> ● Found: 1
#> ● Not Found: 0
#> $itis
#> Salvelinus fontinalis
#> "162003"
#> attr(,"class")
#> [1] "tsn"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "https://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=162003"
#>
#> $ncbi
#> Salvelinus fontinalis
#> "8038"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "https://www.ncbi.nlm.nih.gov/taxonomy/8038"
#>
#> attr(,"class")
#> [1] "ids"
You can limit results to certain rows when getting ids with any of the `get_*()` functions:
get_ids(names="Poa annua", db = "gbif", rows=1)
#> ══ db: gbif ═════════════════
#> ══ 1 queries ═══════════════
#> ✔ Found: Poa annua
#> ══ Results ═════════════════
#>
#> ● Total: 1
#> ● Found: 1
#> ● Not Found: 0
#> $gbif
#> Poa annua
#> "2704179"
#> attr(,"class")
#> [1] "gbifid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] TRUE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "https://www.gbif.org/species/2704179"
#>
#> attr(,"class")
#> [1] "ids"
Furthermore, you can get back all ids, if that's your jam, with the `get_*_()` functions (the `get_*()` functions with an additional `_` at the end of the function name):
get_ids_(c("Chironomus riparius", "Pinus contorta"), db = 'nbn', rows=1:3)
#> ══ db: nbn ══════════════════
#> $nbn
#> $nbn$`Chironomus riparius`
#> guid scientificName rank taxonomicStatus
#> 1 NBNSYS0000027573 Chironomus riparius species accepted
#> 2 NBNSYS0000023573 Quedius riparius species accepted
#> 3 NBNSYS0000007169 Elaphrus riparius species accepted
#>
#> $nbn$`Pinus contorta`
#> guid scientificName rank taxonomicStatus
#> 1 NBNSYS0000004786 Pinus contorta species accepted
#> 2 NHMSYS0000494848 Pinus contorta var. contorta variety accepted
#> 3 NHMSYS0000494858 Pinus contorta var. murrayana variety accepted
#>
#>
#> attr(,"class")
#> [1] "ids"
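Because the `get_*_()` variants return every candidate row instead of choosing one, the choice is left to you. A minimal sketch, using a mock data frame copied from the `Pinus contorta` rows printed above, keeps only the exact species-level match. The object names `nbn_rows` and `exact` are illustrative.

```r
# Mock of the `Pinus contorta` element printed above
nbn_rows <- data.frame(
  guid = c("NBNSYS0000004786", "NHMSYS0000494848", "NHMSYS0000494858"),
  scientificName = c("Pinus contorta",
                     "Pinus contorta var. contorta",
                     "Pinus contorta var. murrayana"),
  rank = c("species", "variety", "variety"),
  taxonomicStatus = c("accepted", "accepted", "accepted"),
  stringsAsFactors = FALSE
)

# Keep rows whose name matches the query exactly at species rank
exact <- nbn_rows[nbn_rows$scientificName == "Pinus contorta" &
                    nbn_rows$rank == "species", ]
exact$guid  # "NBNSYS0000004786"
```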
sci2comm('Helianthus annuus', db = 'itis')
#> ══ 1 queries ═══════════════
#> ✔ Found: Helianthus annuus
#> ══ Results ═════════════════
#>
#> ● Total: 1
#> ● Found: 1
#> ● Not Found: 0
#> $`Helianthus annuus`
#> [1] "common sunflower" "sunflower" "wild sunflower" "annual sunflower"
comm2sci("black bear", db = "itis")
#> $`black bear`
#> [1] "Ursus americanus luteolus" "Ursus americanus"
#> [3] "Ursus americanus" "Ursus americanus americanus"
#> [5] "Chiropotes satanas" "Ursus thibetanus"
#> [7] "Ursus thibetanus"
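The result above contains repeated names and a mix of species and subspecies. A small base-R sketch for tidying such a vector is shown below; the input is copied from the printed output, and filtering to two-word names is just one possible way to keep binomials.

```r
# Names copied from the comm2sci() output above
sci <- c("Ursus americanus luteolus", "Ursus americanus",
         "Ursus americanus", "Ursus americanus americanus",
         "Chiropotes satanas", "Ursus thibetanus", "Ursus thibetanus")

# Drop exact duplicates, keeping first occurrences in order
sci_unique <- unique(sci)

# Keep only two-word names (genus + specific epithet)
n_words <- lengths(strsplit(sci_unique, " ", fixed = TRUE))
binomials <- sci_unique[n_words == 2]
binomials
```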
spp <- c("Sus scrofa", "Homo sapiens", "Nycticebus coucang")
lowest_common(spp, db = "ncbi")
#> ══ 3 queries ═══════════════
#> ✔ Found: Sus+scrofa
#> ✔ Found: Homo+sapiens
#> ✔ Found: Nycticebus+coucang
#> ══ Results ═════════════════
#>
#> ● Total: 3
#> ● Found: 3
#> ● Not Found: 0
#> ══ 3 queries ═══════════════
#> ✔ Found: Sus+scrofa
#> ✔ Found: Homo+sapiens
#> ✔ Found: Nycticebus+coucang
#> ══ Results ═════════════════
#>
#> ● Total: 3
#> ● Found: 3
#> ● Not Found: 0
#> name rank id
#> 21 Boreoeutheria below-class 1437010
`numeric` to `uid`
as.uid(315567)
#> [1] "315567"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "https://www.ncbi.nlm.nih.gov/taxonomy/315567"
`list` to `uid`
as.uid(list("315567", "3339", "9696"))
#> [1] "315567" "3339" "9696"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found" "found" "found"
#> attr(,"multiple_matches")
#> [1] FALSE FALSE FALSE
#> attr(,"pattern_match")
#> [1] FALSE FALSE FALSE
#> attr(,"uri")
#> [1] "https://www.ncbi.nlm.nih.gov/taxonomy/315567"
#> [2] "https://www.ncbi.nlm.nih.gov/taxonomy/3339"
#> [3] "https://www.ncbi.nlm.nih.gov/taxonomy/9696"
out <- as.uid(c(315567, 3339, 9696))
(res <- data.frame(out))
#> ids class match multiple_matches pattern_match
#> 1 315567 uid found FALSE FALSE
#> 2 3339 uid found FALSE FALSE
#> 3 9696 uid found FALSE FALSE
#> uri
#> 1 https://www.ncbi.nlm.nih.gov/taxonomy/315567
#> 2 https://www.ncbi.nlm.nih.gov/taxonomy/3339
#> 3 https://www.ncbi.nlm.nih.gov/taxonomy/9696
See our CONTRIBUTING document.
Check out our milestones to see what we plan to get done for each version.
- Please report any issues or bugs.
- License: MIT
- Get citation information for `taxize` in R by running `citation(package = 'taxize')`
- Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.