How does it work?
Global Names Resolver (GNR) is an API that uses specialised fuzzy-matching algorithms to search taxonomic datasources across the web. TaxonNamesResolver queries these datasources from Python.
TaxonNamesResolver first searches all the names (in chunks of 100) against the main datasource (search 1). Names that fail to be resolved are then searched against other datasources (search 2) to find synonyms. If any synonyms are returned, these are searched against the main datasource (search 3). Names that remain unresolved are reduced to their genus name, and the same three searches are repeated on the genus names (searches 4 to 6).
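The search cascade described above can be sketched as follows. This is a minimal illustration, not the actual TaxonNamesResolver code: `search_main` and `find_synonyms` are hypothetical callables standing in for the GNR requests against the main datasource and the other datasources, and the record format (a plain dict per name) is an assumption.

```python
from typing import Callable, Dict, Iterator, List

Records = Dict[str, dict]


def chunk(names: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield successive batches of at most `size` names."""
    for start in range(0, len(names), size):
        yield names[start:start + size]


def run_pass(
    queries: Dict[str, str],
    search_main: Callable[[List[str]], Records],
    find_synonyms: Callable[[List[str]], Dict[str, List[str]]],
    chunk_size: int = 100,
) -> Records:
    """Run one round of three searches.

    `queries` maps each original name to the term searched on its behalf
    (the name itself in the first round, its genus in the second).
    """
    terms = sorted(set(queries.values()))

    # Search 1 (or 4): all terms against the main datasource.
    hits: Records = {}
    for batch in chunk(terms, chunk_size):
        hits.update(search_main(batch))

    # Search 2 (or 5): unresolved terms against other datasources for synonyms.
    missed = [t for t in terms if t not in hits]
    synonyms: Dict[str, List[str]] = {}
    for batch in chunk(missed, chunk_size):
        synonyms.update(find_synonyms(batch))

    # Search 3 (or 6): any synonyms found against the main datasource.
    candidates = sorted({s for syns in synonyms.values() for s in syns})
    syn_hits: Records = {}
    for batch in chunk(candidates, chunk_size):
        syn_hits.update(search_main(batch))
    for term, syns in synonyms.items():
        for syn in syns:
            if syn in syn_hits:
                hits[term] = syn_hits[syn]
                break

    return {name: hits[term] for name, term in queries.items() if term in hits}


def resolve(names, search_main, find_synonyms, chunk_size=100) -> Records:
    """Searches 1 to 3 on full names, then 4 to 6 on genus names."""
    resolved = run_pass({n: n for n in names}, search_main, find_synonyms, chunk_size)
    # Reduce each unresolved name to its first word, assumed to be the genus.
    remaining = {n: n.split()[0] for n in names if n not in resolved}
    resolved.update(run_pass(remaining, search_main, find_synonyms, chunk_size))
    return resolved
```

Passing the two search functions in as arguments keeps the sketch runnable without network access; in practice they would wrap the HTTP requests to the GNR API.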
For returned names with multiple records, the most likely match is chosen by testing whether the record is in the correct clade (as specified by the user), has the highest GNR score and/or sits at the lowest taxonomic level.
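One way to picture this selection is the sketch below. The order in which the three tests are applied is an assumption (clade filter first, then score, then taxonomic depth as a tie-breaker), and the field names `lineage_ids` and `score` are hypothetical, chosen for illustration rather than taken from the GNR response format.

```python
from typing import List, Optional


def pick_best(records: List[dict], parent_id: Optional[int] = None) -> Optional[dict]:
    """Return the most likely record for a name, or None if none qualifies.

    Each record is assumed to carry:
      - "lineage_ids": taxon IDs from the root down to the record itself
      - "score": the GNR match score (higher is better)
    """
    # Keep only records that sit inside the user-specified parent clade.
    if parent_id is not None:
        records = [r for r in records if parent_id in r["lineage_ids"]]
    if not records:
        return None
    # Prefer the highest score; break ties with the deepest lineage,
    # used here as a proxy for the lowest taxonomic level.
    return max(records, key=lambda r: (r["score"], len(r["lineage_ids"])))


candidates = [
    {"name": "A", "lineage_ids": [1, 40674, 9], "score": 0.9},
    {"name": "B", "lineage_ids": [1, 40674, 9, 10], "score": 0.9},
    {"name": "C", "lineage_ids": [1, 2], "score": 0.99},
]
best = pick_best(candidates, parent_id=40674)
```

With the mammal clade ID supplied, record C is discarded despite its higher score because its lineage does not contain 40674, and B is preferred over A because it sits lower in the taxonomy.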
The resolved names are then written to a .csv file.
Taxonomy is full of synonyms; to avoid returning the wrong name it is best to specify a parent clade ID. For example, if you know all the taxon names in your list are mammals, then any matched names that are not mammals must be incorrect and TaxonNamesResolver will remove them. When searching NCBI as the main datasource, use the NCBI Taxonomy database to look up the ID of your parent clade. (For mammals it would be 40674.)
The GNR API returns its results as JSON. To keep the program transparent, every search carried out by TaxonNamesResolver is saved in the 'resolved_names' folder. These files can be viewed in a web browser with the appropriate add-on.
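As a rough illustration of how each search could be written to disk, the sketch below saves a parsed JSON response into the 'resolved_names' folder using only the standard library. The function name `save_search` and the `search_<n>.json` file naming are assumptions for the example; the files TaxonNamesResolver actually writes may be named differently.

```python
import json
from pathlib import Path


def save_search(response: dict, search_number: int, folder: str = "resolved_names") -> Path:
    """Write one parsed JSON response to `folder` and return the file path."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: one file per search, numbered in order.
    path = out_dir / f"search_{search_number}.json"
    with path.open("w", encoding="utf-8") as handle:
        # Indentation keeps the file readable when opened directly.
        json.dump(response, handle, indent=2, ensure_ascii=False)
    return path
```

Writing indented JSON means the saved searches can also be inspected in a plain text editor, without a browser add-on.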