Consider parsing "Untergattung" #126

Closed
Archilegt opened this issue Aug 22, 2022 · 5 comments
@Archilegt

Searching in BHL’s full text for “Untergattung” retrieves 8675 publications and searching for “Untergatt.” retrieves 541 publications [22.02.2022].
https://www.biodiversitylibrary.org/search?stype=F&searchTerm=Untergattung#/titles
https://www.biodiversitylibrary.org/search?stype=F&searchTerm=Untergatt.#/titles
I don't know how to see the total number of hits in the corpus.

@Archilegt
Author

Related:
Names of subgenera don't get parsed if subgen. is included in the scientific name value gnames/gnparser#232
recognizing "species group" or "species complex" suffixes as indicators of infrageneric groupings gnames/gnparser#55

Synergic with:
Use "mihi" to enhance scientific name finding and parsing gnames/gnparser#230

@Archilegt
Author

Archilegt commented Aug 22, 2022

Example
Julus (Parastenophyllum) Verhoeff, 1899 [original name]
https://myriatrix.myspecies.info/myriatrix/julus-parastenophyllum

Original string: Gatt. Julus, Untergatt. Parastenophyllum mihi
Source: https://www.biodiversitylibrary.org/page/15115029

Remarks:
Name strings “Julus” and “Parastenophyllum” are recognized.
The subgenus name is styled much less clearly in the paper than subgenus Julus (Leptoiulus) on page 199.

Suggested recognition:
Gatt. acts as a starter #optional
Untergatt. acts as a starter and/or connector #could be read and used to generate a field subgenus: Parastenophyllum
mihi acts as terminator #recommended

Suggested result:
Recognized name to be shown in "Scientific Names on this Page" box: Julus (Parastenophyllum)

Original string: Gatt. Julus, Untergatt. Parastenophyllum mihi #similar to comment.
Normalization to canonical form:
short version: Parastenophyllum
full version: Julus (Parastenophyllum) #Parentheses are important here as per article 6.1 of the ZooCode.
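The suggested recognition (optional "Gatt." starter, "Untergatt."/"Untergattung" connector, optional "mihi" terminator) and the two canonical forms could be sketched roughly as follows. This is a minimal sketch in Go, assuming a simple regular expression is enough for strings like the example; `germanSubgenusRe` and `normalizeSubgenus` are hypothetical names and not part of gnfinder or gnparser.

```go
package main

import (
	"fmt"
	"regexp"
)

// germanSubgenusRe is a hypothetical pattern for strings such as
// "Gatt. Julus, Untergatt. Parastenophyllum mihi":
//   - "Gatt." is an optional starter,
//   - "Untergatt." or "Untergattung" connects genus and subgenus,
//   - "mihi" is an optional terminator.
var germanSubgenusRe = regexp.MustCompile(
	`(?:Gatt\.\s+)?([A-Z][a-z]+),?\s+Untergatt(?:\.|ung)\s+([A-Z][a-z]+)(?:\s+mihi)?`)

// normalizeSubgenus returns the full canonical form "Genus (Subgenus)"
// (parentheses as in ICZN Art. 6.1) and the short form "Subgenus".
// ok is false when the string does not match the pattern.
func normalizeSubgenus(s string) (full, short string, ok bool) {
	m := germanSubgenusRe.FindStringSubmatch(s)
	if m == nil {
		return "", "", false
	}
	return fmt.Sprintf("%s (%s)", m[1], m[2]), m[2], true
}

func main() {
	full, short, ok := normalizeSubgenus("Gatt. Julus, Untergatt. Parastenophyllum mihi")
	fmt.Println(full, short, ok) // Julus (Parastenophyllum) Parastenophyllum true
}
```

Note that a pattern like this has to be tried against the whole text, which is the kind of cost discussed later in the thread.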

If this "German issue" is implemented, we can definitely include it in the Verhoeff paper GNA module.

@dimus
Member

dimus commented Aug 23, 2022

I would say this is also closer to the gnfinder realm. I will move this issue there.

dimus transferred this issue from gnames/gnparser Aug 23, 2022
@dimus
Member

dimus commented Aug 23, 2022

I ran the search for Untergattung through the whole BHL corpus and found that the word occurs quite rarely and is often not directly connected to a scientific name. Checking for the word would significantly decrease the efficiency of the search. Such minor improvements, accumulating over time, would slow gnfinder to a halt and make it useless for large-scale data processing.

In the case of mihi, we would check for it only when we already know something is a scientific name, so it won't change performance significantly.
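To illustrate why a "mihi" check is cheap under this approach, here is a minimal Go sketch, assuming text has already been tokenized and a name finder has reported where a candidate name ends. `hasMihiAfter` is a hypothetical helper, not gnfinder's actual code: it looks only at the single token after a detected name, so its cost grows with the number of names found rather than with the size of the text, unlike scanning every token for "Untergattung".

```go
package main

import "fmt"

// hasMihiAfter is a hypothetical helper. It inspects only the token that
// directly follows an already-detected name (ending at index nameEnd), so
// it runs once per found name instead of once per token in the text.
func hasMihiAfter(tokens []string, nameEnd int) bool {
	next := nameEnd + 1
	return next < len(tokens) && tokens[next] == "mihi"
}

func main() {
	tokens := []string{"Gatt.", "Julus,", "Untergatt.", "Parastenophyllum", "mihi"}
	// Suppose a name finder reported "Parastenophyllum" ending at index 3.
	fmt.Println(hasMihiAfter(tokens, 3)) // true
}
```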

dimus closed this as completed Aug 23, 2022
@dimus
Member

dimus commented Aug 25, 2022

Anchor words like Untergattung will be important for NLP analysis to weed out false positives when a word is ambiguous between a scientific name and ordinary usage, like Cancer or America.
