GNR: Unexpected high matching scores despite incomplete match at infraspecific rank #44
This is quite an interesting case, caused by a double problem with the gn parser. In the original string the rank is In the found matches, d'Urv is the problem: the d' prefix is not recognized as part of the author, so everything after it goes to the garbage bin as well and we get The solution: fix these in the parser. However, I want to investigate how often s. is used as a subspecies rank abbreviation.
From this random sample I'd say
@dimus Ah, that's how it works. I assumed that 's.' was recognized as a rank abbreviation. Now that I know this, I will standardize all rank abbreviations before submission. Thanks for clarifying!
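For anyone hitting the same problem, the standardization step mentioned above could look roughly like the sketch below. It is only an illustration, not part of GNR or the gn parser: the function name `standardize_rank`, the alias table, and the decision to treat 's.' as a subspecies marker are all assumptions based on this thread. It rewrites a token only when it comes after the genus and species epithet and is directly followed by a lowercase epithet.

```python
import re

# Hypothetical alias table: treating 's.' as "subspecies" is an assumption
# taken from this thread, not a rule used by GNR or the gn parser.
RANK_ALIASES = {
    "s.": "subsp.",
    "ssp.": "subsp.",
    "ssp": "subsp.",
    "subsp": "subsp.",
}

# A following epithet is assumed to be lowercase letters, optionally hyphenated.
EPITHET = re.compile(r"^[a-z]+(-[a-z]+)*$")


def standardize_rank(name: str) -> str:
    """Replace known subspecies abbreviations with 'subsp.' before submission."""
    tokens = name.split()
    out = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        # Only rewrite tokens after "Genus species" that precede an epithet.
        if i >= 2 and tok.lower() in RANK_ALIASES and EPITHET.match(nxt):
            out.append(RANK_ALIASES[tok.lower()])
        else:
            out.append(tok)
    return " ".join(out)


print(standardize_rank("Cirsium creticum s. triumfetti"))
```

Other rank abbreviations (variety, form) are deliberately left out, since abbreviations such as 'f.' can also appear inside authorship strings and would need more careful handling.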
Taxa at an infraspecific rank can get high match scores in GNR even though the result is only at the species rank or is incorrect:
For example, when supplying 'Cirsium creticum s. triumfetti' this result makes sense:
GBIF Backbone Taxonomy
But these high scores are unexpected:
Cirsium creticum (Lam.) d'Urv. [ exact canonical match, Score: 0.988 ]
Catalogue of Life
Cirsium creticum (Lam.) d’Urv. [ exact canonical match, Score: 0.988 ]
GBIF Backbone Taxonomy
Cirsium creticum (Lam.) d’Urv. subsp. creticum [ exact canonical match, Score: 0.988 ]
GBIF Backbone Taxonomy
(note 'creticum' vs 'triumfetti' at infraspecific level).