GNR: Unexpected high matching scores despite incomplete match at infraspecific rank #44

Open
Alectoria opened this issue Aug 21, 2015 · 4 comments

@Alectoria

Taxa at an infraspecific rank can get high match scores in GNR, even when the matched name is only at species rank or is incorrect:

For example, when supplying 'Cirsium creticum s. triumfetti', this result makes sense:

  1. Cirsium creticum d'Urv. subsp. triumfetti (Lacaita) Werner [ exact canonical match, Score: 0.988 ]
    GBIF Backbone Taxonomy

But these high scores are unexpected:

  1. Cirsium creticum (Lam.) d'Urv. [ exact canonical match, Score: 0.988 ]
    Catalogue of Life

  2. Cirsium creticum (Lam.) d’Urv. [ exact canonical match, Score: 0.988 ]
    GBIF Backbone Taxonomy

  3. Cirsium creticum (Lam.) d’Urv. subsp. creticum [ exact canonical match, Score: 0.988 ]
    GBIF Backbone Taxonomy
    (note 'creticum' vs 'triumfetti' at infraspecific level).

dimus self-assigned this Aug 21, 2015
@dimus
Member

dimus commented Aug 21, 2015

This is quite an interesting case, caused by two separate problems in the gn parser.

In the original string the rank is `s.`, which the parser currently does not recognize as a rank. As a result, `s. triumfetti` goes into the garbage bin and the canonical form is determined as `Cirsium creticum` instead of `Cirsium creticum triumfetti`.

In the found matches, `d'Urv.` is the problem: the `d'` prefix is not recognized as part of an author name, so everything after it also goes into the garbage bin and we get `Cirsium creticum`. As a result, we compare the canonical `Cirsium creticum` with the canonical `Cirsium creticum`, hence the high score.

The solution is to fix both problems in the parser. However, I want to investigate how often `s.` is used as a subspecies rank abbreviation.
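To make the mechanism above concrete, here is a toy Python model of canonical-name extraction. It is not the gn parser: the rank list `RANKS`, the author-token regex `AUTHOR` and the function `toy_canonical` are hypothetical simplifications that only reproduce the two behaviours described above (`s.` is not a known rank, and a `d'` prefix is not accepted as part of an author name).

```python
import re

# Hypothetical rank list. "s." is deliberately absent, mimicking the parser
# behaviour described above.
RANKS = {"subsp.", "ssp.", "var.", "f."}

# Toy author token: a capitalised word, optionally in parentheses and/or
# ending with a period. Prefixes such as "d'" or "d’" are NOT accepted,
# mimicking the author-parsing problem described above.
AUTHOR = re.compile(r"^\(?[A-Z][\w-]*\.?\)?$")
EPITHET = re.compile(r"^[a-z][a-z-]+$")


def toy_canonical(name: str) -> str:
    """Toy canonical form: genus + species epithet, plus an infraspecific
    epithet if it follows a known rank. Everything from the first
    unrecognised token onwards is dropped (the 'garbage bin')."""
    tokens = name.split()
    canonical = tokens[:2]  # assume the name starts with genus + species
    i = 2
    while i < len(tokens):
        tok = tokens[i]
        if AUTHOR.match(tok):
            i += 1  # skip a recognised author token
        elif tok in RANKS and i + 1 < len(tokens) and EPITHET.match(tokens[i + 1]):
            canonical.append(tokens[i + 1])  # keep the infraspecific epithet
            i += 2
        else:
            break  # unrecognised token: the rest goes to the garbage bin
    return " ".join(canonical)


if __name__ == "__main__":
    query = toy_canonical("Cirsium creticum s. triumfetti")
    for match in [
        "Cirsium creticum (Lam.) d'Urv.",
        "Cirsium creticum (Lam.) d’Urv. subsp. creticum",
    ]:
        print(match, "-> same canonical as query:", toy_canonical(match) == query)
```

Under these assumptions, the query and both matches reduce to `Cirsium creticum`, so an exact canonical comparison succeeds. Written with `subsp.` and without the `d'` author, the same name keeps its infraspecific epithet (`Cirsium creticum triumfetti`).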

@dimus
Member

dimus commented Aug 21, 2015

name
Copidognathus s. str. ushakovi Sokolov 1952
Tiliqua s. gigas
Aegorhinus s. philippii Kusch.
Cnemidophorus s. barrancorum Zweifel 1959
Puccinellia distans (Jacq.) Parl. s. str.
Pseudobranchus s. histricolus
Spermophilus s. guttatus
Tretaspis seticornis s. (Hisinger )
Comatricha maculata s. lat.
Limnodromus s. hendersoni Rowan 1932
Vicia cracca s. l. L. s. l.
Asplenium adiantum-nigrum L. s. l.
Euconnus s. str. Thomson, 1862
Dicranograptus s. geniculatus
Mirafra s. hoeschi Stresemann 1939
Centropus s. intermedius
Lacerta s. oristanensis
Cuculus s. horsfieldi
Eunicella s. stricta
Gentiana amarella ssp. amarella s. l.
Pinarochroa s. djamdjamensis Neumann 1905
Cymatura s. var. albomaculata
Zygaena s. sarpedon Marten 1957
Trichotoxon (Trichotoxon s. s.) thikensis Verdcourt 1951
Tupaia s. carimatae Miller 1906
Cellana s. oliveri Powell, 1955 E
Morpho s. Le Moult & Real 1965
Leiocephalus s. parasphex Schwartz 1964
Cynoglossus s. browni Chabanaud 1949
Carex ornithopoda ssp. ornithopoda s. l.
Saxifraga biflora auct. non All. s. str.
Limnocardium (T.) s. oviformis
Salamandra s. algira
Judolia s. swainei Hovore 1988
Rosa villosa agg. s. l.
Gnophos Treitschke, 1825 s. str.
Pinarochroa s. schomia Neumann 1905
Pseudemys s. gaigae
Pagodulina s. altilis
Luscinia s. abbotti
Dasypeltis s. scabra
Plethodon s. clemsona (Jocassee )
Vicia sativa s. l. L. s. l.
Globiceps s. sordidus
Prosymna s. bivittata
Clitambonites s. epigonus Opik 1934
Lacerta (Podarcis) s. hieroglyphica
Farcimen s. striatellum
Lanius s. niloticus
Illaenus s. H.
Crepis vesicaria L. s. l.
Alouatta s. puruensis
Cnemidophorus s. occidentalis
Apodemus s. alpinus
Natrix s. clarki
Lecidea asserculorum s. auct.
Jordanita Verity, 1946 s. str.
Metoponorthus s. lucasioides
Diastylis s. stuxbergi
Connivelobus s. str. magnipalpm var. serratisetus
Cladonia tenuis sensu auct. brit. s. lat
Dryopteris filix-mas agg. s. l.-
Monochamus s. scutellatus
Scrobipala s. picta Povolny 1969
Serinus s. umbrosus

@dimus
Member

dimus commented Aug 21, 2015

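From this random sample, I'd say `s.` is almost never used as an abbreviation for subsp. (here it mostly abbreviates a species epithet, as in `Tiliqua s. gigas`, or stands for sensu, as in `s. str.` and `s. l.`), so it would not be right to add it to the rank list. If it occurs a lot in your database, I would write a prematch script that converts it to `ssp.` or `subsp.`, for example.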

`d'`, however, is a problem that needs to be fixed in the parser.
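A prematch script of the kind suggested above could look like the following Python sketch. The function `normalize_rank` and its regular expression are hypothetical; the sketch assumes that, in the submitting database, `s.` placed directly after a binomial and before a lowercase epithet always means subspecies. The sample above shows this is not true of names in general, so the pattern deliberately leaves forms like `Tiliqua s. gigas` and `Comatricha maculata s. lat.` alone.

```python
import re

# Hypothetical pattern: "Genus species s. epithet ...", where "s." follows a
# lowercase species epithet and precedes a lowercase infraspecific epithet.
# ASSUMPTION: in the submitting database, "s." in this position means "subsp.".
S_AS_SUBSP = re.compile(
    r"^(?P<binomial>[A-Z][a-z-]+ [a-z][a-z-]+) s\. (?P<epithet>[a-z][a-z-]+)\b"
)

# Words that follow "s." when it stands for sensu ("s. str.", "s. lat."),
# which must not be turned into subspecies epithets.
SENSU_WORDS = {"str", "lat", "stricto", "lato"}


def normalize_rank(name: str) -> str:
    """Rewrite 'Genus species s. epithet' as 'Genus species subsp. epithet'.

    Any other name, including ones where "s." abbreviates a species epithet
    or means sensu, is returned unchanged.
    """
    m = S_AS_SUBSP.match(name)
    if m is None or m.group("epithet") in SENSU_WORDS:
        return name
    rest = name[m.end("epithet"):]  # authors etc. after the epithet
    return f"{m.group('binomial')} subsp. {m.group('epithet')}{rest}"


if __name__ == "__main__":
    for raw in [
        "Cirsium creticum s. triumfetti",
        "Tiliqua s. gigas",
        "Comatricha maculata s. lat.",
    ]:
        print(f"{raw!r} -> {normalize_rank(raw)!r}")
```

Matching only the position right after a binomial keeps the rewrite conservative: names in any other shape pass through untouched and can be reviewed by hand before submission.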

@Alectoria
Author

@dimus Ah, that's how it works. I assumed that 's.' was recognized as a rank abbreviation. Now that I know this, I will standardize all rank abbreviations before submission. Thanks for clarifying!
