This repository has been archived by the owner on Apr 15, 2020. It is now read-only.

Whitakers: handle alternate spellings of principal parts #17

Closed
balmas opened this issue Aug 8, 2018 · 12 comments


balmas commented Aug 8, 2018

See alpheios-project/morphsvc#3

If a parser returns multiple dict elements and a single mean element, should the mean be applied to all of them? Can we recover from this parser error?


balmas commented Aug 13, 2018

Test cases for this:

aberis
adero
adjuvo (adiuvabo, adiuvante,...)
alo ( 'alitus' vs 'altus' in principal parts)
amicio ('amixi' vs 'amicui' in principal parts)
apta
auxilio
beatricem (trico vs tricor)
blandiatur (blandio vs blandior)
caedo (caecidi vs cacidi)
cape (here we have some garbage in one of the hdwds "capio, capere, additional, forms")
clave, claves, clavis
comedo (comessus vs comestus vs comesus)
como
commoraris (commoro vs commoror)
congredior
contemplur (contemplo vs contemplor)
coque (coquos vs coquus)
criminati (crimino vs criminor)
cunctor (cunctor; cunctari; cunctatus vs cuncto; cunctare; cunctavi; cunctatus)
desino (desino; desinere; desivi; desitus vs desino; desinare; desavi; desatus)
duco (some garbage in one of the hdwds "duco; ducere; additional; forms")
edo (essus vs esus)
emere (emereo vs emereor)
excurro (excurro; excurrere; excucurri; excursus vs excurro; excurrere; excurri; excursus)
felem (felis vs feles)
grammaticae
ibis
imitandum (imito/imitor)
industrius (industriior vs industrior)
inferus
insuper
iocari (joco vs jocor)
itinera (itiner vs itiner)
lacrimante (lacrimo vs lacrimor)
lactis (lac vs lact)
lamentari (lamento/lamentor)
latrina (latrina/latrinum)
lavo (lavatus v lautos v lotus)
merendam (mereo vs mereor)
mille (millis vs milis)
misereror (misereo vs misereror vs miseret)
obsonatum (obsono vs obsonor)
odi (odeo vs odio)
ostendere (ostendo vs ostendeo)
pantheum (vs pantheom)
physicae (a big mess)
poto (potatus vs potus)
pradium (prandii vs prandi(i))
prodito (prodo vs prodeo)
promo (prompsi vs promsi)
pungo (pupugi vs pepgui)
quasi
salit (salo vs saleo)
scio (scivi vs scivi(ii))
scrutari (scruto vs scrutor)
septimia (septim vs septem)
sicut (adv vs conjunction)
spondeo (spopondi vs spepondi)
tueor (tuitus vs tutus)
vello (volsi vs velli)


balmas commented Aug 13, 2018

Looking more closely at the Whitaker's output, it seems that most, if not all, of these are due to differing spellings of the principal parts. So we can apply the same meaning to all of them. I guess we need to allow for multiple variations on spellings of principal parts, aggregated in one entry. E.g. here is how we treated it in V1:

screenshot from 2018-08-13 09-17-55

And here is what we are currently doing in V2:
screenshot from 2018-08-13 09-10-41
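
The aggregation described above could look roughly like the following. This is only a minimal sketch, not the code used in V1 or V2: the `DictEntry` class and the `aggregate_principal_parts` function are hypothetical names, and the assumption is that each dict element has already been reduced to an ordered list of principal parts. The example data is the excurro case from the test list.

```python
from dataclasses import dataclass


@dataclass
class DictEntry:
    """Hypothetical stand-in for one dict element in the parser output."""
    principal_parts: list[str]


def aggregate_principal_parts(entries: list[DictEntry]) -> list[list[str]]:
    """Collect the distinct spellings seen in each principal-part position.

    The result has one list per position; a position with more than one
    spelling is where the dict entries disagree.
    """
    slots: list[list[str]] = []
    for entry in entries:
        for i, part in enumerate(entry.principal_parts):
            # Grow the result if this entry has more parts than seen so far.
            while len(slots) <= i:
                slots.append([])
            # Keep first-seen order and skip spellings already recorded.
            if part not in slots[i]:
                slots[i].append(part)
    return slots


excurro = [
    DictEntry(["excurro", "excurrere", "excucurri", "excursus"]),
    DictEntry(["excurro", "excurrere", "excurri", "excursus"]),
]
for position in aggregate_principal_parts(excurro):
    print(" / ".join(position))
```

With this shape, a single entry can show "excucurri / excurri" in the third position while the one mean element is displayed once.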

balmas changed the title from "handle single mean with multiple dict?" to "Whitakers: handle alternate spellings of principal parts" Aug 16, 2018
balmas pushed a commit that referenced this issue Aug 16, 2018

balmas commented Aug 16, 2018

Started work on this. Still to be done: when aggregating lemmas for a lexeme, make sure the lemma that is assigned as the primary lemma is the most frequent one.
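
One way the remaining step might be approached is sketched below. Everything here is an assumption for illustration: the `Lemma` class, the `split_primary` function, and the numeric `freq_rank` field (lower means more frequent) are hypothetical, and the rank values in the example are invented rather than taken from the parser output.

```python
from dataclasses import dataclass


@dataclass
class Lemma:
    """Hypothetical lemma record; freq_rank is an assumed numeric ordering."""
    word: str
    principal_parts: tuple[str, ...]
    freq_rank: int  # assumption: 1 = most frequent, larger = rarer


def split_primary(lemmas: list[Lemma]) -> tuple[Lemma, list[Lemma]]:
    """Return the most frequent lemma and the remaining ones in input order.

    min() keeps the first of several equally ranked lemmas, so ties fall
    back to the order in which the parser returned them.
    """
    primary = min(lemmas, key=lambda lemma: lemma.freq_rank)
    alternates = [lemma for lemma in lemmas if lemma is not primary]
    return primary, alternates


# The freq_rank values below are invented purely for illustration.
spondeo = [
    Lemma("spondeo", ("spondeo", "spondere", "spepondi", "sponsus"), freq_rank=3),
    Lemma("spondeo", ("spondeo", "spondere", "spopondi", "sponsus"), freq_rank=1),
]
primary, alternates = split_primary(spondeo)
print(primary.principal_parts)
```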

balmas pushed a commit that referenced this issue Aug 24, 2018
needed for #17 - lemma variations often come from different sources
@balmas balmas assigned monzug and unassigned balmas Aug 24, 2018

balmas commented Aug 24, 2018

balmas pushed a commit that referenced this issue Aug 28, 2018
lemmas with different age can be aggregated
balmas added a commit that referenced this issue Aug 28, 2018

balmas commented Aug 28, 2018

@monzug this can also now be tested in https://github.com/alpheios-project/webextension/tree/qa-2.0.3-3


monzug commented Sep 13, 2018

Tested in Chrome in build 2.0.3-5.
Some of the above examples have been fixed, such as aberis or pungo.
Others (mille, poto, clavis, coque) could still be merged, as they look like alternative spellings.
Others (apta, cape, desino) have different meanings or different conjugations, so they look OK to me.

@monzug monzug assigned balmas and unassigned monzug Sep 13, 2018

monzug commented Sep 13, 2018

Bridget, giving back to you.


balmas commented Sep 13, 2018

Yeah, this fix only addresses some of the scenarios. I wasn't sure whether all of the words listed above fell into this category. Most do; as you have noted, some do not. There are issues on the morphsvc which describe some of the other scenarios I found:
alpheios-project/morphsvc#4
alpheios-project/morphsvc#6
alpheios-project/morphsvc#7
alpheios-project/morphsvc#8
alpheios-project/morphsvc#9
alpheios-project/morphsvc#10

Some of these may be problems with the original Whitaker's source code, and some are problems with our wordsxml wrapper on top of it. This fix addresses the scenario where our wordsxml wrapper puts more than one dict entry in a single lexical entry element and gives a single mean, and the only differences between the dict entries are in the principal parts, source, age and/or frequency.
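
The condition in the previous paragraph can be written as a simple check. This is a sketch under stated assumptions, not the client code: the dict entries are modelled as plain Python dictionaries, and the key names (`principal_parts`, `source`, `age`, `frequency`, `pofs`, `conjugation`) as well as the example values are hypothetical.

```python
# Fields whose differences are tolerated when aggregating dict entries.
# The key names are assumptions made for this sketch.
IGNORABLE_FIELDS = {"principal_parts", "source", "age", "frequency"}


def can_aggregate(entries: list[dict]) -> bool:
    """True if the entries differ only in the ignorable fields."""
    def core(entry: dict) -> dict:
        return {k: v for k, v in entry.items() if k not in IGNORABLE_FIELDS}

    if not entries:
        return False
    first = core(entries[0])
    return all(core(entry) == first for entry in entries[1:])


# Illustrative values only; they are not copied from real parser output.
excurro = [
    {"pofs": "verb", "conjugation": "3rd", "age": "X",
     "principal_parts": ["excurro", "excurrere", "excucurri", "excursus"]},
    {"pofs": "verb", "conjugation": "3rd", "age": "X",
     "principal_parts": ["excurro", "excurrere", "excurri", "excursus"]},
]
desino = [
    {"pofs": "verb", "conjugation": "3rd",
     "principal_parts": ["desino", "desinere", "desivi", "desitus"]},
    {"pofs": "verb", "conjugation": "1st",
     "principal_parts": ["desino", "desinare", "desavi", "desatus"]},
]
print(can_aggregate(excurro), can_aggregate(desino))
```

Under these assumptions the excurro entries qualify for aggregation, while the desino entries do not because they differ in conjugation.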

There is only so much normalization I can (and really should) do on the client side here. We will have to decide whether we are going to open up the old Ada code or find a new parser to fix all of them.


monzug commented Sep 14, 2018

Let me know if you want the list of which words have been fixed, which ones don't look like they could be fixed, and which ones might be merged.


balmas commented Sep 14, 2018

yes that would be great. thanks!

@balmas balmas assigned monzug and unassigned balmas Sep 17, 2018

monzug commented Sep 17, 2018

Here we are. I added a number next to each word:

  1. fixed
  2. can be fixed
  3. different conjugation, meaning, or other difference; does not need to be fixed

aberis 1
adero 1
adjuvo (adiuvabo, adiuvante,...) 1
alo ( 'alitus' vs 'altus' in principal parts) 1
amicio ('amixi' vs 'amicui' in principal parts) 1
apta 3
auxilio 3
beatricem (trico vs tricor) 3
blandiatur (blandio vs blandior) 3
caedo (caecidi vs cacidi) 1
cape (here we have some garbage in one of the hdwds "capio, capere, additional, forms") 3
clave, claves, clavis 2
comedo (comessus vs comestus vs comesus) 1
como 1
commoraris (commoro vs commoror) 2
congredior 3
contemplur (contemplo vs contemplor) 3
coque (coquos vs coquus) 2
criminati (crimino vs criminor) 3
cunctor (cuncto, cunctari; cunctatus vs cuncto; cunctare; cunctavi; cunctatus) 3
desino (desino; desinere; desivi; desitus vs desino; desinare; desavi; desatus) 3
duco (some garbage in one of the hdwds "duco; ducere; additional; forms") 1
edo (essus vs esus) 1
emere (emereo vs emereor) 1
excurro (excurro; excurrere; excucurri; excursus vs excurro; excurrere; excurri; excursus) 1
felem (felis vs feles) 2
grammaticae 2
ibis 2
imitandum (imito/imitor) 1
industrius (industriior vs industrior ) 2
inferus 3
insuper 2
iocari (joco vs jocor) 1
itinera (itiner vs itiner) 1
lacrimante (lacrimo vs lacrimor) 1
lactis (lac vs lact) 1
lamentari (lamento/lamentor) 1
latrina (latrina/latrinum) 3
lavo (lavatus v lautos v lotus) 1
merendam (mereo vs mereor) 1
mille (millis vs milis) 2
misereror (misereo vs misereror vs miseret) 2
obsonatum (obsono vs obsonor) 1
odi (odeo vs odio) 3
ostendere (ostendo vs ostendeo) 1
pantheum (vs pantheom) 2
physicae (a big mess) 2 ---> still a big mess
poto (potatus vs potus) 2
pradium (prandii vs prandi(i)) 2
prodito (prodo vs prodeo) 2
promo (prompsi vs promsi) 1
pungo (pupugi vs pepgui) 1
quasi 2
salit (salo vs saleo) 2
scio (scivi vs scivi(ii)) 3
scrutari (scruto vs scrutor) 3
septimia (septim vs septem) 3
sicut (adv vs conjunction) 2
spondeo (spopondi vs spepondi) 1
tueor (tuitus vs tutus) 1
vello (volsi vs velli) 1
I am going to add one more: reperio vs repperio 2

@monzug monzug assigned balmas and unassigned monzug Sep 17, 2018

balmas commented Sep 17, 2018

Thank you! Have split the 2s off into a new issue at alpheios-project/morphsvc#12

@balmas balmas closed this as completed Sep 17, 2018