about the mappings PWN30 and PWN31 #16
So PWN3.0 03021531-n should be mapped to PWN 3.1 03025214-n, right? But i51874 only maps to the PWN3.0 |
In another case, the gloss changed, but it seems to be the same concept:
|
|
The last one
I can make a PR to change the |
I can't really explain this, as the mapping was completed by PWN senses, so these should be linked. I see 274 'new' senses in PWN 3.1 according to OEWN; some of these are genuinely new (e.g. 'Barack Obama'), others don't seem to be. If you are capable of identifying these automatically, it would be a great help. |
FWIW, the first two are listed as deprecated in changes-in-wn31.csv:
$ grep -P 'i51874|i59022|i13521|i10035' changes-in-wn31.csv
deprecated,ili:i51874,03021531-n,none,chlorambucil/Leukeran
deprecated,ili:i59022,04231905-n,none,Skivvies
The other two words are in the file under a different ILI:
$ grep -P 'inferior|regent' changes-in-wn31.csv
deprecated,ili:i13656,01827261-s,none,regent
deprecated,ili:i17142,02440996-s,none,inferior
new,,,01832979-a,regent(ip)
new,,,02450200-a,inferior
@fcbond do you know what is the story here? |
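The same lookup can be scripted to suggest which "deprecated" rows may correspond to "new" rows. This is a minimal sketch, assuming each row of changes-in-wn31.csv has the five comma-separated fields seen in the grep output; the field names (`status`, `ili`, `pwn30`, `pwn31`, `lemmas`) are labels invented here, not taken from the file, and sharing a lemma is only a hint that two rows belong together.

```python
import csv
import io

# Rows copied from the grep output in this thread; in practice,
# pass an open handle to changes-in-wn31.csv instead.
SAMPLE = """\
deprecated,ili:i51874,03021531-n,none,chlorambucil/Leukeran
deprecated,ili:i59022,04231905-n,none,Skivvies
deprecated,ili:i13656,01827261-s,none,regent
deprecated,ili:i17142,02440996-s,none,inferior
new,,,01832979-a,regent(ip)
new,,,02450200-a,inferior
"""

# Hypothetical field names (assumption: not defined by the file itself).
FIELDS = ["status", "ili", "pwn30", "pwn31", "lemmas"]


def read_changes(handle):
    """Parse rows into dicts keyed by the assumed field names."""
    return list(csv.DictReader(handle, fieldnames=FIELDS))


def lemmas_of(row):
    """Split 'a/b' lemma lists and drop markers such as '(ip)'."""
    return {part.split("(")[0] for part in row["lemmas"].split("/")}


def candidate_pairs(rows):
    """Pair each deprecated row with any new row that shares a lemma."""
    new_rows = [r for r in rows if r["status"] == "new"]
    pairs = []
    for old in (r for r in rows if r["status"] == "deprecated"):
        for new in new_rows:
            shared = lemmas_of(old) & lemmas_of(new)
            if shared:
                pairs.append((old["pwn30"], new["pwn31"], sorted(shared)))
    return pairs


rows = read_changes(io.StringIO(SAMPLE))
pairs = candidate_pairs(rows)
for pair in pairs:
    print(pair)
```

Each pair it prints is only a candidate for manual review: a shared lemma does not guarantee that the two synsets express the same concept.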
The mapping says that two concepts from PWN30 were merged in PWN31. But
00929443-s should map to 00932684-s |
The concept in PWN30 was split into two in PWN31, right? But the mapping is not reflecting that:
I would say that 10210648-n maps to both 10230249-n and 10230422-n in PWN31. Same for
and also
|
Is this the same case as above?
We can say the sense was split, so the PWN30 synset needs to map to both synsets in PWN31.
= 06823760-n 06836640-n
Or we can say that neither of the new synsets is a replacement for the old PWN30 synset; they are generalizations. So the PWN30 synset is <= both PWN31 synsets. That would force us to extend the mapping to deal with more fine-grained relations rather than only equality. BTW, can someone see the reason for splitting this sense from PWN30? |
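To make the second option concrete, here is a minimal sketch of a mapping whose entries carry a relation rather than assuming equality. The `Link` class and the "=" / "<=" labels are hypothetical notation for this illustration, not an existing format, and the relation assigned to each ID pair below is illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str    # PWN30 synset ID
    relation: str  # "=" (same concept) or "<=" (target is more general)
    target: str    # PWN31 synset ID


# IDs are taken from this thread; the relations are assumptions.
LINKS = [
    Link("03021531-n", "=", "03025214-n"),
    Link("10210648-n", "<=", "10230249-n"),
    Link("10210648-n", "<=", "10230422-n"),
]


def index_by_source(links):
    """Group links by their PWN30 source synset."""
    index = defaultdict(list)
    for link in links:
        index[link.source].append(link)
    return index


def split_sources(links):
    """Return the sources that point to more than one target."""
    index = index_by_source(links)
    return sorted(src for src, group in index.items() if len(group) > 1)


print(split_sources(LINKS))
```

Keeping the relation explicit lets a consumer decide whether to follow only "=" links or to accept the looser ones as well.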
Here the mapping seems right, but the PWN31 structure can be changed:
If someone is a nutritionist, he/she is also a dietician, right? Because |
The mapping is wrong. It points 00042692-s to 00035037-s, but 00042912-s is more appropriate, right?
|
There is no 02319740-a in PWN31, only 02319740-s; it is a satellite synset.
|
Once more, it seems that the mapping is wrong. 00675928-s from PWN30 is 00679196-s in PWN31, not 00679361-s.
|
The mapping is missing 02713992-n to 02716929-n, or maybe we can map to both 02716785-n and 02716929-n. But considering the example and definition, mapping to 02716785-n looks better. |
The mapping is clearly wrong. The 02599754-v from PWN30 should map to 02605751-v in PWN31.
|
The mapping is clearly wrong. Both Brioschi and Tums are antacids, and both exist in PWN30 and PWN31. The mapping points 14777104-n to 14802098-n, but it should point to 14801263-n.
Having both trademarks in WN is strange anyway... but we do not remove them, right? We just do not add more of those to the English Wordnet. The same error: 14777188-n should map to 14801347-n, not to 14802098-n.
Same error: 14777441-n should map to 14801600-n, not 14802098-n.
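Errors of this kind share a pattern: several PWN30 synsets point to the same PWN31 target. A minimal sketch for flagging such cases, assuming the mapping is available as a plain source-to-target dictionary (the dictionaries below contain only the pairs quoted in this comment):

```python
from collections import defaultdict

# Current (suspect) mapping, as described above.
CURRENT = {
    "14777104-n": "14802098-n",
    "14777188-n": "14802098-n",
    "14777441-n": "14802098-n",
}

# Targets proposed above as replacements.
PROPOSED = {
    "14777104-n": "14801263-n",
    "14777188-n": "14801347-n",
    "14777441-n": "14801600-n",
}


def shared_targets(mapping):
    """Return each target that more than one source maps to."""
    sources_by_target = defaultdict(list)
    for source, target in mapping.items():
        sources_by_target[target].append(source)
    return {t: sorted(s) for t, s in sources_by_target.items() if len(s) > 1}


print(shared_targets(CURRENT))   # flags 14802098-n with its three sources
print(shared_targets(PROPOSED))  # empty: every source has its own target
```

A shared target is only a flag for review, since genuine merges between releases produce the same pattern.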
|
The mapping is wrong: 00767349-s should map to 00770909-s, not to 00766556-s.
Same for 00802179-s: it should map to 00805750-s, not 00805871-s,
and 00780944-s, which should map to 00784503-s, not 00805750-s.
Or can we join the senses from PWN30?
|
In #16 (comment), @goodmami mentioned the changes in PWN31 proposed by @fcbond. The question here seems to be: the mapping of PWN30 to PWN31 was created on top of the Princeton releases, right? Later, we may have changes (or patches) for both PWN30 and PWN31. |
@arademaker, your specific examples, and more, correspond to what I find in the attached output file (en-loss.txt), produced using the Wn library from @goodmami and the sensekey-based mapping algorithm included in recent NLTK versions. The attached file shows the details for the English row of the losses table in my forthcoming conference paper, which you saw me present last week at GWC 2023:
English 117659 117454 205 0.17 117659 117427 232 0.2
The first 4 numbers in that row concern synsets mapped vs. lost with an offset mapping, and the last 4 numbers concern the ILI mapping. The respective losses (205 with offsets vs. 232 with ILI) can be decomposed like this:
English, 143 lost with both offsets and ILI
It is great that you have already started to improve the mappings. Congratulations on that: if your proposed targets can be verified, they will bring us closer to the perfect mappings, which seemed out of reach not long ago!
@jmccrae, the most probable reason why a sense-based mapping misses the above cases is that their sensekey suffered a small alteration, for example a change of lexfile or lex_id.
Splits are a different problem. Mapping whole synsets cannot handle splits correctly, since a split sends different parts of a synset to different targets. Mapping a split synset to two target synsets produces false positives, because every involved sense gets mapped to one correct and one wrong target. Mapping a split to only a single target is still wrong, but yields fewer false positives. The only adequate treatment for splits is a mapping that handles only the concerned senses. |
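The argument about splits can be checked on a toy case. This is a minimal sketch with entirely hypothetical data: a source synset with two senses, "a" and "b", which end up in two different target synsets, "T1" and "T2". It counts how many sense-level links are right, wrong, or missing when the whole synset is mapped to a given set of targets.

```python
# Hypothetical split: sense "a" belongs in T1, sense "b" in T2.
SOURCE_SENSES = ["a", "b"]
GOLD = {"a": "T1", "b": "T2"}  # correct target for each sense


def score(synset_targets):
    """Score a synset-level mapping against the sense-level gold targets."""
    # A synset-level mapping links every sense to every chosen target.
    pairs = [(s, t) for s in SOURCE_SENSES for t in synset_targets]
    correct = sum(1 for s, t in pairs if GOLD[s] == t)
    wrong = len(pairs) - correct
    # A sense is missed when its correct target is not among the targets.
    missed = sum(1 for s in SOURCE_SENSES if GOLD[s] not in synset_targets)
    return {"correct": correct, "wrong": wrong, "missed": missed}


both = score(["T1", "T2"])  # map the synset to both targets
single = score(["T1"])      # map the synset to one target only

print("both targets:", both)
print("single target:", single)
```

In this toy case, mapping to both targets yields two wrong links, while mapping to one target yields one wrong link and one missed sense; a sense-level mapping that follows `GOLD` directly would have neither.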
Rebuilding the |
It seems that some of the new WN3.1 synsets aren't really new but were just errors in the original mapping. I will revise this mapping and see if we can reduce the number of actually new synsets |
I updated the mapping. There were a few synsets that were not properly aligned, although there are some challenges as PWN 3.1 also split some synsets (e.g., "decadent, effete" => "decadent, fin-de-siecle" and two synsets for "effete"). |
Where is the construction of the mappings from CILI to PWN30 and PWN31 documented?