review mappings in CILI from PWN30 to PWN31 #17

Open
wants to merge 4 commits into
base: master

arademaker
Member

@fcbond, as requested. This is my PR.

As reported in #16, these are four cases that I found while working on the glosstag corpus. My approach was to use the sense keys as a bridge to map synset IDs from PWN 3.0 to synsets in PWN 3.1.
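
The sense-key bridge can be sketched roughly as follows. This is only an illustration, not the script used for this PR: the function name `bridge_by_sense_keys` and the in-memory dictionary layout are assumptions, and the sample data is the `00040058-s` case quoted in the review comments of this thread.

```python
from collections import defaultdict

def bridge_by_sense_keys(wn30, wn31):
    """Map each WN30 synset id to the WN31 synsets that share its sense keys.

    wn30, wn31: dict of synset id -> set of sense keys (hypothetical layout).
    Returns a dict: WN30 id -> {WN31 id: number of shared sense keys}.
    """
    # Invert WN31 so that each sense key points to the synset holding it.
    key_to_wn31 = {key: sid for sid, keys in wn31.items() for key in keys}
    mapping = defaultdict(dict)
    for sid30, keys in wn30.items():
        for key in keys:
            sid31 = key_to_wn31.get(key)
            if sid31 is not None:  # sense keys dropped in WN31 are skipped
                mapping[sid30][sid31] = mapping[sid30].get(sid31, 0) + 1
    return dict(mapping)

# Sample data transcribed from this thread.
wn30 = {
    "00040058-s": {
        "supine%5:00:00:passive:01",
        "unresisting%5:00:00:passive:01",
        "resistless%5:00:00:passive:01",
    },
}
wn31 = {
    "00040189-s": {
        "unresisting%5:00:00:passive:01",
        "resistless%5:00:00:passive:01",
    },
    "00040305-s": {"supine%5:00:00:passive:01"},
}

result = bridge_by_sense_keys(wn30, wn31)
```

A WN30 synset that ends up with more than one WN31 target, as in this sample, is a split and needs a manual decision rather than an automatic mapping.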

@arademaker changed the title from "missing mappings from CILI to PWN31" to "review mappings in CILI from PWN30 to PWN31" on Oct 8, 2022
Member Author

arademaker commented Oct 8, 2022

This last commit in this PR closes issue #16. In this second commit, I am fixing errors found in the mapping.

Member

@jmccrae jmccrae left a comment

I verified this. I disagree with a few of these mappings, but otherwise it is a great contribution.

@@ -198,6 +198,7 @@ i199 00039507-s
i200 00039705-a
i201 00040060-s
i202 00040189-s
i202 00040305-s
Member

This gives two synsets in PWN31 the same ILI ID.

Member Author

@arademaker arademaker Nov 26, 2022

Indeed, the PWN30 synset was split into two:

% rg "i202\t" ili-map-pwn3*
ili-map-pwn31.tab
201:i202	00040189-s
202:i202	00040305-s

ili-map-pwn30.tab
202:i202	00040058-s

WN30 00040058-s {'supine%5:00:00:passive:01', 'unresisting%5:00:00:passive:01', 'resistless%5:00:00:passive:01'}
offering no resistance; "resistless hostages"; "No other colony showed such supine, selfish helplessness in allowing her own border citizens to be mercilessly harried"- Theodore Roosevelt

WN31 00040189-s {'unresisting%5:00:00:passive:01', 'resistless%5:00:00:passive:01'}
offering no resistance; "resistless hostages"

WN31 00040305-s {'supine%5:00:00:passive:01'}
passive as a result of indolence or indifference; "No other colony showed such supine, selfish helplessness in allowing her own border citizens to be mercilessly harried"- Theodore Roosevelt

Considering the definition alone, we can say that WN30 00040058-s maps to WN31 00040189-s. But one of its senses and one of its examples now belong to another synset. There are other similar cases, so let us discuss this one first, ok? @fcbond @jmccrae

WN30 00040058-s has only one similarTo relation, with 00039592-a. This relation was projected onto WN31 00040305-s and WN31 00040189-s, which are both similarTo WN31 00039705-a. Moreover, both WN31 synsets also have an antonym relation to 00038863-a. This means they cannot be differentiated by their relations in WN31, so the split is suspicious: judged by relations alone, they are indistinguishable in both WN31 and WN30. Yes, the glosses and examples differ, but the relations are the real WordNet criterion for defining and distinguishing a synset.
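
The relation-based check described above can be sketched as below. The helper name `indistinguishable_groups` and the `(relation type, target)` tuple layout are assumptions made for illustration; the relation data is transcribed from this comment, not read from the WordNet files.

```python
from collections import defaultdict

def indistinguishable_groups(relations_by_synset):
    """Group synsets whose sets of (relation type, target) pairs are identical."""
    groups = defaultdict(list)
    for sid, relations in relations_by_synset.items():
        # A frozenset ignores ordering and can be used as a dictionary key.
        groups[frozenset(relations)].append(sid)
    # Only groups with two or more members are indistinguishable by relations.
    return [sorted(members) for members in groups.values() if len(members) > 1]

# Relations of the two WN31 synsets, as stated in this comment.
wn31_relations = {
    "00040189-s": [("similarTo", "00039705-a"), ("antonym", "00038863-a")],
    "00040305-s": [("similarTo", "00039705-a"), ("antonym", "00038863-a")],
}

same = indistinguishable_groups(wn31_relations)
```

Both synsets fall into a single group here, which is the situation the comment calls suspicious.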

Member Author

BTW, the ILI need not be 1-1 with PWN31, right? I am assuming that one ILI can map to more than one synset in the same wordnet. So treating i202 as a concept that covers both 00040189-s and 00040305-s in PWN31 is fine, right?

Member

It would seem that WN30 00040058-s and WN31 00040189-s are the same and should both be mapped to i202. However, 00040305-s is a novel sense and will need to be assigned a new ILI.


@ekaf ekaf Jan 31, 2023

The gist at https://gist.githubusercontent.com/ekaf/8cd78cce7005abd923c7ed2af47238e2 pretty-prints the wordnet splits dictionary from NLTK, with information about how many senses are carried over into each part of the split. With WN 3.1 it outputs the file out-wnsplits.txt, which lists the 33 splits since WN 3.0. The first line is:

00040058-s -> 00040305-s (1 sensekey/s) + 00040189-s (2 sensekey/s)

This shows that 00040305-s contains one sense from the source synset, while 00040189-s contains two.


While the previous ILI mappings contained no splits, this PR introduces the following 5:

i202 00040189-s,00040305-s
i40396 00951435-n,00951878-n
i63228 07059027-n,07059160-n
i72354 06836640-n,06836790-n
i90722 10230249-n,10230422-n

So it seems that, until now, mappers have made an effort to select the single most adequate target for each source. I think there is a good reason to avoid creating splits: having two targets yields both a true and a false positive for each sense involved.
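
A list like the one above can be produced with a short check over the mapping file. The sketch below assumes one tab-separated `ILI<TAB>synset` pair per line, as in the `rg` output quoted earlier in this thread; the function name `find_splits` and the handling of blank and `#` lines are assumptions, not part of the CILI tooling.

```python
from collections import defaultdict

def find_splits(lines):
    """Return the ILI ids that map to more than one synset.

    lines: iterable of "ILI<TAB>synset" strings (assumed file layout).
    """
    targets = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comment lines
            continue
        ili, synset = line.split("\t")[:2]
        targets[ili].append(synset)
    return {ili: synsets for ili, synsets in targets.items() if len(synsets) > 1}

# A few rows taken from the diff hunks in this PR.
sample = [
    "i201\t00040060-s",
    "i202\t00040189-s",
    "i202\t00040305-s",
    "i90721\t10230113-n",
]

splits = find_splits(sample)
```

To check the real mapping, an open file handle for `ili-map-pwn31.tab` can be passed in place of `sample`, since the function only iterates over lines.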

@@ -49994,6 +49997,7 @@ i50029 02716223-n
i50030 02716355-n
i50031 02716453-n
i50033 02716628-n
i50034 02716785-n
Member

This is a genuine change in PWN3.1. A synset was split into two new synsets, each of which should get a new ILI ID.

Member Author

If I got the comment right, you are talking about

WN30 02713992-n {'roundel%1:06:01::', 'annulet%1:06:02::'}
(heraldry) a charge in the shape of a circle; "a hollow roundel"

WN31 02716785-n {'roundel%1:06:01::'}
(heraldry) a charge in the shape of a filled circle; "a hollow roundel"

WN31 02716929-n {'annulet%1:06:02::'}
(heraldry) a charge in the shape of a small ring

In this PR, I was not quite sure whether I could suggest new ILIs. We have several options here. One is not to map at all, since the WN30 concept was split into more specialized ones in PWN31, as you confirmed. But then what is i50034? It points to 02713992-n, a synset whose existence we are rejecting according to the changes made in WN31. Isn't it odd to have i50034 in CILI at all? If we map i50034 to WN31 02716785-n, we are saying that WN30 02713992-n corresponds to WN31 02716785-n... It seems we need to think harder about the real meaning of the mappings, or go back to a more elaborate schema for mapping semantic networks (e.g. the ones used in EuroWordNet or by SUMO, from @apease).

Member Author

@arademaker arademaker Nov 26, 2022

Given the limited expressivity of the mappings currently adopted by CILI, we can mark i50034 as deprecated and keep it pointing only to WN30 02713992-n, then create two new ILIs for 02716785-n and 02716929-n. The unfortunate consequence is that we lose the knowledge that both WN31 concepts are specializations of 02713992-n.

Member

Yes, we are quite inflexible about the mapping, but we can easily create more identifiers. The ILI does not capture semantic relations anyway; that is the responsibility of the individual wordnets. We should introduce two new ILI identifiers for these more specific senses.

@@ -72310,6 +72314,7 @@ i72351 06836139-n
i72352 06836320-n
i72353 06836441-n
i72354 06836640-n
i72354 06836790-n
Member

I think this is a distinct and novel sense

@@ -90652,6 +90657,7 @@ i90719 10229489-n
i90720 10229738-n
i90721 10230113-n
i90722 10230249-n
i90722 10230422-n
Member

This is a distinct and novel sense

3 participants