Moving English Dependency Parsing to Universal Dependencies #7307
Replies: 2 comments
-
Hi, we've had this on our list of things that would be nice to do for a long time, but it takes a lot of knowledge and effort to do these conversions well, and we're concerned about having really high-quality annotation, especially for English, since it's used by so many users. There are several languages where we'd benefit from new/improved UD conversions:
For English: The DepEdit example might be a useful starting point, and it would be much easier than starting from constituency trees. One concern is that there are some differences between standard Stanford dependencies and the ClearNLP Stanford-ish dependencies that we currently have, and a conversion of a conversion also starts to make me nervous about the quality. I remember that Sebastian Schuster was working on updating the CoreNLP PTB->UD converter from UD v1 to UD v2 at one point, but I'm not sure it was finished/released. Even then, that converter is for PTB trees and it would require some changes to work well for OntoNotes trees.

For German: We're also considering using TrUDucer to convert the German Tiger dependencies to UD, so it's another tool to keep in mind for this. We have an initial set of TrUDucer rules for Tiger developed by Camille Watter as part of a bachelor's thesis, but there are a few places where we need to improve the conversion or do some postprocessing before we can train new models (cycles in the resulting parse trees are the main problem I remember).

For Chinese: If anyone interested in Chinese UD conversions comes across this, we'd be particularly grateful for contributions that would help us improve the CoreNLP Chinese UD conversion for OntoNotes. The pretrained Chinese pipelines' parser performance is currently fairly poor because the conversion isn't as high quality as the English conversion (many

In general we'd be happy to have user contributions, but data licensing here does make collaboration a little tricky for OntoNotes. And if I have the time, I can handle the English and German conversions, but I don't have the knowledge to work on the Chinese conversion.
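To make the cycle problem mentioned for the German conversion concrete, here is a minimal sketch of how converted parses could be screened before training. It is not part of TrUDucer or spaCy; the function name `find_cycle` and the input layout (a list of 1-based head indices with 0 for the root, as in CoNLL-U) are assumptions chosen for illustration.

```python
from typing import List


def find_cycle(heads: List[int]) -> List[int]:
    """Return the 1-based token ids on a cycle, or [] if the heads form no cycle.

    heads[i] is the head of token i + 1; 0 marks the root (CoNLL-U convention).
    This input layout is an assumption made for this sketch.
    """
    n = len(heads)
    # 0 = unvisited, 1 = on the path currently being followed, 2 = finished
    state = [0] * (n + 1)
    for start in range(1, n + 1):
        path = []
        node = start
        # Follow head pointers upwards until the root or a visited token.
        while node != 0 and state[node] == 0:
            state[node] = 1
            path.append(node)
            node = heads[node - 1]
        if node != 0 and state[node] == 1:
            # We walked back into the current path, so this is a cycle.
            return path[path.index(node):]
        for visited in path:
            state[visited] = 2
    return []


# A well-formed tree: token 2 is the root, tokens 1 and 3 attach to it.
print(find_cycle([2, 0, 2]))
# A broken conversion: tokens 2 and 3 point at each other.
print(find_cycle([2, 3, 2, 0]))
```

Sentences for which this returns a non-empty list could be dropped or routed to a postprocessing step; which of the two is appropriate depends on how many sentences are affected.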
-
My question would rather be: should spaCy transition to UD at all? I'm an NLP practitioner and, in my experience, UD is harder to work with than DG (dependency grammar) and it's less performant. This question has been moved to a dedicated discussion: #13738
-
This was initially raised as an issue here (#2485) by @moshest, and @honnibal mentioned that "Ideally we'd like to be moving to the Universal Dependencies" and asked for methods to convert OntoNotes to UD. I think it is worth having this as a discussion here.
What I could come up with is https://github.com/amir-zeldes/DepEdit and in particular https://github.com/amir-zeldes/DepEdit/blob/master/examples/stan2uni.ini. Unfortunately, I don't have access to OntoNotes, so I can't try it and see what the results would look like, but I wanted to mention this for anyone who may be able to try it. The paper discussing this is here: https://www.aclweb.org/anthology/W18-4918.pdf
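For anyone wondering what the simplest kind of rule in such a conversion looks like, here is a rough sketch of plain label renaming. It does not use DepEdit or its rule format; `RENAMES` and `rename_labels` are hypothetical names, and the table only covers four labels that were renamed between UD v1 and UD v2. A real conversion also needs structural changes (for example, UD attaches prepositions as `case` dependents of the noun), which this sketch does not attempt.

```python
from typing import List, Tuple

# (form, head, label); head is a 1-based token index, 0 for the root.
Token = Tuple[str, int, str]

# Labels renamed between UD v1 and UD v2. This table is deliberately
# incomplete and only illustrates the idea.
RENAMES = {
    "dobj": "obj",
    "nsubjpass": "nsubj:pass",
    "csubjpass": "csubj:pass",
    "auxpass": "aux:pass",
}


def rename_labels(tokens: List[Token]) -> List[Token]:
    """Rename dependency labels; heads and unknown labels are left unchanged."""
    return [(form, head, RENAMES.get(label, label)) for form, head, label in tokens]


sentence = [
    ("The", 2, "det"),
    ("ball", 4, "nsubjpass"),
    ("was", 4, "auxpass"),
    ("kicked", 0, "root"),
]
for form, head, label in rename_labels(sentence):
    print(form, head, label)
```

Whether the source labels in OntoNotes-derived data match this table one-to-one is exactly the kind of thing that would need checking against the actual data.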