Why?
SimAlign is a remarkably simple and effective way of obtaining word alignments from multilingual Transformer encoders. OpenKiwi is built on top of multilingual Transformers, so it should be able to produce alignments too.
The training objective of OpenKiwi might even improve the alignments.
The alignments could be used in ingenious ways in the quality predictions. For example:
The predicted BAD target words can be aligned with source tokens to highlight which source word might have caused the mistranslation (similar to the definition of 'source tags' in the WMT QE shared task).
The alignments themselves can be used to detect accuracy errors: if a content word in the source has no alignment to any target word (or vice versa), this might indicate an omission or a mistranslation.
To be investigated.
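Both ideas can be sketched in a few lines of Python. This is only an illustration under assumptions: alignments are taken to be (source index, target index) pairs, tags are the strings "OK" and "BAD", and the helper names `project_bad_tags` and `unaligned_source_indices` are made up for this sketch rather than taken from OpenKiwi or SimAlign.

```python
from typing import List, Tuple

# Assumed convention: each pair is (source_index, target_index).
Alignment = List[Tuple[int, int]]


def project_bad_tags(
    target_tags: List[str], alignment: Alignment, source_len: int
) -> List[str]:
    """Tag a source token BAD if it is aligned to at least one BAD target token."""
    source_tags = ["OK"] * source_len
    for src_idx, tgt_idx in alignment:
        if target_tags[tgt_idx] == "BAD":
            source_tags[src_idx] = "BAD"
    return source_tags


def unaligned_source_indices(alignment: Alignment, source_len: int) -> List[int]:
    """Return source positions with no alignment link (omission candidates)."""
    aligned = {src_idx for src_idx, _ in alignment}
    return [i for i in range(source_len) if i not in aligned]


if __name__ == "__main__":
    source = ["das", "Haus", "ist", "klein"]
    target_tags = ["OK", "OK", "OK", "BAD"]  # tags for "the house is big"
    alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]
    print(project_bad_tags(target_tags, alignment, len(source)))
    # A source word dropped in translation shows up as an unaligned position.
    print(unaligned_source_indices([(0, 0), (2, 1)], 3))
```

The second helper returns every unaligned position; restricting it to content words would need a part-of-speech or stop-word filter, which is left out here.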
How?
Two options:
Pip install
We add SimAlign to the dependencies, and import from it. Challenge: we use the encoders in slightly different ways.
Integrate code
Integrate the SimAlign code into OpenKiwi and adapt as needed. All the decoding algorithms are left unchanged; only the model setup and forward pass need to be changed. The only file needed is: https://github.com/cisnlp/simalign/blob/master/simalign/simalign.py
Important notes:
We need to verify that the licence (GNU General Public License) allows this
All development and changes to SimAlign need to be ported manually (instead of automatically through new version releases)
We should properly reference SimAlign where we use their code - acknowledgements are important!
OpenKiwi code becomes more complicated
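To make concrete what kind of decoding step would stay unchanged under the "integrate code" option, here is a rough sketch of mutual-argmax matching over a similarity matrix. It is a simplified reading of the idea, not SimAlign's actual code: it works on plain Python lists of floats, whereas in OpenKiwi the vectors would come from the encoder's hidden states, and the names `cosine` and `argmax_align` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def argmax_align(src_vecs: List[Vector], tgt_vecs: List[Vector]) -> List[Tuple[int, int]]:
    """Link source i and target j when each is the other's most similar token."""
    if not src_vecs or not tgt_vecs:
        return []
    sim = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]
    # Best target position for every source token (row-wise argmax).
    best_tgt = [max(range(len(tgt_vecs)), key=lambda j: row[j]) for row in sim]
    # Best source position for every target token (column-wise argmax).
    best_src = [
        max(range(len(src_vecs)), key=lambda i: sim[i][j])
        for j in range(len(tgt_vecs))
    ]
    # Keep only links on which both directions agree.
    return [(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i]


if __name__ == "__main__":
    src = [[1.0, 0.0], [0.0, 1.0]]
    tgt = [[0.0, 1.0], [1.0, 0.0]]
    print(argmax_align(src, tgt))
```

Because this step only consumes token vectors, it does not depend on how the encoder is set up, which is why only the model setup and forward pass would need adapting.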
Open questions
What should the output format be? I think List[Tuple[int, int]] for passing alignments around in code, and for saving to file we should opt for the 'Pharaoh format': i-j k-l etc.
How do we add alignments dynamically to the predicted output? Just another field in the output object?
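One possible shape for both questions, as a sketch only: alignments are passed around as List[Tuple[int, int]], written to file as space-separated i-j pairs, and attached to the output as an optional field. The class `PredictionWithAlignment` and the function names are hypothetical and do not correspond to existing OpenKiwi objects.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Alignment = List[Tuple[int, int]]


def to_pharaoh(alignment: Alignment) -> str:
    """Serialise alignment pairs as 'i-j k-l ...' on a single line."""
    return " ".join(f"{i}-{j}" for i, j in alignment)


def from_pharaoh(line: str) -> Alignment:
    """Parse a line of 'i-j k-l ...' back into a list of index pairs."""
    pairs = []
    for token in line.split():
        i, j = token.split("-")
        pairs.append((int(i), int(j)))
    return pairs


@dataclass
class PredictionWithAlignment:
    """Hypothetical output object: alignments are just another optional field."""

    target_tags: List[str]
    alignment: Optional[Alignment] = None


if __name__ == "__main__":
    prediction = PredictionWithAlignment(
        target_tags=["OK", "BAD"], alignment=[(0, 0), (1, 2)]
    )
    print(to_pharaoh(prediction.alignment))
```

Keeping the field optional means existing output consumers would be unaffected when alignments are not requested.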