
Coverage Mechanism and Coverage Loss #180

Open
wanghm92 opened this issue Jul 25, 2018 · 8 comments

Comments

@wanghm92

May I ask if there is any plan to add the coverage attention mechanism (https://arxiv.org/pdf/1601.04811.pdf) and coverage loss (https://arxiv.org/pdf/1704.04368.pdf) to the decoder, as these could potentially help alleviate the repetition problem in generation?

Or, any hints on a quick implementation? Thanks!
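For context, the coverage loss from the second paper penalizes attending again to source positions that have already received attention. A minimal PyTorch-style sketch of that loss (the function and variable names here are illustrative, not taken from any OpenNMT code):

```python
import torch

def coverage_loss(attn, coverage):
    """Per-step coverage loss from See et al. (2017):
    covloss_t = sum_i min(a_i^t, c_i^t), where the coverage c^t is the
    sum of the attention distributions over all previous decoder steps.

    attn:     (batch, src_len) attention distribution at the current step
    coverage: (batch, src_len) running sum of past attention distributions
    """
    loss = torch.min(attn, coverage).sum(dim=1)  # (batch,)
    # The caller would then update the coverage vector: coverage = coverage + attn
    return loss
```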

@guillaumekln
Contributor

There are no plans to add these features but contributions are welcome.

It is presently a bit complicated to customize the RNN decoder as we use the high-level tf.contrib.seq2seq APIs. We might want to revise that at some point.

@kaihuchen

@wanghm92 In case you are not aware, OpenNMT-py does support a training option called "coverage_attn", which I have used to solve a problem somewhat similar to yours.

My use case is learning a strictly token-by-token mapping from the source sequence to the target sequence, which does not allow for any unwanted repetition or additional/missing tokens during translation. This is hard to enforce under OpenNMT-tf, but so far OpenNMT-py seems to work well for my purposes.
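For anyone reading along: conceptually, a coverage-augmented attention keeps a running sum of past attention distributions and feeds it back into the attention scores, which is what discourages re-attending to the same source tokens. A rough sketch of that idea, following the formulation in the pointer-generator paper linked above (this is not OpenNMT-py's actual GlobalAttention code; the class and variable names are mine):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoverageAttention(nn.Module):
    """Additive attention with a coverage term, roughly
    e_i^t = v^T tanh(W_h h_i + W_s s_t + w_c c_i^t)."""

    def __init__(self, dim):
        super().__init__()
        self.W_h = nn.Linear(dim, dim, bias=False)  # encoder states
        self.W_s = nn.Linear(dim, dim, bias=True)   # decoder state
        self.w_c = nn.Linear(1, dim, bias=False)    # coverage scalar per source position
        self.v = nn.Linear(dim, 1, bias=False)

    def forward(self, enc_states, dec_state, coverage):
        # enc_states: (batch, src_len, dim), dec_state: (batch, dim)
        # coverage:   (batch, src_len), running sum of past attention weights
        energy = torch.tanh(
            self.W_h(enc_states)
            + self.W_s(dec_state).unsqueeze(1)
            + self.w_c(coverage.unsqueeze(-1))
        )                                                     # (batch, src_len, dim)
        attn = F.softmax(self.v(energy).squeeze(-1), dim=-1)  # (batch, src_len)
        coverage = coverage + attn                            # accumulate for the next step
        return attn, coverage
```

The coverage vector would start as zeros and be threaded through the decoding loop step by step.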

@wanghm92
Author

@guillaumekln @kaihuchen Thanks a lot for the replies!
I came across the discussion on the "coverage_attn" option from OpenNMT-py but also found these lines in global_attention.py:
https://github.com/OpenNMT/OpenNMT-py/blob/fd1ec04758855008dbbf7ce1d56d16570544e616/onmt/modules/global_attention.py#L135-L142
Does that mean coverage attention is still not supported? Or, @kaihuchen, does the option indeed work in your experience?
The same question was asked on the forum but has had no response yet:
http://forum.opennmt.net/t/whats-the-use-of-coverage-in-the-forward-pass-for-globalattention/1651
Could you give some hints?
Thanks!

@kaihuchen

@wanghm92
FYI, I have been trying out the coverage_attn feature in OpenNMT-py since just yesterday. What I have observed from my experiments so far is as follows:

  • If I add the '-coverage_attn' option for training, then in the inferred results the constraint len(TARGET_SEQ)>=len(SRC_SEQ) seems to always hold, and the token-for-token mapping seems much better behaved. This was not the case when I was using OpenNMT-tf. I have not traced into the source code, so I cannot confirm whether this implies that coverage_attn is fully functional as the designers intended.
  • In the above case I still occasionally see the repetition problem in the generated sequence (but still within the length constraint mentioned above). It is possible that this was because my model was still under-trained when I sampled it.
  • There are some additional translate.py options, such as stepwise_penalty, coverage_penalty, and length_penalty, that seem relevant, but I have not played with them enough to know whether they are useful in this case (a rough sketch of what those penalties compute follows below).
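As far as I can tell, those coverage and length penalties are related to the formulas from Wu et al. (2016); here is a rough sketch of what they compute (this is not OpenNMT-py's actual code, and the function names are mine):

```python
import torch

def wu_coverage_penalty(attn_history, beta):
    """Coverage penalty from Wu et al. (2016):
    cp = beta * sum_i log(min(total attention on source position i, 1.0)).

    attn_history: (tgt_len, batch, src_len) attention distributions
                  collected over the decoded steps so far
    """
    total_attn = attn_history.sum(dim=0)  # (batch, src_len)
    # Positions attended more than once add nothing extra; positions never
    # attended push the penalty toward -inf (a small epsilon is often added
    # in practice to avoid log(0)).
    penalty = torch.log(torch.clamp(total_attn, max=1.0)).sum(dim=1)
    return beta * penalty  # (batch,)

def wu_length_penalty(tgt_len, alpha):
    """Length penalty from the same paper: ((5 + |Y|) / 6) ** alpha."""
    return ((5.0 + tgt_len) / 6.0) ** alpha
```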

@wanghm92
Author

@kaihuchen I see. I'm not sure whether the developers forgot to delete the 'not supported' note or whether the feature is still under development. I would appreciate a clarification from the developers, @guillaumekln, if possible.
Thank you very much for your detailed explanations! I'll go and try out those options myself and share my observations with you later.

@guillaumekln
Contributor

For any query about OpenNMT-py, please open an issue in the dedicated repository. Thanks.

@tmkhalil

tmkhalil commented Jul 1, 2021

@guillaumekln

I see this discussion happened three years ago. Are there any plans to work on these features at the moment?
Thank you!

@guillaumekln
Contributor

There is no plan to work on this at the moment, but I would accept a PR adding these features.
