
Coverage Mechanism and Coverage Loss #180

Open
wanghm92 opened this issue Jul 25, 2018 · 8 comments

Comments

@wanghm92

May I ask if there is any plan to add the coverage attention mechanism (https://arxiv.org/pdf/1601.04811.pdf) and coverage loss (https://arxiv.org/pdf/1704.04368.pdf) to the decoder, as these could potentially help alleviate the repetition problem in generation?

Or, any hints on a quick implementation? Thanks!
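For context, the coverage loss from the second paper penalizes attending again to source positions that have already received attention. A minimal PyTorch-style sketch of that loss (the function and variable names here are illustrative, not taken from any OpenNMT code):

```python
import torch

def coverage_loss(attn, coverage):
    """Per-step coverage loss from See et al. (2017):
    covloss_t = sum_i min(a_i^t, c_i^t), where the coverage c^t is the
    sum of the attention distributions over all previous decoder steps.

    attn:     (batch, src_len) attention distribution at the current step
    coverage: (batch, src_len) running sum of past attention distributions
    """
    loss = torch.min(attn, coverage).sum(dim=1)  # (batch,)
    # The caller would then update the coverage vector: coverage = coverage + attn
    return loss
```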

@guillaumekln
Contributor

There are no plans to add these features but contributions are welcome.

It is presently a bit complicated to customize the RNN decoder as we use the high-level tf.contrib.seq2seq APIs. We might want to revise that at some point.

@kaihuchen

@wanghm92 In case you are not aware, OpenNMT-py does support a training option called "coverage_attn", which I have used to solve a problem somewhat similar to yours.

My use case is learning a strictly token-by-token mapping from the source sequence to the target sequence, which does not allow for any unwanted repetition or additional/missing tokens during translation. This is hard to enforce under OpenNMT-tf, but so far OpenNMT-py seems to work well for my purposes.
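For anyone reading along: conceptually, a coverage-augmented attention keeps a running sum of past attention distributions and feeds it back into the attention scores, which is what discourages re-attending to the same source tokens. A rough sketch of that idea, following the formulation in the pointer-generator paper linked above (this is not OpenNMT-py's actual GlobalAttention code; the class and variable names are mine):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoverageAttention(nn.Module):
    """Additive attention with a coverage term, roughly
    e_i^t = v^T tanh(W_h h_i + W_s s_t + w_c c_i^t)."""

    def __init__(self, dim):
        super().__init__()
        self.W_h = nn.Linear(dim, dim, bias=False)  # encoder states
        self.W_s = nn.Linear(dim, dim, bias=True)   # decoder state
        self.w_c = nn.Linear(1, dim, bias=False)    # coverage scalar per source position
        self.v = nn.Linear(dim, 1, bias=False)

    def forward(self, enc_states, dec_state, coverage):
        # enc_states: (batch, src_len, dim), dec_state: (batch, dim)
        # coverage:   (batch, src_len), running sum of past attention weights
        energy = torch.tanh(
            self.W_h(enc_states)
            + self.W_s(dec_state).unsqueeze(1)
            + self.w_c(coverage.unsqueeze(-1))
        )                                                     # (batch, src_len, dim)
        attn = F.softmax(self.v(energy).squeeze(-1), dim=-1)  # (batch, src_len)
        coverage = coverage + attn                            # accumulate for the next step
        return attn, coverage
```

The coverage vector would start as zeros and be threaded through the decoding loop step by step.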

@wanghm92
Author

@guillaumekln @kaihuchen Thanks a lot for the replies!
I came across the discussion on the "coverage_attn" option from OpenNMT-py but also found these lines in global_attention.py:
https://github.com/OpenNMT/OpenNMT-py/blob/fd1ec04758855008dbbf7ce1d56d16570544e616/onmt/modules/global_attention.py#L135-L142
Does that mean coverage attention is still not supported? Or, @kaihuchen, does the option indeed work in your experience?
The same question was asked on the forum but has had no response yet:
http://forum.opennmt.net/t/whats-the-use-of-coverage-in-the-forward-pass-for-globalattention/1651
Could you give some hints?
Thanks!

@kaihuchen

@wanghm92
FYI, I have been trying out the coverage_attn feature in OpenNMT-py since just yesterday. What I have observed from my experiments so far is as follows:

  • If I add the '-coverage_attn' option for training, then in the inferred results the constraint len(TARGET_SEQ)>=len(SRC_SEQ) seems to always hold, and the token-for-token mapping seems much better behaved. This was not the case when I was using OpenNMT-tf. I have not traced into the source code, so I cannot confirm whether this implies that coverage_attn is fully functional as the designers intended.
  • In the above case I still occasionally see the repetition problem in the generated sequence (but still within the length constraint mentioned above). It is possible that this was because my model was still under-trained when I sampled it.
  • There are some additional translate.py options, such as stepwise_penalty, coverage_penalty, and length_penalty, that seem relevant, but I have not played with them enough to know whether they are useful in this case (a rough sketch of what those penalties compute follows below).
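As far as I can tell, those coverage and length penalties are related to the formulas from Wu et al. (2016); here is a rough sketch of what they compute (this is not OpenNMT-py's actual code, and the function names are mine):

```python
import torch

def wu_coverage_penalty(attn_history, beta):
    """Coverage penalty from Wu et al. (2016):
    cp = beta * sum_i log(min(total attention on source position i, 1.0)).

    attn_history: (tgt_len, batch, src_len) attention distributions
                  collected over the decoded steps so far
    """
    total_attn = attn_history.sum(dim=0)  # (batch, src_len)
    # Positions attended more than once add nothing extra; positions never
    # attended push the penalty toward -inf (a small epsilon is often added
    # in practice to avoid log(0)).
    penalty = torch.log(torch.clamp(total_attn, max=1.0)).sum(dim=1)
    return beta * penalty  # (batch,)

def wu_length_penalty(tgt_len, alpha):
    """Length penalty from the same paper: ((5 + |Y|) / 6) ** alpha."""
    return ((5.0 + tgt_len) / 6.0) ** alpha
```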

@wanghm92
Author

@kaihuchen I see. I'm not sure whether the developers forgot to delete the 'not supported' note or whether the feature is still under development. I would appreciate a clarification from the developers, @guillaumekln, if possible.
Thank you very much for your detailed explanations! I'll go and try out those options myself and share my observations with you later.

@guillaumekln
Contributor

For any query about OpenNMT-py, please open an issue in the dedicated repository. Thanks.

@tmkhalil

tmkhalil commented Jul 1, 2021

@guillaumekln

I see this discussion happened three years ago. Are there any plans to work on these features at the moment?
Thank you!

@guillaumekln
Contributor

There is no plan to work on this at the moment, but I would accept a PR adding these features.
