I can see from the Training Details in the paper that, during supervised fine-tuning, backpropagation went through the entire model, including the language model portion. I also see from the code that you have some functionality for freezing weights. If you ran that comparison, I was curious what magnitude of difference you saw between freezing and training the language model portion during supervised fine-tuning, especially for the Transformer.
Thanks again!
Scott
We did not test this thoroughly for every downstream task, but for secondary structure we generally saw 1-2 percentage points of improvement when fine-tuning the whole model. I suspect the difference will depend a great deal on the task.
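For anyone reading along, here is a minimal sketch of how the language model portion could be frozen in a standard PyTorch setup before supervised fine-tuning. The attribute names (`model.lm`, the task head) are assumptions for illustration and not necessarily this repository's actual module names.

```python
import torch

def freeze_language_model(model: torch.nn.Module) -> None:
    """Disable gradient updates for the language-model backbone so that
    supervised fine-tuning only trains the downstream task head.
    `model.lm` is a hypothetical attribute name for the backbone."""
    for param in model.lm.parameters():
        param.requires_grad = False

def build_optimizer(model: torch.nn.Module, lr: float = 1e-4) -> torch.optim.Optimizer:
    """Hand only the still-trainable parameters to the optimizer."""
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=lr)
```

With this setup, the frozen-backbone and full-fine-tuning runs differ only in whether `freeze_language_model` is called before building the optimizer, which makes the kind of comparison discussed above straightforward to run.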