I can see from the Training Details in the paper that, during supervised fine-tuning, backpropagation went through the entire model, including the language model portion. I also see from the code that you have some functionality for freezing weights. If you ran that comparison, I was curious what magnitude of difference you saw between freezing and training the language model portion during supervised fine-tuning, especially for the Transformer.
Thanks again!
Scott
We did not test this thoroughly for every downstream task, but for secondary structure we generally saw 1-2 percentage points of improvement when fine-tuning the whole model. I suspect the difference will depend a great deal on the task.
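For anyone reading along, here is a minimal sketch of how the language model portion could be frozen in a standard PyTorch setup before supervised fine-tuning. The attribute names (`model.lm`, the task head) are assumptions for illustration and not necessarily this repository's actual module names.

```python
import torch

def freeze_language_model(model: torch.nn.Module) -> None:
    """Disable gradient updates for the language-model backbone so that
    supervised fine-tuning only trains the downstream task head.
    `model.lm` is a hypothetical attribute name for the backbone."""
    for param in model.lm.parameters():
        param.requires_grad = False

def build_optimizer(model: torch.nn.Module, lr: float = 1e-4) -> torch.optim.Optimizer:
    """Hand only the still-trainable parameters to the optimizer."""
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=lr)
```

With this setup, the frozen-backbone and full-fine-tuning runs differ only in whether `freeze_language_model` is called before building the optimizer, which makes the kind of comparison discussed above straightforward to run.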