Replies: 5 comments 3 replies
-
I tried a few of the smaller encoder-style models, but finetuned all layers every time.
All models:
Iterations/second: calculated from training iterations (batch size = 12) on Google Colab with a T4 GPU. (*) Colab crashed on DeBERTa after one epoch... I might need to stop being cheap and pay the 10 bucks if I want to keep playing around :)
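For reference, a minimal sketch of what "finetuning all layers" looks like in plain PyTorch — the toy encoder classifier below is a stand-in for the actual models, and the sizes, names, and hyperparameters are placeholders, not the setup used in the table:

```python
import torch
import torch.nn as nn

# Toy stand-in for a small encoder-style model with a classification head.
class TinyEncoderClassifier(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):
        h = self.encoder(self.embed(x))
        return self.head(h.mean(dim=1))  # mean-pool over the token dimension

model = TinyEncoderClassifier()

# "Full finetuning": every parameter stays trainable.
assert all(p.requires_grad for p in model.parameters())

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
x = torch.randint(0, 1000, (12, 16))   # batch size 12, as in the table above
y = torch.randint(0, 2, (12,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
```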
-
@rasbt, I would be curious to know which model gave you the 94.9% while only finetuning the last few layers. On a different note: when working with the Hugging Face ecosystem, what would be the pros and cons of using Hugging Face's Trainer class vs. the Lightning Trainer? Thanks again for all the good content!
-
Updating my table here with
** Ran on an A100 instead of a T4, but without mixed precision (I was not sure whether it was compatible). The comparison would have been more satisfying had I used siebert's version of RoBERTa... but nice to see that LoRA seems to deliver some value even outside of super-large language models! 😃
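For anyone curious what LoRA looks like mechanically, here is a minimal from-scratch sketch of a low-rank adapter wrapped around one frozen linear layer — the rank, alpha, and layer sizes are illustrative, not the configuration used in the experiments above:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W·x + (alpha/r)·B·A·x."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable} / {total}")  # only the low-rank A and B matrices train
```

Because `B` is zero-initialized, the wrapped layer starts out exactly equal to the frozen base layer, which is part of why LoRA finetuning tends to be stable.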
-
Sharing results from some of the experiments that I did.
@rasbt I couldn't understand why the full-layer training of
-
Only training the last two layers:
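A minimal sketch of the "last two layers only" setup — freeze everything, then unfreeze the final blocks. The model below is a generic placeholder, not the actual architecture from the results:

```python
import torch
import torch.nn as nn

# Toy stand-in: a stack of layers ending in a classification head.
model = nn.Sequential(
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 32),   # second-to-last layer: trainable
    nn.Linear(32, 2),    # classification head: trainable
)

# Freeze everything first, then unfreeze the last two submodules.
for p in model.parameters():
    p.requires_grad = False
for layer in list(model.children())[-2:]:
    for p in layer.parameters():
        p.requires_grad = True

# Only hand the still-trainable parameters to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only the last two Linear layers' weights and biases
```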
-
My best model achieved 94.9% test accuracy (I don't want to spoil which one yet) 😊