
Finetuning DPLM results in worse generation #11

Open
pengzhangzhi opened this issue Oct 8, 2024 · 2 comments

pengzhangzhi commented Oct 8, 2024

Hi,

I simply loaded the pretrained weights and fine-tuned the model on the same dataset, and the resulting checkpoint generates more repetitive sequences than I expected. This is quite bizarre to me. Is there something wrong with the current training code, or are the released checkpoints just too good?

cc @zhengzx-nlp @wxy-nlp @leiyu-bytedance @lark

wxy-nlp (Collaborator) commented Oct 10, 2024

Hello @pengzhangzhi,

Could you provide the generation results and describe how you load the checkpoint?

By the way, if you use the config yaml in config/experiment/lm and continue training from the pretrained weights, the learning rate is large, which may cause a large change to the pretrained weights and lead to bad performance. So if you want to continue training, starting the learning rate from the ending rate, i.e., 1e-5, may be better.
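
To make this concrete, here is a minimal, illustrative sketch (not the actual byprot/DPLM training code; the model, the peak LR value, and the optimizer setup are stand-ins) of the difference between re-running a pretraining-style schedule and continuing training at a small constant LR close to the ending rate:

```python
# Illustrative sketch only -- not the byprot/DPLM training code.
# The point: when continuing from converged pretrained weights, use a small
# constant LR near the rate the pretraining schedule ended at (~1e-5),
# rather than the large warmup/peak LR of a fresh pretraining config.
import torch

model = torch.nn.Linear(16, 16)  # stand-in for the pretrained DPLM-150M

# Fresh pretraining-style optimizer: large peak LR (value here is illustrative).
pretrain_style_opt = torch.optim.AdamW(model.parameters(), lr=4e-4)

# Continued training from the released weights: small constant LR.
finetune_opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
```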

pengzhangzhi (Author) commented Oct 10, 2024

Hi @wxy-nlp,

Thanks!!
I load the checkpoint from the following path and run generation with:

```bash
c=dplm/byprot-checkpoints/dplm_150m_finetune_lr_1e-8/checkpoints/last.ckpt
python generate.py --model_name "airkingbd/${model_name}" --seq_lens 100 --saveto ${output_dir} --num_seqs 100
```
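
For reference, here is a minimal inspection sketch (assuming a standard PyTorch Lightning checkpoint layout; the exact keys may differ for byprot/DPLM checkpoints) to confirm what the fine-tuned `last.ckpt` actually contains, e.g. the training step and the learning rate recorded in the optimizer state:

```python
# Minimal sketch -- assumes a standard PyTorch Lightning checkpoint layout;
# exact key names may differ for byprot/DPLM checkpoints.
import torch

ckpt_path = "byprot-checkpoints/dplm_150m_finetune_lr_1e-8/checkpoints/last.ckpt"
ckpt = torch.load(ckpt_path, map_location="cpu")

print("global_step:", ckpt.get("global_step"))
print("epoch:", ckpt.get("epoch"))

# Learning rate recorded in the optimizer state at the end of training.
for opt_state in ckpt.get("optimizer_states", []):
    print("lr:", [g["lr"] for g in opt_state["param_groups"]])

# A few state_dict keys, to confirm which module's weights are stored.
print("sample keys:", list(ckpt.get("state_dict", {}))[:5])
```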

I tried setting a smaller LR, even 1e-8, but fine-tuning still gradually degrades the pLDDT. Below is a comparison between the base DPLM-150M and DPLM-150M fine-tuned with LR 1e-8:

| Model | pLDDT |
| --- | --- |
| Base DPLM-150M | 69.44743 |
| Fine-tuned (LR 1e-8) | 66.5991 |

If I use LR 1e-5 or anything larger than 1e-8, the generation is completely broken... :(
If you want to verify this, you can simply set the LR to 1e-5, load the checkpoint, and fine-tune the model for a couple thousand steps.

Also, could you please share the training configs for DPLM-150M with us? I remember that in the paper you employ two-stage training; I wonder what the hyper-parameters and training steps are for each stage. I would love to reproduce your training.
