Fine-tuning on the LCQMC dataset degrades performance #19

Open
elihuan1990 opened this issue Aug 24, 2021 · 3 comments

Comments

@elihuan1990

I fine-tuned SimBERT on the LCQMC dataset, and the Spearman score on the test set dropped by one point. How should SimBERT be fine-tuned?
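For context, a Spearman score on a sentence-pair set like LCQMC is usually the rank correlation between the model's similarity score for each pair (for example, cosine similarity of the two sentence vectors) and the gold 0/1 labels. The sketch below assumes the embeddings are already computed; the function names are illustrative, and the tie handling (average ranks) matters here because the labels contain many ties.

```python
import numpy as np


def average_ranks(x):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman correlation = Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


def cosine_scores(u, v):
    """Row-wise cosine similarity between two (n, d) embedding matrices."""
    u = u / np.linalg.norm(u, axis=1, keepdims=True)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    return (u * v).sum(axis=1)


# Toy example: three sentence pairs with made-up embeddings and labels.
u = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
v = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
labels = [1, 1, 0]
scores = cosine_scores(u, v)
print(round(spearman(scores, labels), 4))  # prints 0.866
```

A one-point drop therefore means the fine-tuned model orders the test pairs slightly worse, even if its individual similarity values look reasonable.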

@bojone

bojone commented Dec 29, 2021

You can fine-tune it in the Sentence-BERT style.
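In Sentence-BERT's classification objective, both sentences go through the same (shared) encoder to give vectors u and v, the features (u, v, |u - v|) are concatenated, and a softmax classifier on top is trained with cross-entropy; at test time only the cosine similarity of u and v is used. Below is a NumPy sketch of that loss only, assuming the embeddings already come from the SimBERT encoder. In a real run the encoder would be trained end-to-end through this loss (for example in Keras), and `W`/`b` are hypothetical classifier weights, not part of SimBERT.

```python
import numpy as np


def sbert_features(u, v):
    """Concatenate (u, v, |u - v|) as in Sentence-BERT's classification objective."""
    return np.concatenate([u, v, np.abs(u - v)], axis=-1)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def sbert_classification_loss(u, v, labels, W, b):
    """Mean cross-entropy of softmax([u, v, |u - v|] @ W + b) against 0/1 labels.

    u, v:   (batch, d) sentence embeddings from the shared encoder.
    labels: (batch,) integer labels (1 = paraphrase, 0 = not).
    W, b:   (3d, 2) and (2,) hypothetical weights of the classifier head.
    """
    probs = softmax(sbert_features(u, v) @ W + b)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels])))


rng = np.random.default_rng(0)
d = 4
u, v = rng.normal(size=(3, d)), rng.normal(size=(3, d))
labels = np.array([1, 0, 1])
W, b = np.zeros((3 * d, 2)), np.zeros(2)
# Zero weights give uniform probabilities, so the loss equals log(2).
print(sbert_classification_loss(u, v, labels, W, b))
```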

@WenTingTseng

WenTingTseng commented Aug 26, 2023

I trained the model with simbert.py and it saved best_model.weights.
How do I load best_model.weights and test the model?
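```python
import numpy as np
from bert4keras.tokenizers import Tokenizer
from bert4keras.models import build_transformer_model
from bert4keras.snippets import sequence_padding
from keras.models import Model

model_dir = '/home/rca/research/simbert/root/kg/bert/chinese_simbert_L-12_H-768_A-12'
config_path = model_dir + '/bert_config.json'
dict_path = model_dir + '/vocab.txt'
weights_path = './best_model.weights'  # saved by simbert.py during training
maxlen = 32  # assumed; use the same maxlen as in simbert.py

tokenizer = Tokenizer(dict_path, do_lower_case=True)

# Rebuild the same architecture as simbert.py. No TF checkpoint is passed,
# because the fine-tuned Keras weights are loaded right afterwards.
bert = build_transformer_model(
    config_path,
    checkpoint_path=None,
    with_pool='linear',
    application='unilm',
    return_keras_model=False,
)
bert.model.load_weights(weights_path, by_name=True)  # match saved weights to layers by name

# outputs[0]: pooled sentence vector (retrieval); outputs[1]: token probabilities (generation)
encoder = Model(bert.model.inputs, bert.model.outputs[0])
seq2seq = Model(bert.model.inputs, bert.model.outputs[1])


def generate(text, topk=1):
    """Decode one sentence token by token; topk=1 is greedy, topk>1 samples."""
    # max_length is the bert4keras 0.7.x keyword (renamed maxlen in later releases)
    token_ids, segment_ids = tokenizer.encode(text, max_length=maxlen)
    output_ids = []
    for _ in range(maxlen):
        ids = token_ids + output_ids
        segs = segment_ids + [1] * len(output_ids)  # generated tokens use segment id 1
        probas = seq2seq.predict([np.array([ids]), np.array([segs])])[0, -1]
        top = probas.argsort()[::-1][:topk]
        p = probas[top].astype('float64')
        next_id = int(np.random.choice(top, p=p / p.sum()))
        if next_id == tokenizer._token_end_id:
            break
        output_ids.append(next_id)
    return tokenizer.decode(output_ids)


def gen_similar_sentences(text, n=10, k=10):
    """Sample n candidates, then keep the k closest to text by encoder cosine similarity."""
    candidates = list({generate(text, topk=5) for _ in range(n)} - {text})
    texts = [text] + candidates
    X, S = zip(*[tokenizer.encode(t, max_length=maxlen) for t in texts])
    Z = encoder.predict([sequence_padding(X), sequence_padding(S)])
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)
    order = np.argsort(-Z[1:].dot(Z[0]))
    return [candidates[i] for i in order[:k]]


test_sentence = "微信和支付宝哪个好?"
print(f"Original: {test_sentence}")
print(f"Generated: {generate(test_sentence)}")
print("Similar sentences:")
for idx, sentence in enumerate(gen_similar_sentences(test_sentence), start=1):
    print(f"{idx}. {sentence}")
```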
Is this the right way to write it?
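UniLM-style seq2seq generation is autoregressive: one forward pass only yields next-token probabilities, so a sentence has to be built one token at a time, appending the tokens generated so far to the input with segment id 1 and reading the prediction at the last position. The toy sketch below shows that loop without bert4keras; `next_token_probs` is a hypothetical stand-in for `seq2seq.predict(...)[0, -1]`, and `toy_model` is a made-up model that emits tokens 4, 5 and then the end id 3.

```python
from typing import Callable, List


def greedy_decode(
    token_ids: List[int],
    segment_ids: List[int],
    next_token_probs: Callable[[List[int], List[int]], List[float]],
    end_id: int,
    maxlen: int,
) -> List[int]:
    """Greedy UniLM-style decoding: grow the target one token at a time."""
    output_ids: List[int] = []
    for _ in range(maxlen):
        # Source tokens keep their segment ids (0); generated tokens get 1.
        ids = token_ids + output_ids
        segs = segment_ids + [1] * len(output_ids)
        probs = next_token_probs(ids, segs)
        next_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
        if next_id == end_id:
            break
        output_ids.append(next_id)
    return output_ids


def toy_model(ids: List[int], segs: List[int]) -> List[float]:
    """Hypothetical model: the next token depends only on how many were generated."""
    step = sum(segs)  # number of segment-1 (generated) tokens so far
    script = [4, 5, 3]
    probs = [0.0] * 6
    probs[script[min(step, 2)]] = 1.0
    return probs


print(greedy_decode([2, 1, 3], [0, 0, 0], toy_model, end_id=3, maxlen=10))  # prints [4, 5]
```

The `maxlen` cap stops decoding even if the model never predicts the end token.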

@HelenGuohx

My approach is to simply use `from simbert import gen_synonyms`; that way the model loads the new weights.
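This works because importing a module runs its top-level code. Assuming `simbert.py` keeps training under `if __name__ == '__main__':` and loads saved weights in the `else` branch (check your copy of the script; it may load `latest_model.weights` rather than `best_model.weights`), `import simbert` builds the model and loads the weights without retraining. A self-contained demonstration of that mechanism, using a throwaway module with the hypothetical name `demo_script`:

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_SOURCE = '''
loaded = None

def load_weights(path):
    global loaded
    loaded = path

if __name__ == "__main__":
    mode = "train"   # only when run as `python demo_script.py`
else:
    mode = "import"  # when imported from another script
    load_weights("./latest_model.weights")
'''


def import_demo_script():
    """Write the module to a temp dir and import it like `import demo_script`."""
    tmp = tempfile.mkdtemp()
    Path(tmp, "demo_script.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        return importlib.import_module("demo_script")
    finally:
        sys.path.remove(tmp)


mod = import_demo_script()
print(mod.mode, mod.loaded)  # prints: import ./latest_model.weights
```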

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants