
How is vocab.txt generated? Why does vocab_size change? #6

Open
wyqnumber opened this issue Jul 4, 2020 · 4 comments

Comments

@wyqnumber

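The vocab.txt in the SimBERT model trained from the chinese_L-12_H-768_A-12 model has changed: both the tokens and their number are different. How is the vocab.txt of the new SimBERT model generated?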

@wyqnumber
Author

keep_tokens=keep_tokens,  # keep only the characters in keep_tokens, trimming the original vocabulary
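For readers with the same question: the `keep_tokens` argument indicates that the SimBERT vocabulary is a trimmed subset of the original chinese_L-12_H-768_A-12 vocabulary. That would explain why both the tokens and vocab_size differ: the new vocab_size is simply the number of kept tokens, and the embedding matrix keeps only the matching rows. Below is a minimal sketch of such trimming. The filtering rule (drop multi-character tokens that contain CJK characters, since Chinese text is split per character anyway) and the helper names `simplify_vocab` and `is_cjk` are assumptions for illustration. They are loosely modeled on bert4keras's `load_vocab(..., simplified=True)` and are not its actual code.

```python
from typing import Dict, Iterable, List, Tuple


def is_cjk(ch: str) -> bool:
    """Rough check for the basic CJK Unified Ideographs block (an assumption)."""
    return 0x4E00 <= ord(ch) <= 0x9FFF


def simplify_vocab(
    tokens: List[str],
    startswith: Iterable[str] = ("[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"),
) -> Tuple[Dict[str, int], List[int]]:
    """Trim a BERT vocabulary using a hypothetical filtering rule.

    tokens: lines of the original vocab.txt; the line index is the original id.
    Returns (token_dict, keep_tokens):
      token_dict  maps each kept token to its NEW, contiguous id;
      keep_tokens lists the ORIGINAL ids in new-id order, so that with a
                  NumPy array: new_embeddings = old_embeddings[keep_tokens].
    """
    original_id = {t: i for i, t in enumerate(tokens)}
    token_dict: Dict[str, int] = {}
    keep_tokens: List[int] = []

    # Special tokens go first so they receive the lowest new ids.
    for t in startswith:
        token_dict[t] = len(token_dict)
        keep_tokens.append(original_id[t])

    for t in tokens:
        if t in token_dict:
            continue
        stem = t[2:] if t.startswith("##") else t
        # Assumed rule: multi-character tokens containing CJK characters are
        # redundant when Chinese is tokenized per character, so drop them.
        if len(t) > 1 and any(is_cjk(c) for c in stem):
            continue
        token_dict[t] = len(token_dict)
        keep_tokens.append(original_id[t])

    return token_dict, keep_tokens


# Tiny hypothetical vocabulary: 10 tokens in, 8 kept.
vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "的", "##的", "hello", "##ing", "中国"]
token_dict, keep_tokens = simplify_vocab(vocab)
print(len(vocab), len(token_dict))  # 10 8
```

In this sketch the new model's vocab_size would be `len(keep_tokens)`, and the saved vocab.txt must list the kept tokens in new-id order to stay consistent with the sliced embedding matrix.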

@sssdjj

sssdjj commented Aug 13, 2020

How do I save the trimmed vocabulary?

@sssdjj

sssdjj commented Aug 13, 2020

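# token_dict: the trimmed token -> id mapping
with open("test/vocab.txt", "w", encoding="utf-8") as f:
    # write in id order so that line i of vocab.txt holds the token with id i
    for token, _ in sorted(token_dict.items(), key=lambda kv: kv[1]):
        f.write(token + "\n")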
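The snippet above writes the keys of `token_dict` one per line, and BERT-style tokenizers read vocab.txt back by using the line number as the token id. A saved file is therefore only correct if the ids are contiguous from 0 and written in id order. Here is a minimal round-trip sketch, assuming `token_dict` is a plain `dict` of token to id. The helper names `save_vocab` and `read_vocab` are hypothetical.

```python
import os
import tempfile
from typing import Dict


def save_vocab(token_dict: Dict[str, int], path: str) -> None:
    """Write one token per line so that line i holds the token whose id is i."""
    ids = sorted(token_dict.values())
    if ids != list(range(len(ids))):
        # A gap or duplicate id would silently shift every later token.
        raise ValueError("token ids must be contiguous and start at 0")
    with open(path, "w", encoding="utf-8") as f:
        for token, _ in sorted(token_dict.items(), key=lambda kv: kv[1]):
            f.write(token + "\n")


def read_vocab(path: str) -> Dict[str, int]:
    """Read vocab.txt back, using the line number as the token id."""
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n"): i for i, line in enumerate(f)}


# Hypothetical trimmed vocabulary, with ids deliberately inserted out of order.
example = {"[CLS]": 2, "[PAD]": 0, "的": 3, "[UNK]": 1}
with tempfile.TemporaryDirectory() as tmp:
    vocab_path = os.path.join(tmp, "vocab.txt")
    save_vocab(example, vocab_path)
    assert read_vocab(vocab_path) == example
    print(list(read_vocab(vocab_path)))  # ['[PAD]', '[UNK]', '[CLS]', '的']
```

Checking the round trip like this before loading the model catches a vocab.txt whose line order no longer matches the embedding rows.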

@lonngxiang

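I seem to have run into a similar problem: after pretraining, loading the model for prediction throws an error, and I don't know what causes it.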
