Question about google.bin #16

burette · 2019-12-02T09:53:55Z

Hello,
In your code there is this commented-out line:
#tf.flags.DEFINE_string("word2vec", "./data/rt-polaritydata/google.bin", "Word2vec file with pre-trained embeddings (default: None)")
Is this google.bin file Google's GoogleNews-vectors-negative300.bin?

Aliang-CN · 2019-12-06T01:53:08Z

Have you tried the negative300.bin file?

burette · 2019-12-06T03:24:00Z

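> Have you tried the negative300.bin file?

Yes, I have tried it. I used the pre-trained GoogleNews-vectors-negative300.bin. The original code targets Python 2.7 and I am on Python 3.5. Reading the file the way the original code does fails with an out-of-memory error. Under Python 3, I read negative300.bin with the following snippet: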
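for line in tqdm(range(vocab_size)):
    # Original Python 2 loop, kept commented out for reference:
    # word = []
    # while True:
    #     ch = f.read(1)
    #     if ch == b' ':
    #         # word = ''.join(word)
    #         break
    #     if ch != b'\n':
    #         word.append(ch)
    # Python 3: f.read(1) returns bytes, so accumulate and compare bytes.
    word = b''
    while True:
        ch = f.read(1)
        if ch == b' ':
            break
        word += ch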
With this change the whole pipeline runs through.
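For readers following along, here is a minimal, self-contained sketch of a Python 3 reader for the word2vec binary layout (a text header `vocab_size dim`, then for each entry the word bytes, a space, and `dim` 32-bit floats). It is an illustration under assumptions, not the repository's code: the names `read_word2vec_binary` and `make_sample` are hypothetical, little-endian floats are assumed, and the demo reads a tiny in-memory sample instead of the real file. One possible explanation for the out-of-memory error, offered as a guess: if a Python 2 style loop compares `f.read(1)` against a `str` such as `' '`, the comparison is never true in Python 3, so the loop never stops and the buffer keeps growing.

```python
import io
import struct


def read_word2vec_binary(f):
    """Read word2vec binary data from a file object opened in 'rb' mode."""
    # Header is a text line: "<vocab_size> <dim>\n".
    vocab_size, dim = (int(x) for x in f.readline().split())
    vectors = {}
    for _ in range(vocab_size):
        chars = []
        while True:
            ch = f.read(1)
            if ch == b"":
                # Guard against looping forever on a truncated file.
                raise EOFError("unexpected end of file while reading a word")
            if ch == b" ":
                break
            if ch != b"\n":  # skip a newline left over from the previous entry
                chars.append(ch)
        # Decode to str so later lookups with str keys can match.
        word = b"".join(chars).decode("utf-8", errors="ignore")
        # Assumption: vectors are stored as little-endian float32 values.
        vectors[word] = struct.unpack("<%df" % dim, f.read(4 * dim))
    return vectors


def make_sample(entries):
    """Build a tiny in-memory file in the same layout (for the demo only)."""
    dim = len(next(iter(entries.values())))
    buf = io.BytesIO()
    buf.write(("%d %d\n" % (len(entries), dim)).encode("utf-8"))
    for word, vec in entries.items():
        buf.write(word.encode("utf-8") + b" ")
        buf.write(struct.pack("<%df" % dim, *vec))
        buf.write(b"\n")
    buf.seek(0)
    return buf


sample = make_sample({"good": [1.0, 0.5], "bad": [-1.0, 0.25]})
vectors = read_word2vec_binary(sample)
```

To try it on a real file, pass `open(path, "rb")` instead of the sample buffer. Note that the full GoogleNews file is large, so holding every vector in a dict of tuples may need a lot of memory.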

Aliang-CN · 2019-12-09T01:47:19Z

Have you tried reading the bin file with gensim?

Aliang-CN · 2019-12-09T02:16:23Z

Reading the file with your method is too slow. It takes 3 hours.

burette · 2019-12-09T02:31:18Z

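> Reading the file with your method is too slow. It takes 3 hours.

Could the three hours be a machine performance issue? On my side, several machines (all i5) finish reading the file in a few minutes.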

burette · 2019-12-09T02:38:29Z

> Have you tried reading the bin file with gensim?

from gensim.models.keyedvectors import KeyedVectors
model = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True, limit=300000)

Aliang-CN · 2019-12-09T03:53:20Z

I have tried both methods. When I iterate over the words in vocabulary_user, none of them are found in model. What do you see on your side?
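The thread does not say why no words were found, so the following is only a diagnostic sketch under assumptions. One thing worth ruling out with the manual reader is a type mismatch: the snippet above accumulates each word as `bytes`, and in Python 3 a `str` never compares equal to `bytes`, so every lookup with `str` words would miss. Case differences or a `limit` that cuts off rarer words are other possible causes. The helper names `to_text` and `coverage` and the sample data are hypothetical.

```python
def to_text(key):
    """Return key as str, decoding it first if it is bytes."""
    if isinstance(key, bytes):
        return key.decode("utf-8", errors="ignore")
    return key


def coverage(vocabulary, embedding_keys):
    """Split vocabulary into words found and words missing in the embeddings."""
    known = {to_text(k) for k in embedding_keys}
    found = [w for w in vocabulary if w in known]
    missing = [w for w in vocabulary if w not in known]
    return found, missing


# Hypothetical data: keys as a bytes-based reader would produce them.
raw_keys = [b"good", b"bad", b"Paris"]
vocabulary_user = ["good", "paris", "xyzzy"]

# Looking up str words among bytes keys matches nothing in Python 3.
naive_hits = [w for w in vocabulary_user if w in raw_keys]

# After normalizing the keys to str, exact matches show up.
found, missing = coverage(vocabulary_user, raw_keys)
```

Here "paris" still counts as missing because the lookup is case-sensitive. Printing a few entries of `missing` next to a few embedding keys is usually the quickest way to spot what differs.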
