This repository holds two simple class wrappers implementing several word embedding models.
- MongoWordEmbedding implements a MongoDB-based wrapper that consumes embeddings from a MongoDB database; this is memory efficient but requires a MongoDB instance.
- WordEmbedding implements an in-memory wrapper that loads the models into memory; this is memory inefficient but does not require a MongoDB instance.
- Word2Vec: TODO insert link
- FastText (en): TODO insert link
- FastText (Aligned): https://fasttext.cc/docs/en/aligned-vectors.html
  a. Spanish: https://dl.fbaipublicfiles.com/fasttext/vectors-aligned/wiki.es.align.vec
  b. English: https://dl.fbaipublicfiles.com/fasttext/vectors-aligned/wiki.en.align.vec
  c. Portuguese: https://dl.fbaipublicfiles.com/fasttext/vectors-aligned/wiki.pt.align.vec
- LSA:
MongoWordEmbedding uses MongoDB; in order to use that version, a running connection to a MongoDB instance with read/write permissions is required.
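Before using the Mongo-backed wrapper, you may want to confirm the database is reachable. A minimal sketch using pymongo, assuming a default local installation (the host and port are assumptions, not values from this repo):

```python
from pymongo import MongoClient

# Assumed defaults for a local MongoDB installation; adjust to your setup
client = MongoClient(host='localhost', port=27017, serverSelectionTimeoutMS=2000)
client.admin.command('ping')  # raises ServerSelectionTimeoutError if unreachable
print('MongoDB is reachable')
```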
- Install MongoDB if you intend to use MongoWordEmbedding.
- Clone this repo.
- Enter the repo root dir from a console.
- Run `python setup.py install`.
- Configure settings.json.
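After installation, a quick sanity check is to confirm the package imports (class names taken from the usage examples below):

```python
# If installation succeeded, these imports should not raise
from MultiModelWordEmbedding.WordEmbedding import WordEmbedding
from MultiModelWordEmbedding.MongoWordEmbedding import MongoWordEmbedding
```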
The settings.json file stores the following configuration:
- `embeddings_folder`: path to the folder where the embedding model files are stored, with the structure shown in (1).
- `mongo_client`: a dictionary with the parameters for the pymongo.MongoClient(**parameters) database connection method.
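For example, a minimal settings.json, assuming a local MongoDB on the default port (the folder path is a placeholder):

```json
{
    "embeddings_folder": "/path/to/word_embeddings_folder",
    "mongo_client": {"host": "localhost", "port": 27017}
}
```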
(1) The word embedding folder and file structure is the following:
```
word_embeddings_folder
|
├── word2vec
|   └── GoogleNews-vectors-negative300.bin
|
├── fasttext
|   └── cc.en.300.bin
|
├── glove
|   └── glove.840B.300d.txt
|
└── LSA
    └── tasa_300
        └── matrix.npy
```
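A small sketch to verify the files are in place before loading any model (the folder path is a placeholder for your `embeddings_folder` setting; the file list mirrors the tree above):

```python
import os

folder = '/path/to/word_embeddings_folder'  # placeholder: your "embeddings_folder" setting
expected = [
    'word2vec/GoogleNews-vectors-negative300.bin',
    'fasttext/cc.en.300.bin',
    'glove/glove.840B.300d.txt',
    'LSA/tasa_300/matrix.npy',
]
for rel_path in expected:
    # Report which model files are present and which are missing
    status = 'found' if os.path.exists(os.path.join(folder, rel_path)) else 'MISSING'
    print(rel_path, status)
```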
To use the in-memory wrapper:

```python
from MultiModelWordEmbedding.WordEmbedding import WordEmbedding

# Load the Word2Vec model into memory
w2v = WordEmbedding('Word2Vec')
# Look up the embedding vector for a word
w2v['cat']
```
To use the MongoDB-backed wrapper:

```python
from MultiModelWordEmbedding.MongoWordEmbedding import download_embedding_models, MongoWordEmbedding

# One-time step: load the embedding model files into MongoDB
download_embedding_models('word_embeddings_folder')
# Queries now read embeddings from the database instead of memory
w2v = MongoWordEmbedding('Word2Vec')
w2v['cat']
```
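Both wrappers expose the same dictionary-style lookup, so switching between the in-memory and MongoDB-backed versions should only require changing the constructor.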