Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gensim's FastText model reads in unsupported modes from Facebook's FastText #3179

Open
mpenkov opened this issue Jun 22, 2021 · 11 comments
Open
Labels
bug Issue described a bug difficulty easy Easy issue: required small fix fasttext Issues related to the FastText model good first issue Issue for new contributors (not required gensim understanding + very simple)

Comments

@mpenkov
Copy link
Collaborator

mpenkov commented Jun 22, 2021

In gensim/models/fasttext.py:

    model = FastText(
        vector_size=m.dim,
        vector_size=m.dim,
        window=m.ws,
        window=m.ws,
        epochs=m.epoch,
        epochs=m.epoch,
        negative=m.neg,
        negative=m.neg,
        # FIXME: these next 2 lines read in unsupported FB FT modes (loss=3 softmax or loss=4 onevsall,
        # or model=3 supervised) possibly creating inconsistent gensim model likely to fail later. Displaying
        # clear error/warning with explanatory message would be far better - even if there might be some reason
        # to continue with the load - such as providing read-only access to word-vectors trained those ways. (See:
        # https://github.com/facebookresearch/fastText/blob/2cc7f54ac034ae320a9af784b8145c50cc68965c/src/args.h#L19
        # for FB FT mode definitions.)
        hs=int(m.loss == 1),
        hs=int(m.loss == 1),
        sg=int(m.model == 2),
        sg=int(m.model == 2),
        bucket=m.bucket,
        bucket=m.bucket,
        min_count=m.min_count,
        min_count=m.min_count,
        sample=m.t,
        sample=m.t,
        min_n=m.minn,
        min_n=m.minn,
        max_n=m.maxn,
        max_n=m.maxn,
    )
@mpenkov mpenkov added bug Issue described a bug difficulty easy Easy issue: required small fix fasttext Issues related to the FastText model good first issue Issue for new contributors (not required gensim understanding + very simple) labels Jun 22, 2021
@amin110314
Copy link

may I work on this?

@mpenkov
Copy link
Collaborator Author

mpenkov commented Jun 22, 2021

Sure.

@kitrak-rev
Copy link

kitrak-rev commented Aug 23, 2021

Since there is no commits for a month, can I take up this task?
(I am just starting with open source contribution, any references or redirection is welcomed)

@Tewatia5355
Copy link

Tewatia5355 commented Aug 30, 2021

#3222 I have tried to add error handling before loading another fasttext model to gensim/fasttext.
Please review and tell me will it work?
image

@karan121bhukar
Copy link

@mpenkov this is my 1st contribution to this project, can you take a look at PR #3223 and let me know if it's good to go?

@ryandelano
Copy link

Hi, is this issue still available to be worked on? Would love to get started with contributions on this project!

@NexusSRC
Copy link

would love to start contributing to this repo by starting with this issue, can it be assigned to me?

@gojomo
Copy link
Collaborator

gojomo commented Jul 25, 2024

You don't need to be assigned something to propose a fix, in the form of a Github PR.

This is a very low-stakes matter - less-used (fairly obscure) FastText modes, with few people expecting them to work in Gensim. It's unclear that the gap here has ever actually inconvenienced or confused anyone – it's just a matter of being clear/thorough in describing what actually works, "just in case".

So improvements should stay safe/minimal/easy-to-review/low-risk - for example, it might just be some extra clarity in related doc-comments (to note not all modes fully supported). Maybe, also, some warning sent in user-visible output (with same conventions as similar warnings elsewhere in Gensim) if/when unsuppored modes are loaded - if it can be done simply.

@NexusSRC
Copy link

Okay, will look forward for more impactful problems to solve then

@gojomo
Copy link
Collaborator

gojomo commented Jul 29, 2024

This would still be a fine first issue, for which improvements wquld be welcome! My comment was meant to help describe the kind of (careful, minimal, straightforward) contribution that would make sense for this sort of "loose ends" issue.

@NexusSRC
Copy link

Sure makes sense

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue described a bug difficulty easy Easy issue: required small fix fasttext Issues related to the FastText model good first issue Issue for new contributors (not required gensim understanding + very simple)
Projects
None yet
8 participants