
Add support for Slovenian language #298

Closed
2 of 4 tasks
ajdapretnar opened this issue Jul 31, 2017 · 4 comments

Comments

@ajdapretnar
Collaborator

ajdapretnar commented Jul 31, 2017

Proposed enhancements

This issue tracks the Slovenian language technologies we would like to support.

  • lemmatization, potentially IJS's LemmaGen
  • POS tagging, potentially Obeliks
  • stopword list (I can provide one myself)
  • corpora: Gigafida, Kres

Then add Orange to MK's list of tools for the Slovene language.

@ajdapretnar
Collaborator Author

Other suggested tools:

  • UDPipe: figure out the documentation and how to use it from Python; I could not.
  • ReLDI: potentially problematic if it can only be used with API keys (I had trouble accessing their service).
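For reference, UDPipe writes its annotations in the CoNLL-U format (ten tab-separated columns per token, with the word form in the second column and the lemma in the third), so once it runs, getting lemmas into Python needs only a small parser. The sketch below assumes the CoNLL-U text has already been produced, for instance through UDPipe's Python bindings; the function name `lemmas_from_conllu` is hypothetical, and the sample sentence and its annotations are hand-written for illustration, not real model output.

```python
def lemmas_from_conllu(conllu: str) -> list[tuple[str, str]]:
    """Return (form, lemma) pairs from CoNLL-U text, one pair per syntactic word."""
    pairs = []
    for line in conllu.splitlines():
        # Skip blank lines (sentence boundaries) and comment lines such as "# text = ...".
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        token_id = cols[0]
        # Skip multiword-token ranges ("1-2") and empty nodes ("3.1").
        if "-" in token_id or "." in token_id:
            continue
        pairs.append((cols[1], cols[2]))  # FORM, LEMMA
    return pairs


# Hand-written illustrative annotation of "Miha je šel." (not actual UDPipe output).
SAMPLE = "\n".join([
    "# text = Miha je šel.",
    "\t".join(["1", "Miha", "Miha", "PROPN", "_", "_", "3", "nsubj", "_", "_"]),
    "\t".join(["2", "je", "biti", "AUX", "_", "_", "3", "aux", "_", "_"]),
    "\t".join(["3", "šel", "iti", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"]),
    "",
])

if __name__ == "__main__":
    print(lemmas_from_conllu(SAMPLE))
```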

@ajdapretnar
Collaborator Author

For stopwords, see nltk/nltk_data#54 (comment).
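Whichever list is adopted, applying it is a simple filtering step. The sketch below assumes the stopwords are held in a plain Python set; the handful of words in `SLOVENE_STOPWORDS` is a hypothetical sample for illustration, not the list from that thread, and the regex tokenizer is a deliberate simplification.

```python
import re

# Hypothetical, deliberately tiny Slovene stopword sample for illustration only;
# a real list would be far longer.
SLOVENE_STOPWORDS = frozenset({"in", "je", "da", "se", "na", "v", "za", "so", "pa", "ki"})


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens.

    In Python 3, \\w matches Unicode letters, so č, š and ž stay inside words.
    """
    return re.findall(r"\w+", text.lower())


def remove_stopwords(text: str, stopwords=SLOVENE_STOPWORDS) -> list[str]:
    """Return the tokens of `text` that are not in the stopword set."""
    return [tok for tok in tokenize(text) if tok not in stopwords]


if __name__ == "__main__":
    # "Miha went to the store and bought bread."
    print(remove_stopwords("Miha je šel v trgovino in kupil kruh."))
```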

@ajdapretnar
Collaborator Author

For sentiment analysis, see CLARIN resources.

@ajdapretnar
Collaborator Author

We have added support for Slovenian in Preprocess Text (UDPipe Lemmatizer), Sentiment Analysis, and recently in #504 for Document Embedding. For now, this concludes our quest for supporting Slovene. Other suggestions should be addressed in separate issues.
