TODO:
- Creating repo 'initial commit'
- Add datasets
- Writing mentoring plan
- List of models to study
- Classification Models
- Logistic Regression
- Decision Tree
- Multinomial NB
- SVM
- SGDClassifier
The data is random content from Wikipedia in ~10 different languages. The idea is to build classifiers that can identify the language of a sentence. Using different kinds of models and techniques, we are going to compare performance and try to identify the most useful preprocessing steps and hyperparameters.
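
As a rough illustration of the task, here is a minimal from-scratch sketch of one of the listed models, Multinomial NB, applied to character n-gram counts. Everything in it is an assumption made for illustration: the names `char_ngrams` and `NgramNaiveBayes`, the choice of bigrams, the add-one smoothing, and the six toy sentences, which stand in for the Wikipedia data. It is not the project's actual pipeline, and a library implementation would likely be used in practice.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Lowercase the text and return its overlapping character n-grams."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-gram counts.

    Hypothetical helper for illustration; `n` (n-gram size) and `alpha`
    (additive smoothing) are the kind of settings the comparison could tune.
    """

    def __init__(self, n=2, alpha=1.0):
        self.n = n
        self.alpha = alpha

    def fit(self, sentences, labels):
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.docs = Counter(labels)         # label -> number of sentences
        for sentence, label in zip(sentences, labels):
            self.counts[label].update(char_ngrams(sentence, self.n))
        self.vocab = {g for c in self.counts.values() for g in c}
        return self

    def predict(self, sentence):
        # n-grams never seen in training are ignored.
        grams = [g for g in char_ngrams(sentence, self.n) if g in self.vocab]
        total_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.docs):
            counts = self.counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(self.docs[label] / total_docs)
            for g in grams:
                score += math.log((counts[g] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy stand-in data (assumed); the real data is Wikipedia text.
train = [
    ("the cat is sitting on the table", "en"),
    ("this is a short sentence about the weather", "en"),
    ("el gato está sentado en la mesa", "es"),
    ("esta es una frase corta sobre el tiempo", "es"),
    ("die katze sitzt auf dem tisch", "de"),
    ("das ist ein kurzer satz über das wetter", "de"),
]
model = NgramNaiveBayes(n=2).fit(*zip(*train))
print(model.predict("the cat is on the table"))
```

Character n-grams are used here because they need no language-specific tokenizer. Whether they beat word-level features on the real data is one of the things the comparison would have to show.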