An NLP project that classifies text by its language, reaching 100% accuracy on the test set and on new examples, with prediction confidence above 99%.
The aim of this notebook is to build a TensorFlow model that identifies the language (English, Spanish, etc.) of a given sentence.
The notebook covers four languages: English, Italian, Spanish and French.
I used a sequential model with the following layers, in order:
- Embedding layer with a vocabulary size of 49314 and an embedding dimension of 128.
- Flatten layer.
- Dense layer with 256 units.
- Dropout layer with a dropout rate of 0.2.
- Dense layer with 256 units.
- Dropout layer with a dropout rate of 0.2.
- Dense layer with 4 units, one per language. This is the output layer.
The optimizer is Adam with default settings, and the batch size is 128.
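
The architecture and optimizer described above could be written roughly as follows. This is a minimal sketch, not the notebook's exact code: the padded sequence length (`MAX_LEN`), the ReLU and softmax activations, and the loss function are assumptions, since they are not stated here.

```python
import tensorflow as tf

VOCAB_SIZE = 49314    # vocabulary size given in the layer list
EMBEDDING_DIM = 128   # embedding dimension given in the layer list
NUM_LANGUAGES = 4     # English, Italian, Spanish, French
MAX_LEN = 50          # ASSUMPTION: padded sentence length, not stated in this README

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),                # integer token ids
    tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=EMBEDDING_DIM),
    tf.keras.layers.Flatten(),                       # (MAX_LEN, 128) -> MAX_LEN * 128
    tf.keras.layers.Dense(256, activation="relu"),   # ASSUMPTION: ReLU activation
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(256, activation="relu"),   # ASSUMPTION: ReLU activation
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_LANGUAGES, activation="softmax"),  # ASSUMPTION: softmax output
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(),            # default settings
    loss="sparse_categorical_crossentropy",          # ASSUMPTION: integer class labels
    metrics=["accuracy"],
)
```

Training would then call `model.fit` with `batch_size=128`; the number of epochs is not stated here.
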
The predictions of the model are as follows:
- 'TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks.' is in English with 99.8466% confidence.
- 'TensorFlow è una libreria software gratuita e open source per il flusso di dati e la programmazione differenziabili in una vasta gamma di attività.' is in Italian with 99.9596% confidence.
- 'TensorFlow es una biblioteca de software gratuita y de código abierto para el flujo de datos y la programación diferenciable en una variedad de tareas.' is in Spanish with 99.9978% confidence.
- 'TensorFlow est une bibliothèque logicielle gratuite et open-source pour le flux de données et la programmation différenciable sur une gamme de tâches.' is in French with 99.9789% confidence.
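
A confidence figure like the ones above is typically the largest class probability produced by the output layer, expressed as a percentage. The sketch below illustrates that step in plain Python. The label order in `LANGUAGES`, the helper names, and the example scores are hypothetical and not taken from the notebook.

```python
import math

# ASSUMPTION: the order in which the model's four outputs map to languages.
LANGUAGES = ["English", "Italian", "Spanish", "French"]

def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def describe_prediction(sentence, probabilities):
    """Pick the most probable language and report its probability as a percentage."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    confidence = 100.0 * probabilities[best]
    return f"The sentence '{sentence}' is in {LANGUAGES[best]} with {confidence:.2f}% confidence."

# Illustrative scores only; a real run would use the model's output for the sentence.
example_probabilities = softmax([1.0, 0.5, 9.0, 0.2])
print(describe_prediction("TensorFlow es una biblioteca de software.", example_probabilities))
```

If the output layer already applies softmax, the model's output can be passed to `describe_prediction` directly.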