gd1m3y/Product-Sentiment-Analaysis-Using-BERT

Sentiment Analysis Using BERT

Introduction

The aim of this project is to classify the sentiment of a given product review using BERT.

Sentiment analysis, a form of text classification, is the process of determining the sentiment expressed in a text based on its context.

image

This project aims to demonstrate text classification.

Workflow

The workflow of the project:

img

  • Preprocessing - Regular expressions and other libraries remove irregularities in the data, such as punctuation, links, and numbers, which carry little signal for the model and may skew its results.
  • Sentence tokenization - Tokenization converts text into tokens that the model can process.
  • DistilBERT - The model classifies each text as either positive or negative.
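
The preprocessing step above can be sketched with the standard `re` module. This is a minimal illustration, not the repository's actual cleaning code: the patterns and the function name `clean_review` are assumptions chosen for the example.

```python
import re

# Illustrative patterns (assumptions, not the repository's exact rules).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NUMBER_RE = re.compile(r"\d+")
PUNCT_RE = re.compile(r"[^\w\s]")
SPACE_RE = re.compile(r"\s+")


def clean_review(text: str) -> str:
    """Lowercase a review and strip links, numbers, and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)     # remove links first, while their punctuation is intact
    text = NUMBER_RE.sub(" ", text)  # drop digits such as prices or ratings
    text = PUNCT_RE.sub(" ", text)   # replace punctuation (including apostrophes) with spaces
    return SPACE_RE.sub(" ", text).strip()  # collapse the leftover whitespace


print(clean_review("Great sound!!! Bought for 999, see https://example.com/x?id=1"))
```

Links are removed before punctuation so that the URL pattern still sees the `://` it relies on.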

BERT

BERT stands for Bidirectional Encoder Representations from Transformers. It is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of NLP tasks.

image

BERT is based on the Transformer architecture.

image (Tokenization in BERT)
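
To make BERT-style tokenization concrete, here is a toy sketch of WordPiece's greedy longest-match-first splitting, written in plain Python. The vocabulary `TOY_VOCAB` and the helper names are made up for illustration; a real BERT tokenizer uses a vocabulary of roughly 30,000 entries and additional text normalization.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one lowercase word into subwords."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a "##" prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def encode(text, vocab):
    """Wrap the subword tokens of a sentence in the [CLS] and [SEP] markers."""
    tokens = ["[CLS]"]
    for word in text.lower().split():
        tokens.extend(wordpiece_tokenize(word, vocab))
    tokens.append("[SEP]")
    return tokens


# Hypothetical miniature vocabulary, for illustration only.
TOY_VOCAB = {"the", "sound", "is", "amaz", "##ing", "bass", "##y"}

print(encode("The sound is amazing", TOY_VOCAB))
# ['[CLS]', 'the', 'sound', 'is', 'amaz', '##ing', '[SEP]']
```

Splitting rare words into known pieces lets the model handle words it never saw whole during pre-training.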

Data Set

The dataset used is the boat Amazon reviews dataset, which is uploaded in the repo.

Model Used

The DistilBERT model was proposed in the blog post "Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT" and the paper "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter". DistilBERT is a small, fast, cheap, and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than bert-base-uncased and runs 60% faster, while preserving over 95% of BERT's performance as measured on the GLUE language understanding benchmark.
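
As a rough sketch of how a DistilBERT classifier can be applied to reviews, the snippet below uses the Hugging Face `pipeline` API. It assumes the public checkpoint `distilbert-base-uncased-finetuned-sst-2-english` as a stand-in; the project's own fine-tuned weights may differ, and the helper names `to_sentiment` and `classify_reviews` are hypothetical.

```python
from typing import Dict, List

# Assumption: a public SST-2 checkpoint stands in for the project's own model.
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"


def to_sentiment(prediction: Dict) -> str:
    """Map one pipeline prediction, e.g. {"label": "POSITIVE", "score": 0.99}, to a lowercase class."""
    return "positive" if prediction["label"].upper() == "POSITIVE" else "negative"


def classify_reviews(reviews: List[str]) -> List[str]:
    """Classify each review as "positive" or "negative"."""
    # Imported lazily so that to_sentiment works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis", model=MODEL_NAME)
    # truncation=True cuts reviews that exceed the model's maximum input length.
    predictions = classifier(reviews, truncation=True)
    return [to_sentiment(p) for p in predictions]
```

Calling `classify_reviews(["The bass is fantastic.", "Stopped working after a week."])` downloads the checkpoint on first use and returns one label per review.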

Results

positive word cloud

image

positive topics

image

negative word cloud

image

negative topics

image

Technology Stack

  • spaCy - An NLP library used for a variety of tasks, such as named entity recognition.
  • Transformers - A deep learning library developed by Hugging Face.
  • NumPy - Numerical computing library.
  • re - Python's built-in regular expression module, used for string operations.
  • pandas - Data manipulation library.
  • matplotlib - Visualization library.
  • scikit-learn - A machine learning library with many functions for modeling, statistics, and evaluation.

To-do

  • Try more sophisticated models or methods for training on the embeddings to achieve better accuracy.
  • Try different preprocessing steps for better results.

Contact

Want to contribute? Contact me at [email protected]
