TwiSent: Tweet-Based Public Sentiment Analysis of the New Education Policy (NEP) 2020

Introduction

This project uses sentiment analysis to classify public tweets about the New Education Policy (NEP) 2020 as positive, negative, or neutral. Unlike classical machine learning methods, our LSTM-based approach captures contextual meaning across longer text sequences, offering more accurate results. The model is tested on the NEP 2020 dataset available on Kaggle.

System Architecture

This figure illustrates the architecture of the sentiment analysis model for a raw input tweet. Processing begins with a pre-processing module, consisting of standard pre-processing steps and tokenization, followed by polarity assignment. The processed tweet is then converted into a dense vector representation of dimension N × 100. This vectorized tweet is fed into an LSTM layer, which is designed to capture the sequence and context of the words. Finally, the output of the LSTM layer is passed through a softmax layer, which classifies the sentiment of the text as positive (POS), neutral (NEU), or negative (NEG).
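As a rough illustration of the embedding, LSTM and softmax flow described above, the following is a minimal forward-pass sketch in pure Python. It is not the project's implementation: the class name `TinySentimentLSTM`, the hidden size of 16, the random untrained weights and the NEG/NEU/POS output order are assumptions made for illustration; only the embedding dimension of 100 comes from the architecture description. A real model would be built and trained with a deep-learning framework.

```python
import math
import random

LABELS = ["NEG", "NEU", "POS"]  # assumed order of the softmax outputs
GATES = "ifog"  # input, forget and output gates, plus the candidate cell state


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinySentimentLSTM:
    """Embedding -> one LSTM layer -> softmax over three classes (random, untrained weights)."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.embedding = rand_matrix(vocab_size, embed_dim, rng)
        # One weight matrix per gate, applied to the concatenation [x_t ; h_{t-1}].
        self.W = {k: rand_matrix(hidden_dim, embed_dim + hidden_dim, rng) for k in GATES}
        self.b = {k: [0.0] * hidden_dim for k in GATES}
        self.W_out = rand_matrix(len(LABELS), hidden_dim, rng)
        self.b_out = [0.0] * len(LABELS)

    def forward(self, token_ids):
        """Return class probabilities for a sequence of integer token ids."""
        h = [0.0] * self.hidden_dim  # hidden state
        c = [0.0] * self.hidden_dim  # cell state
        for tid in token_ids:
            z = self.embedding[tid] + h  # list concatenation: [x_t ; h_{t-1}]
            pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])] for k in GATES}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            g = [math.tanh(v) for v in pre["g"]]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        logits = [a + b for a, b in zip(matvec(self.W_out, h), self.b_out)]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


model = TinySentimentLSTM(vocab_size=50)
probs = model.forward([3, 17, 42, 8])
print({label: round(p, 3) for label, p in zip(LABELS, probs)})
print("predicted:", LABELS[probs.index(max(probs))])
```

Each token id selects one 100-dimensional row of the embedding matrix, so a tweet of N tokens enters the LSTM as an N × 100 sequence; the final hidden state summarises the whole sequence before the softmax. Because the weights here are random, the predicted class is meaningless until the model is trained.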

Pre-processing of the dataset

The pre-processing unit takes in the raw input tweet and applies the standard pre-processing steps. We then remove stop words, keeping only the meaningful words that provide context to our sentences, and finally index each word.
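The source does not list the exact standard pre-processing steps, so the sketch below assumes common Twitter cleaning (lowercasing and removing URLs, @mentions, the RT marker and punctuation). The stop-word list `STOP_WORDS` is a hypothetical, abbreviated one, and the helper names `clean`, `tokenize`, `build_index` and `encode` are illustrative. Index 0 is reserved for padding so that every tweet can be padded or truncated to the fixed sequence length N expected by the model.

```python
import re

# Hypothetical, abbreviated stop-word list; a real pipeline would use a fuller list.
# Negations such as "not" are intentionally left out because they carry sentiment.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on", "for", "this", "it"}


def clean(tweet):
    """Assumed cleaning: lowercase, drop URLs, @mentions, the RT marker and punctuation."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"@\w+", " ", text)  # @mentions
    text = re.sub(r"\brt\b", " ", text)  # retweet marker
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # punctuation, '#', emoji
    return text


def tokenize(text):
    """Split on whitespace and drop stop words."""
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def build_index(token_lists):
    """Map each distinct word to an integer id in order of first appearance; 0 is padding."""
    index = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in index:
                index[tok] = len(index) + 1
    return index


def encode(tokens, index, max_len):
    """Convert tokens to ids, truncate to max_len and right-pad with 0."""
    ids = [index[tok] for tok in tokens if tok in index][:max_len]
    return ids + [0] * (max_len - len(ids))


tweets = [
    "RT @user: The #NEP2020 is a great step for education! https://t.co/xyz",
    "Not sure the new education policy will work in rural schools",
]
token_lists = [tokenize(clean(t)) for t in tweets]
index = build_index(token_lists)
print(token_lists[0])
print(encode(token_lists[0], index, max_len=8))
```

Whether the original pipeline keeps negations, numbers or hashtags in this way is not stated; these are design choices of the sketch.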

Distribution of the dataset into repeating and non-repeating tweets

Public tweets are repeated in the dataset because each retweet is treated as a new, independent tweet: it is posted with a different tweet ID, even though its content is the same as the original.
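A minimal way to separate repeating from non-repeating tweets is to group rows by their normalised text rather than by tweet ID. The sketch assumes that retweets carry a leading `RT @user:` prefix; the function names `normalise` and `split_repeating` and the sample rows are hypothetical and are not taken from the Kaggle dataset.

```python
import re
from collections import Counter


def normalise(text):
    """Lowercase, drop a leading 'RT @user:' prefix (assumed retweet format), collapse whitespace."""
    text = re.sub(r"^rt @\w+:\s*", "", text.strip().lower())
    return " ".join(text.split())


def split_repeating(rows):
    """rows: iterable of (tweet_id, text). Returns (repeating text counts, non-repeating texts)."""
    counts = Counter(normalise(text) for _tweet_id, text in rows)
    repeating = {text: n for text, n in counts.items() if n > 1}
    non_repeating = [text for text, n in counts.items() if n == 1]
    return repeating, non_repeating


# Hypothetical rows: every tweet ID is distinct, but three rows share the same content.
rows = [
    (101, "NEP 2020 is a welcome reform"),
    (102, "RT @edu_news: NEP 2020 is a welcome reform"),
    (103, "RT @someone: NEP 2020 is a welcome reform"),
    (104, "Worried about the three-language formula"),
]
repeating, non_repeating = split_repeating(rows)
print(repeating)
print(non_repeating)
```

Grouping on text rather than ID is what exposes the repetition: the four rows have four IDs but only two distinct messages.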

Labelling of the dataset

The figure shows the distribution of the dataset after each tweet was labelled as positive, negative, or neutral using the Valence Aware Dictionary and Sentiment Reasoner (VADER). A similar distribution was observed when labelling with TextBlob.
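The labelling step can be sketched as two small mapping functions. VADER's `SentimentIntensityAnalyzer().polarity_scores(text)` returns a `compound` score in [-1, 1], and the VADER authors suggest ±0.05 as cut-offs; TextBlob's `TextBlob(text).sentiment.polarity` also lies in [-1, 1], and the sign-based rule used here is an assumed convention. The thresholds actually used in this project are not stated, and the scores below are made up so that the snippet runs without either library installed.

```python
from collections import Counter


def vader_label(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to POS / NEG / NEU using +/-threshold cut-offs."""
    if compound >= threshold:
        return "POS"
    if compound <= -threshold:
        return "NEG"
    return "NEU"


def textblob_label(polarity):
    """Map a TextBlob polarity in [-1, 1] to POS / NEG / NEU by its sign (assumed convention)."""
    if polarity > 0:
        return "POS"
    if polarity < 0:
        return "NEG"
    return "NEU"


# With the libraries installed, the scores would come from:
#   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#   compound = SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
#   from textblob import TextBlob
#   polarity = TextBlob(text).sentiment.polarity
# Made-up scores, used here only so the example runs without either library:
compound_scores = [0.62, -0.30, 0.0, 0.04, 0.51]
distribution = Counter(vader_label(s) for s in compound_scores)
print(distribution)
```

Counting the labels this way gives the class distribution that a chart like the one described above would plot; running the same counts with `textblob_label` allows the two labellers to be compared.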

Results for the model trained on TextBlob labels

The confusion matrix shows the predictions of the model, trained on all tweets labelled by TextBlob, when tested on 60 custom input tweets.
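For reference, a confusion matrix of this kind can be tabulated as follows. Rows are true labels and columns are predicted labels (the same convention as scikit-learn's `sklearn.metrics.confusion_matrix`). The label order and the toy labels are assumptions for illustration and are not the 60 custom test tweets.

```python
LABELS = ["NEG", "NEU", "POS"]  # assumed row/column order


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true classes, columns are predicted classes, both in `labels` order."""
    pos = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[pos[t]][pos[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Toy labels for illustration only; these are NOT the 60 custom tweets from the evaluation.
y_true = ["POS", "NEG", "NEU", "POS", "NEG", "NEU"]
y_pred = ["POS", "NEU", "NEU", "POS", "NEG", "POS"]
cm = confusion_matrix(y_true, y_pred)
for label, row in zip(LABELS, cm):
    print(label, row)
print("accuracy:", round(accuracy(cm), 3))
```

Off-diagonal cells show which classes are confused with each other, which a single accuracy figure hides.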