Unsupervised Sentiment Analysis of IMDB Reviews Using VADER

Overview

This project performs unsupervised sentiment analysis on movie reviews scraped from the IMDB website. It uses VADER (Valence Aware Dictionary and sEntiment Reasoner), a sentiment analysis tool designed for informal, social media-style text, which makes it a reasonable fit for user-written reviews that mix praise and criticism. The primary goal is to classify each user-generated review as positive, neutral, or negative, and to analyze the results, without requiring a labeled dataset.

Key Features

Web Scraping: The data is collected using a Python-based web scraper tailored for IMDB movie reviews.
Sentiment Analysis: VADER, a lexicon- and rule-based model, is used for unsupervised sentiment scoring.
Insights: Outputs include sentiment distributions, individual review scores, and overall movie sentiment trends.

Key Steps

Web Scraping IMDB Reviews:

Implement a scraper to collect user reviews from specified IMDB movie pages.
The scraper extracts review text, ratings (if available), and metadata.
Data is saved in a structured format (e.g., CSV).
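
As an illustration of the scraping step, here is a minimal sketch using only the Python standard library. It is not the notebook's actual implementation: the `review-text` class name, the `fetch_html` helper, and the `User-Agent` header are assumptions made for the example, and IMDB's real markup differs and changes over time.

```python
import csv
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Hypothetical CSS class marking a review body (assumption, not IMDB's real markup).
REVIEW_CLASS = "review-text"


class ReviewParser(HTMLParser):
    """Collects the text inside every <div> carrying REVIEW_CLASS."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._depth = 0    # <div> nesting depth inside the current review, 0 = outside
        self._chunks = []  # text fragments of the current review

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            self._depth += 1
        elif REVIEW_CLASS in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Join fragments and collapse whitespace into single spaces.
                self.reviews.append(" ".join(" ".join(self._chunks).split()))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def fetch_html(url):
    """Download a page as text. The User-Agent value is an assumption."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_reviews(html_text):
    """Return the list of review texts found in an HTML document."""
    parser = ReviewParser()
    parser.feed(html_text)
    parser.close()
    return parser.reviews


def save_reviews_csv(reviews, path):
    """Write one review per row under a single 'review' column."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["review"])
        writer.writerows([text] for text in reviews)
```

A typical call chain would be `save_reviews_csv(parse_reviews(fetch_html(url)), "reviews.csv")`, where `url` is the review page of the movie of interest. Ratings and metadata would need additional selectors, which are left out here.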

Data Cleaning:

Remove irrelevant or non-textual data (e.g., HTML tags, emojis).
Normalize text by converting to lowercase and handling punctuation.
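
The cleaning step could look roughly like the sketch below. The exact rules are assumptions: "handling punctuation" is interpreted here as replacing punctuation with spaces while keeping apostrophes, so that contractions such as "don't" survive, and the emoji pattern covers only the most common Unicode blocks.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")
# Rough emoji coverage (assumption): pictograph blocks, misc symbols, dingbats,
# plus the variation selector and zero-width joiner used in emoji sequences.
EMOJI_RE = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]")
# Anything that is not a word character, whitespace, or a straight apostrophe.
PUNCT_RE = re.compile(r"[^\w\s']")


def clean_review(text):
    """Strip HTML and emojis, lowercase, and remove punctuation except apostrophes."""
    text = TAG_RE.sub(" ", text)          # drop HTML tags
    text = html.unescape(text)            # decode entities such as &amp;
    text = text.replace("\u2019", "'")    # normalize curly apostrophes
    text = EMOJI_RE.sub(" ", text)        # drop emojis
    text = PUNCT_RE.sub(" ", text.lower())
    return " ".join(text.split())         # collapse whitespace
```

VADER's rules treat capitalization and exclamation marks as intensity cues, so it may be worth scoring both the raw and the cleaned text and comparing the results before settling on a cleaning level.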

Sentiment Scoring:

Apply the VADER tool to compute compound, positive, neutral, and negative sentiment scores for each review.
Categorize sentiment based on thresholds:
- Compound score > 0.05: Positive
- Compound score < -0.05: Negative
- Otherwise: Neutral
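
The scoring and thresholding described above can be sketched as follows. This assumes the `vaderSentiment` package from PyPI (the notebook may use NLTK's bundled VADER instead), and the helper names `label_from_compound` and `score_reviews` are made up for the example.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score to a label using the thresholds listed above."""
    if compound > threshold:
        return "Positive"
    if compound < -threshold:
        return "Negative"
    return "Neutral"


def score_reviews(reviews):
    """Return one dict per review with VADER scores and the derived label."""
    # Imported lazily so label_from_compound works without the package installed.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    rows = []
    for text in reviews:
        scores = analyzer.polarity_scores(text)  # keys: neg, neu, pos, compound
        rows.append(
            {
                "review": text,
                **scores,
                "sentiment": label_from_compound(scores["compound"]),
            }
        )
    return rows
```

With these strict inequalities, a compound score of exactly 0.05 or -0.05 is labeled Neutral.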

Analysis and Visualization:

Calculate sentiment distributions across reviews.
Generate plots to visualize sentiment trends.
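
A possible shape for the analysis step is shown below: the share of each label is computed with the standard library, and a bar chart is drawn with matplotlib. The function names and the output file name are assumptions for illustration, not taken from the notebooks.

```python
from collections import Counter

LABELS = ("Positive", "Neutral", "Negative")


def sentiment_distribution(labels):
    """Return the share of each sentiment label as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = sum(counts[name] for name in LABELS)
    return {name: (counts[name] / total if total else 0.0) for name in LABELS}


def plot_distribution(distribution, path="sentiment_distribution.png"):
    """Save a bar chart of the distribution (requires matplotlib)."""
    # Imported lazily so the distribution helper has no third-party dependency.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(list(distribution), [share * 100 for share in distribution.values()])
    ax.set_ylabel("Share of reviews (%)")
    ax.set_title("Sentiment distribution")
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)
```

For example, `sentiment_distribution(["Positive", "Positive", "Negative", "Neutral"])` gives half of the reviews as Positive and a quarter each as Neutral and Negative.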

Project Limitations

Model Dependency: VADER is optimized for short, social media-style text. Accuracy may decrease for long-form movie reviews with complex narratives.
Unsupervised Nature: Because the approach is unsupervised, VADER relies on a fixed, predefined lexicon and rule set. It cannot learn domain-specific usage from the reviews, and words missing from the lexicon contribute no sentiment to the score.
Data Quality: The accuracy of sentiment analysis depends heavily on the quality and diversity of the scraped reviews. Noisy or unbalanced datasets can skew results.
Language Restriction: VADER's lexicon is English-only, so non-English reviews cannot be scored reliably, which limits the scope for multilingual sentiment analysis.
