This project aims to build a predictive model of movie revenue using statistical analysis, feature engineering, machine learning techniques, and text analytics.

Quinnduong/IMDB-MOVIE-DATA-ANALYSIS-PROJECT

This project focuses on analyzing the IMDB Movies dataset to predict movie revenue using machine learning techniques. By exploring and modeling relationships between different movie features (such as budget, genre, score, and country), the project aims to identify factors that contribute to a movie's financial success.

Results Overview

Best-performing Model: Decision Forest Regression

R-squared value (Best Model): 0.6659

Key Predictors: Budget, Score, Country, Genre, Original Language

Project Objectives

The main objective of this project is to predict movie revenue using independent variables from the IMDB dataset. The process involved:

Data Preprocessing: Cleaning the dataset by handling missing values, normalizing variables, and ensuring data consistency.
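
As an illustration of the preprocessing step, here is a minimal sketch in plain Python. It assumes median imputation for missing values and min-max scaling for normalization; both choices, the helper names, and the sample budget values are hypothetical, since the project does not state which methods it used.

```python
from statistics import median


def fill_missing_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical budget column (in millions) with one missing entry.
budgets = [10.0, None, 50.0, 30.0]
filled = fill_missing_with_median(budgets)
normalized = min_max_normalize(filled)
```

In practice a data-frame library would usually handle this per column; the plain-Python version only makes the two operations explicit.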

Exploratory Data Analysis (EDA): Understanding the relationships between movie features (e.g., budget, score, genre) and revenue through visualizations and statistical testing.

Budget

H0: Budget has no effect on Revenue.

Summary: Budget has a strong and statistically reliable effect on Revenue. Among the non-linear transformations tested, the polynomial fit produced the best R-squared value (0.48948) with a p-value < 0.0001, so we reject the null hypothesis.
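
The comparison of fits can be sketched as follows. This is a hedged example using NumPy, assuming a degree-2 polynomial (the project does not state the degree) and made-up budget and revenue values; `polynomial_r2` is a hypothetical helper, not part of the project.

```python
import numpy as np


def polynomial_r2(x, y, degree):
    """Fit a least-squares polynomial of the given degree and return its R-squared."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)       # highest-order coefficient first
    predicted = np.polyval(coeffs, x)
    ss_res = float(np.sum((y - predicted) ** 2))   # unexplained variation
    ss_tot = float(np.sum((y - y.mean()) ** 2))    # total variation
    return 1.0 - ss_res / ss_tot


# Made-up data in which revenue grows roughly with the square of budget.
budget = [1, 2, 3, 4, 5, 6]
revenue = [1.2, 3.9, 9.1, 15.8, 25.3, 35.9]

r2_linear = polynomial_r2(budget, revenue, 1)
r2_quadratic = polynomial_r2(budget, revenue, 2)
```

Because the linear model is a special case of the quadratic one, the quadratic fit can never score a lower in-sample R-squared; a held-out test set is needed to judge whether the extra term generalizes.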

Genre

H0: Genre has no effect on Revenue.

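Summary: The test of Genre on Revenue gives an F-statistic of 56.2825183 and a reported p-value of 0.09760088. That p-value is below 0.10 but above the conventional 0.05 threshold, so the null hypothesis is rejected only at the 10% significance level.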
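
For reference, an F-statistic of this kind can be computed with a one-way ANOVA across genre groups. The sketch below assumes that is the test behind the figure above (the project does not name it) and uses made-up revenue values; `one_way_anova_f` is a hypothetical helper.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F-statistic for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up revenues (in millions) for three genres.
revenue_by_genre = {
    "Action": [4.0, 5.0, 6.0],
    "Drama": [1.0, 2.0, 3.0],
    "Comedy": [2.0, 3.0, 4.0],
}
f_stat = one_way_anova_f(list(revenue_by_genre.values()))
```

Turning the statistic into a p-value requires the F distribution with (k - 1, n - k) degrees of freedom; a library routine such as `scipy.stats.f_oneway` reports both values.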

See more details in Final Project.pdf

Feature Engineering: Creating new relevant features and transforming existing ones to improve prediction accuracy.

Model Training and Testing: Applying machine learning algorithms such as Linear Regression, Bayesian Linear Regression, and Decision Forest Regression to predict movie revenue.

Model Evaluation and Fine-tuning: Comparing model performance using R-squared and RMSE values to select the best model for predicting revenue.
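
The two evaluation metrics can be sketched in plain Python as below. The model names `model_a` and `model_b` and all numbers are hypothetical placeholders, not results from this project.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error: typical size of a prediction error."""
    n = len(actual)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out revenues and predictions from two candidate models.
actual = [100.0, 200.0, 300.0, 400.0]
predictions = {
    "model_a": [110.0, 190.0, 310.0, 390.0],
    "model_b": [150.0, 150.0, 350.0, 350.0],
}

# Score each model as (R-squared, RMSE) and pick the highest R-squared.
scores = {
    name: (r_squared(actual, pred), rmse(actual, pred))
    for name, pred in predictions.items()
}
best = max(scores, key=lambda name: scores[name][0])
```

Higher R-squared and lower RMSE both indicate a better fit; when they are computed on the same test set they rank models identically, so reporting both mainly helps interpretation (RMSE is in revenue units).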

Visualization: Tableau Dashboard

A Tableau dashboard was created to visualize the key findings from the analysis.

Key Findings

Budget and Revenue: Higher production budgets are strongly associated with higher revenues.

Movie Score: Movies with higher IMDB scores generally perform better in terms of revenue.

Country of Origin: Certain countries (e.g., Mauritius, Taiwan) have higher average movie revenues.

Genre Impact: Genres like Adventure, Action, and Fantasy generate higher revenue compared to others.

Status Effect: Movies in the "Released" stage earn significantly more than those in production or post-production.
