This project focuses on analyzing the IMDB Movies dataset to predict movie revenue using machine learning techniques. By exploring and modeling relationships between different movie features (such as budget, genre, score, and country), the project aims to identify factors that contribute to a movie's financial success.
Results Overview
Best-performing Model: Decision Forest Regression
R-squared value (Best Model): 0.6659
Key Predictors: Budget, Score, Country, Genre, Original Language
Project Objectives
The main objective of this project is to predict movie revenue using independent variables from the IMDB dataset. The process involved:
Data Preprocessing: Cleaning the dataset by handling missing values, normalizing variables, and ensuring data consistency.
Exploratory Data Analysis (EDA): Understanding the relationships between movie features (e.g., budget, score, genre) and revenue through visualizations and statistical testing.
Budget
H0: Budget has no effect on Revenue.
Summary: Budget has a strong, statistically significant effect on Revenue. Of the non-linear transformations tested, a polynomial fit produced the best R-squared (0.48948), with a p-value < 0.0001, so the null hypothesis is rejected.
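The README does not state the polynomial degree or the tooling used. The following is therefore only a minimal sketch, assuming a degree-2 polynomial of Budget fitted by ordinary least squares in plain Python. It shows how such a fit and its R-squared can be computed. The helper names (`solve_linear_system`, `fit_polynomial`, `predict`, `r_squared`) are hypothetical and are not part of the project.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the caller's lists are untouched.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial(xs: Sequence[float], ys: Sequence[float], degree: int = 2) -> List[float]:
    """Least-squares polynomial fit; coefficients are ordered c0, c1, ..., c_degree."""
    k = degree + 1
    # Normal equations (X^T X) c = X^T y, with design matrix X[i][j] = xs[i] ** j.
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs: Sequence[float], x: float) -> float:
    """Evaluate c0 + c1*x + c2*x**2 + ..."""
    return sum(c * x ** p for p, c in enumerate(coeffs))


def r_squared(ys: Sequence[float], preds: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Toy, noise-free data (not from the IMDB dataset): revenue = 2 + 1.5*b + 0.5*b**2.
budgets = [1.0, 2.0, 3.0, 4.0, 5.0]
revenues = [2 + 1.5 * b + 0.5 * b ** 2 for b in budgets]
coeffs = fit_polynomial(budgets, revenues, degree=2)
print([round(c, 6) for c in coeffs])
print(round(r_squared(revenues, [predict(coeffs, b) for b in budgets]), 6))
```

On this noise-free toy data, the fit should recover coefficients close to [2, 1.5, 0.5] and an R-squared of 1.0. Real budget and revenue data is far noisier, so an R-squared near 0.49 is a plausible outcome for a single-predictor fit.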
Genre
H0: Genre has no effect on Revenue.
Summary: Genre shows an effect on Revenue, with an F-statistic of 56.2825183 and a p-value of 0.09760088. Because p < 0.10, the null hypothesis is rejected at the 10% significance level. It would not be rejected at the conventional 5% level.
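An F-statistic for a categorical predictor such as Genre typically comes from a one-way ANOVA that compares mean revenue across groups. The sketch below assumes that was the test used, which the README does not state. It computes the F-statistic and its degrees of freedom in plain Python. Converting F to a p-value requires the F distribution; if SciPy is available, `scipy.stats.f_oneway` returns both values. `one_way_anova_f` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def one_way_anova_f(groups: Dict[str, List[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    all_values = [v for values in groups.values() for v in values]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = {g: sum(vals) / len(vals) for g, vals in groups.items()}
    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(vals) * (means[g] - grand_mean) ** 2 for g, vals in groups.items())
    # Within-group sum of squares: spread of each value around its own group mean.
    ss_within = sum((v - means[g]) ** 2 for g, vals in groups.items() for v in vals)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy revenues per genre (illustrative numbers, not project data).
toy = {
    "Action": [10.0, 12.0, 14.0],
    "Drama": [4.0, 5.0, 6.0],
    "Comedy": [6.0, 7.0, 8.0],
}
f_stat, df_b, df_w = one_way_anova_f(toy)
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

For this toy input, the script prints `F = 19.50 with (2, 6) degrees of freedom`. The resulting p-value is then compared with a chosen significance level, such as 0.05 or 0.10.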
See Final Project.pdf for more details.
Feature Engineering: Creating new relevant features and transforming existing ones to improve prediction accuracy.
Model Training and Testing: Applying machine learning algorithms such as Linear Regression, Bayesian Linear Regression, and Decision Forest Regression to predict movie revenue.
Model Evaluation and Fine-Tuning: Comparing and tuning the models using R-squared and RMSE to select the best model for predicting revenue.
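The workflow above (cleaning, normalizing, training, and scoring with R-squared and RMSE) can be illustrated with a minimal plain-Python sketch. It assumes rows are dictionaries with hypothetical `budget` and `revenue` keys. It uses a single-feature ordinary least squares model as a stand-in for the Linear Regression baseline, and it does not reproduce the Decision Forest model or the project's actual train/test split. The min-max bounds are learned from the training rows only, so the test set does not leak into preprocessing.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[float]]


def drop_missing(rows: Sequence[Row], cols: Sequence[str]) -> List[Row]:
    """Keep only rows that have a value for every column in `cols`."""
    return [r for r in rows if all(r.get(c) is not None for c in cols)]


def split_indices(n: int, test_fraction: float = 0.25, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle row indices reproducibly and split them into (train, test)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(n * test_fraction))
    return idx[n_test:], idx[:n_test]


def min_max(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Min-max scale with bounds from the training rows (test values may fall outside [0, 1])."""
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def fit_ols(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Single-feature ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Share of the variance in y_true explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def evaluate(rows: Sequence[Row], feature: str = "budget", target: str = "revenue", seed: int = 0) -> Dict[str, float]:
    """Clean, split, scale, fit, and score on the held-out rows."""
    clean = drop_missing(rows, [feature, target])
    train, test = split_indices(len(clean), seed=seed)
    train_x = [clean[i][feature] for i in train]
    lo, hi = min(train_x), max(train_x)
    a, b = fit_ols(min_max(train_x, lo, hi), [clean[i][target] for i in train])
    preds = [a + b * x for x in min_max([clean[i][feature] for i in test], lo, hi)]
    actual = [clean[i][target] for i in test]
    return {"r2": r_squared(actual, preds), "rmse": rmse(actual, preds)}


# Toy rows (not project data): a perfectly linear relationship plus one incomplete row.
rows: List[Row] = [{"budget": float(b), "revenue": 3.0 * b + 5.0} for b in range(1, 21)]
rows.append({"budget": None, "revenue": 10.0})  # removed by drop_missing
print(evaluate(rows))
```

Because the toy relationship is exactly linear, the held-out R-squared should be about 1.0 and the RMSE about 0. On real data, each candidate model would be scored in the same way, and the model with the higher R-squared and lower RMSE on the held-out rows would be preferred.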
Visualization: Tableau Dashboard
A Tableau dashboard was created to visualize the key findings from the analysis, which are summarized below.
Key Findings
Budget and Revenue: Higher production budgets are strongly associated with higher revenue.
Movie Score: Movies with higher IMDB scores tend to earn higher revenue.
Country of Origin: Certain countries (e.g., Mauritius, Taiwan) have higher average movie revenue.
Genre Impact: Genres such as Adventure, Action, and Fantasy generate higher revenue than other genres.
Status Effect: Movies in the "Released" stage earn significantly more than those still in production or post-production.
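Group-level findings such as revenue by country or genre come down to averaging revenue within each group. The following is a minimal sketch that assumes `(group, revenue)` pairs as input. `mean_by_group` and its `min_count` parameter are hypothetical. The function reports each group's size alongside its mean, because a country represented by only a few films can top an average-revenue ranking on the strength of one or two releases.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def mean_by_group(pairs: Iterable[Tuple[str, float]], min_count: int = 1) -> List[Tuple[str, float, int]]:
    """Return (group, mean, count) tuples sorted by mean, highest first.

    Groups with fewer than `min_count` rows are dropped, which guards against
    rankings driven by one or two films.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for group, value in pairs:
        totals[group] += value
        counts[group] += 1
    rows = [(g, totals[g] / counts[g], counts[g]) for g in totals if counts[g] >= min_count]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Toy (country, revenue) pairs, not project data.
pairs = [("A", 10.0), ("A", 20.0), ("B", 100.0), ("C", 5.0), ("C", 7.0), ("C", 9.0)]
print(mean_by_group(pairs))               # "B" ranks first on a single film
print(mean_by_group(pairs, min_count=2))  # "B" is excluded
```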