This Jupyter Notebook provides a machine learning solution for predicting the ranking of a movie on the Letterboxd platform based on its users' past reviews.
The notebook is organized as follows:
Data Preprocessing: The notebook starts with data preprocessing steps, including importing libraries, loading and cleaning the dataset, and exploring the features.
Feature Engineering: The next section performs feature engineering, which involves transforming the features into a format that can be used by machine learning algorithms. This includes handling missing data, encoding categorical variables, and creating new features.
Model Training and Evaluation: In this section, several machine learning algorithms are applied to the data to predict movie rankings. The models are evaluated using a variety of metrics, and the best performing model is selected.
Model Interpretation: Finally, the selected model is interpreted, including identifying which features are most important for predicting movie rankings.
The dataset used in this notebook is sourced from Kaggle (https://www.kaggle.com/datasets/ramjasmaurya/top-250s-in-imdb). The dataset includes approximately 1000 movies. The data includes features such as the title, year, and director, among others.
This notebook requires the following libraries to be installed:
- Pandas
- NumPy
- Scikit-learn
- XGBoost These libraries can be installed using pip, e.g.:
pip install pandas numpy scikit-learn xgboost
This notebook provides a practical example of how to use machine learning to predict movie rankings on Letterboxd. By following the steps outlined in this notebook, you can adapt this solution to your own datasets and problems. If you have any questions or feedback, feel free to contact me.