For this project I used a public dataset from Kaggle:
https://www.kaggle.com/ashirwadsangwan/imdb-dataset
The dataset contains IMDb's extensive database, updated through 2020. It is about 1.44 GB in size.
Libraries used: os, scikit-learn, Matplotlib, pandas, NumPy
- First, download the dataset (or upload it) to Google Colab and extract it into a folder
- Copy the path of that folder and paste it into the PATH variable in the notebook
- Run "project-strangecues.ipynb" on Google Colab
- The notebook takes a while to run
- Note that the data we used starts in the 1960s
- Once the run finishes, the top 1% of movies are in the dataframe named "Classic"
- The ratio of top 1% movies to total movies came out to 0.0100026877229
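
The last two steps can be illustrated with a minimal sketch. It is not the notebook's actual code: the helper `top_fraction`, the column names `score` and `startYear`, and the toy data are all assumptions made for illustration, and the criterion the notebook uses to rank movies is not shown here. The sketch keeps rows at or above the 99th percentile of a score column and then computes the ratio of selected rows to all rows.

```python
import pandas as pd


def top_fraction(df, score_col, fraction=0.01):
    """Return the rows whose score is at or above the (1 - fraction) quantile.

    Hypothetical helper: the real notebook may rank movies differently.
    """
    threshold = df[score_col].quantile(1 - fraction)
    return df[df[score_col] >= threshold]


# Toy data standing in for the real movie table (assumed column names).
movies = pd.DataFrame({
    "title": [f"movie_{i}" for i in range(1000)],
    "startYear": [1960 + i % 60 for i in range(1000)],
    "score": range(1000),
})

# Keep only movies from 1960 onward, matching the note above.
movies = movies[movies["startYear"] >= 1960]

# Select the top 1% and compare its size with the full table.
classic = top_fraction(movies, "score")
ratio = len(classic) / len(movies)
print(len(classic), ratio)
```

With a threshold-based cut like this one, tied scores at the threshold are all kept, so on real data the ratio can land slightly above 0.01 rather than exactly on it.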