This project aims to improve movie search for Russian-language users.
The application is built with Streamlit and deployed on HuggingFace Spaces.
It relies on a BERT model, rubert-tiny2.
- We scraped 12,000 movies from the mail.ru catalog. Recommendations are based on each movie's description from its page and on editorial reviews.
- BERT encodes the description and review of each movie into a vector.
- The user enters a movie description, which is passed through the same BERT model to produce a query vector.
- faiss compares the query vector with the catalog vectors by Euclidean distance, and the requested number of closest (most similar) movies is displayed.
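The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. The `encode` helper, the use of the first-token (CLS) embedding as the text vector, and the toy vectors and titles are all assumptions made for the example. To keep the sketch runnable without extra dependencies, the search step is written as a brute-force NumPy stand-in for faiss: it ranks catalog vectors by squared Euclidean distance to the query.

```python
import numpy as np


def encode(texts, tokenizer, model):
    """Turn a list of texts into float32 vectors with a BERT-style model.

    Assumption: the first-token (CLS) embedding is used as the text vector;
    the original project may pool token embeddings differently.
    """
    import torch  # imported here so the search demo below runs without torch

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        output = model(**batch)
    return output.last_hidden_state[:, 0, :].cpu().numpy().astype(np.float32)


def top_k_l2(catalog, query, k):
    """Return (squared distances, row indices) of the k rows closest to query."""
    diffs = catalog - query                       # broadcast query over all rows
    dists = np.einsum("ij,ij->i", diffs, diffs)   # squared Euclidean distance per row
    order = np.argsort(dists, kind="stable")[: min(k, len(catalog))]
    return dists[order], order


# Toy 2-dimensional stand-ins for movie embeddings (hypothetical data).
catalog = np.array(
    [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]], dtype=np.float32
)
titles = ["movie A", "movie B", "movie C", "movie D"]

# Stand-in for the encoded user description.
query = np.array([0.9, 0.1], dtype=np.float32)

dists, idx = top_k_l2(catalog, query, k=2)
recommended = [titles[i] for i in idx]
print(recommended)
```

In a real run, the tokenizer and model would be loaded with `AutoTokenizer.from_pretrained` and `AutoModel.from_pretrained` (the hub id `cointegrated/rubert-tiny2` is assumed here), and `encode` would produce both the catalog matrix and the query vector. With faiss, the same exact search can be done by creating `faiss.IndexFlatL2(d)` for vectors of dimension `d`, adding the catalog with `index.add(catalog)`, and calling `index.search(query.reshape(1, -1), k)`, which also returns squared L2 distances and row indices.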
import numpy
import pandas
import faiss
import torch
import joblib
import streamlit
from transformers import AutoTokenizer, AutoModel
To create a Python virtual environment for running the code, enter:
python3 -m venv my-env
Activate the new environment:
- Windows:
my-env\Scripts\activate.bat
- macOS and Linux:
source my-env/bin/activate
Install all dependencies from the requirements.txt file:
pip install -r requirements.txt