Skip to content

ddcrpf/Neural-Collaborative-Filtering

Repository files navigation

Personalised Recommendation system using Neural Collaborative Filtering

Step 1: Train NCF Model on Movie Ratings

Objective: Use hard sampling to improve the model.

  • Interaction Matrix: Created a matrix of users and items (movies).

  • Labels:

    • Positive interaction: labeled as 1.
      • Hard Positive: Rating ≥ 4.
      • Soft Positive: Rating ≥ 3.
    • Negative interaction: labeled as 0.
      • Hard Negative: Rating ≤ 1.
      • Soft Negative: 2 ≤ Rating < 3.
  • Sampling: For each positive interaction, 4 negative interactions were sampled for the same user.

  • Final Dataset Format:

    • index | userid | movieid | interaction | rating.

Outcome: Trained the NCF model on this data.


Step 2: Prepare User, Movie, and Genre Embeddings

  1. User Embedding:

    • Movies with positive interactions were selected (interaction = 1).
    • Used the movie description metadata and computed embeddings using the Lamini model on Hugging Face.
    • User embedding: Calculated as the mean of all positive movie embeddings for that user.
    • Reasoning: This dense vector represents the user's preferences, which can then be compared with movie embeddings to predict future likes.
  2. Genre Embedding:

    • Movies were grouped by genres (a movie can belong to multiple genres).
    • Genre embedding: Calculated as the mean of all movies belonging to that genre.
      • Example: Embedding of "Comedy" = Mean of all comedy movie embeddings.
    • User-Genre Similarity: Compared the user embedding with genre embeddings to find genres the user is likely to enjoy.
    • Genre-Genre Similarity Matrix: A 17x17 matrix was computed to determine similarities between different genres. This helps in recommending genres related to ones the user already likes.

Step 3: Calculate User Embeddings for All Users and Compute Median Similarity Scores

  • Repeated the embedding process for all users.
  • Calculated user-genre similarity for every user and stored results in a nested dictionary (dict of DataFrames).
  • Median Similarity Score: Computed for each user.
    • Genres with similarity scores higher than the median were classified as positive genres for the user.
    • Genres with similarity scores lower than the median were classified as negative genres for the user.

Step 4: Identify Actual Positive and Negative Genres for Each User

  • Actual Positive Genres:

    • Intersection of genres where the similarity score is higher than the median and the user has interacted positively with the genre.
    • Example: If a user has high similarity with "Action" but low similarity with "Animation," the model would only mark "Action" as a positive genre.
  • Actual Negative Genres:

    • Genres where the similarity score is lower than the median and the user hasn't interacted with them.
  • Final Dataset:

    • Maintained a 1:3 ratio of positive to negative genres.
    • userid | movieid | interaction | genre | movie title.

Step 5: Fine-Tune NCF Model on New Data

  • Fine-tuned the previously trained NCF model on this new dataset.
  • Final Metrics:
    • Precision@k: 0.4
    • Recall@k: 0.2-0.3
    • NDCG@10: 0.302 (varies by user).
    • Genre NDCG@10: 0.84, indicating that most suggested movies belonged to similar or related genres.

Step 6: Frontend Development

  • Created a frontend for the model using Flask, Spline.js, HTML, and CSS.
  • Deployed the app as a web service using Azure App Service.
  • User Interface: Users can hover over movie posters, select them, and view 10 recommended movie posters based on the model's predictions.

About

Presonalised Recommendation system using NCF, Hard Sampling and Embedding Techniques

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published