This repository provides a Spotify review chatbot that analyzes Spotify's Google Play Store reviews and answers user questions based on them.
Dataset: The full dataset for this project is available on Kaggle: 3.4 Million Spotify Google Store Reviews.
- Docker
- Docker Compose
- Poetry (Python 3.12.3)
To install project dependencies, run:
poetry install
The knowledge base is powered by Qdrant as the vector database, which stores the semantic embeddings of review texts.
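To illustrate what storing semantic embeddings buys us, here is a minimal sketch of similarity search over stored vectors. It assumes cosine similarity as the metric (one of the distance options Qdrant supports; this README does not say which one the collection uses), and `cosine_similarity`, `top_k`, and the toy vectors are hypothetical names for illustration only. Qdrant performs this kind of lookup with an approximate index rather than the brute-force loop shown here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], stored: dict[str, list[float]], k: int = 3):
    """Rank stored vectors by similarity to the query (brute force, for illustration)."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in stored.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

In the real system the vectors would come from the embedding model and the ranking would be delegated to Qdrant; the sketch only shows the idea of nearest-vector retrieval.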
- Vector Database: Qdrant serves as the vector database for this chatbot. To deploy Qdrant, run:
docker-compose -f docker-compose-qdrant.yml up
- Embedding Model: We use TinyBERT for semantic representation of review texts. TinyBERT is served with Hugging Face's Optimum ONNX, and review texts are converted into vector representations using the Transformers feature-extraction pipeline. To deploy TinyBERT:
  - Build the Optimum ONNX serving image:
    cd tinybert/serving
    docker build -t optimum-onnx-serving-cpu:0.1.2 .
  - Update the model.onnx path in the Docker Compose volumes. Example:
    - /home/miftah/Downloads/job_application/mekari/review_bot/tinybert:/app/models
  - Start the TinyBERT Docker Compose services:
    docker-compose up
- Once Qdrant and TinyBERT are deployed successfully, load the dataset:
  - Download the dataset: SPOTIFY_REVIEWS.csv.
  - Update fname in qdrant_scripts/load_data.py to point to the dataset path.
  - Run the data loading script:
    cd qdrant_scripts
    poetry run python load_data.py
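The contents of load_data.py are not shown here, so the following is only a sketch of how a loader might read the CSV in batches before embedding and inserting the rows. It assumes the CSV columns match the field names used in the PUT /review example (such as `review_text`); `iter_review_batches` and `batch_size` are hypothetical names, not part of the repository.

```python
import csv
from typing import Iterator


def iter_review_batches(fname: str, batch_size: int = 256) -> Iterator[list[dict]]:
    """Yield lists of review rows, skipping rows that have no text to embed."""
    batch: list[dict] = []
    with open(fname, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            text = (row.get("review_text") or "").strip()
            if not text:
                continue  # an empty review cannot be embedded
            batch.append(row)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # final partial batch
```

Batching keeps memory use flat on a multi-million-row file and lets each batch be embedded and written to the vector database in a single round trip.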
The chatbot uses LangGraph as its main orchestration library and LangChain for the AzureOpenAI model integration and ChatPromptTemplate prompts, which together manage the dialog flow.
To run the chatbot in terminal mode:
./scripts/run_rag.sh
To deploy the chatbot API:
./scripts/run_api.sh
To deploy the UI (Chatbot API has to be deployed first):
./scripts/run_ui.sh
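Because the UI depends on the API, it can help to wait until the API answers before starting the UI. The sketch below polls the /healthcheck endpoint described in this README. The base URL is a placeholder you must supply, and `api_is_healthy` and `wait_for_api` are hypothetical helpers, not part of the repository's scripts.

```python
import json
import time
import urllib.request
from typing import Callable


def api_is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/healthcheck answers with {"status": "ok"}."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthcheck", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers connection errors (URLError); ValueError covers bad JSON.
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def wait_for_api(probe: Callable[[], bool], attempts: int = 10, delay: float = 1.0) -> bool:
    """Call probe() up to `attempts` times, sleeping `delay` seconds between failures."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False
```

For example, `wait_for_api(lambda: api_is_healthy(base_url))` could be called before launching the UI, with `base_url` set to wherever the API is listening.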
The following endpoints are available in this chatbot:
- POST /ask
Description: Responds to user inquiries related to Spotify reviews.
Input: JSON object with user_input (text).
Example Input:
{ "user_input": "In comparison to our application, which music streaming platform are users most likely to compare ours with?" }
Example Output:
{ "response": "Users are most likely to compare Spotify with Pandora and Google Play Music, as these platforms are frequently mentioned in the reviews.", "score": 0.99 }
- PUT /review/<review_id>
Description: Inserts a review item into the Qdrant collection.
Input: JSON dictionary with review details.
Example Input:
{ "_id": 19206, "review_id": "14a011a8-7544-47b4-8480-c502af0ac26f", "pseudo_author_id": "152618553977019693742", "author_name": "A Google user", "review_text": "Use it every day", "review_rating": 5, "review_likes": 1, "author_app_version": "1.1.0.91", "review_timestamp": "2014-05-27 14:21:48" }
Example Output:
{ "status": "item id 19206 is inserted into spotify_review" }
- DELETE /review/<item_id>
Description: Deletes a review item from the Qdrant collection based on the item ID.
Example Output:
{ "status": "item id 19206 is deleted from spotify_review" }
- GET /collections
Description: Lists all available collections in Qdrant.
Example Output:
{ "result": [ { "collection_name": "spotify_review", "documents_count": 8443 } ] }
- GET /healthcheck
Description: Verifies if the API is running.
Example Output:
{ "status": "ok" }
To test these endpoints, open Swagger UI at http://localhost:<API_PORT>/docs in your browser.
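The endpoints can also be called from a script. The following is a minimal sketch using only the Python standard library; it builds requests for POST /ask and PUT /review/<review_id> in the shapes documented above. The helper names are hypothetical, the base URL (including the port) is a placeholder you must supply, and which identifier the PUT path expects is left to the caller because this README does not spell it out.

```python
import json
import urllib.parse
import urllib.request

JSON_HEADERS = {"Content-Type": "application/json"}


def build_ask_request(base_url: str, user_input: str) -> urllib.request.Request:
    """Build the POST /ask request carrying {"user_input": ...}."""
    body = json.dumps({"user_input": user_input}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask", data=body, headers=JSON_HEADERS, method="POST"
    )


def build_put_review_request(base_url: str, path_id: str, review: dict) -> urllib.request.Request:
    """Build the PUT /review/<id> request with the review dictionary as JSON."""
    quoted = urllib.parse.quote(str(path_id), safe="")
    body = json.dumps(review).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/review/{quoted}", data=body, headers=JSON_HEADERS, method="PUT"
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With the API running, `send(build_ask_request(base_url, "What do users like most?"))` should return a dictionary with the `response` and `score` keys shown in the /ask example.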