Trivago Recsys Experiments

How to use:

These scripts expect a data/trivago subdirectory. To set it up:
- mkdir data
- put the trivago dataset in data/ OR, if using ada, create a symbolic link to the trivago folder on the storage node.
- the files you need are: train.csv, validation.csv, confirmation.csv
Compute features for the experiments:
- python3 load_trivago.py
- python3 hash_user_id.py
- python3 load_trivago_blind.py
Run the experiments:
- python3 train_eval.py
- python3 train_eval_blind.py
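The steps above can be wrapped in a small driver that checks the inputs before running anything. This is a minimal sketch and not part of the repository: it assumes the three CSVs sit directly in data/trivago and that each script runs without command-line arguments. `Path.is_file()` follows symbolic links, so a link to the ada storage node passes the check.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: the CSVs live directly in data/trivago (adjust if your layout differs).
DATA_DIR = Path("data/trivago")
REQUIRED_FILES = ["train.csv", "validation.csv", "confirmation.csv"]

# Feature computation must finish before the experiments run.
PIPELINE = [
    "load_trivago.py",
    "hash_user_id.py",
    "load_trivago_blind.py",
    "train_eval.py",
    "train_eval_blind.py",
]


def missing_files(data_dir: Path = DATA_DIR) -> list:
    """Return the required input files that are not present in data_dir."""
    return [name for name in REQUIRED_FILES if not (data_dir / name).is_file()]


def run_pipeline(dry_run: bool = False) -> list:
    """Run each script in order with the current interpreter, stopping at the first failure."""
    missing = missing_files()
    if missing:
        raise FileNotFoundError(f"missing in {DATA_DIR}: {', '.join(missing)}")
    commands = [[sys.executable, script] for script in PIPELINE]
    for cmd in commands:
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
    return commands
```

Call `run_pipeline()` from the repository root; `run_pipeline(dry_run=True)` only validates the inputs and returns the commands it would run.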

More details on each script:

load_trivago.py

This script creates a Parquet file containing all the data required to train the models. I've included a set of dataclasses to help me understand the dataset and to make feature extraction more intuitive and readable. The objects I've defined are as follows:

- Hotel
- Interaction
- Session
- UserProfile
- SessionData
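To give a feel for how these objects fit together, here is a minimal sketch. Only the class names come from load_trivago.py; every field below is a hypothetical illustration (loosely modelled on the action_type, reference, impressions and prices columns of the trivago logs), not the script's actual definition.

```python
from dataclasses import dataclass, field


@dataclass
class Hotel:
    item_id: int
    properties: set = field(default_factory=set)  # hypothetical: amenity tags


@dataclass
class Interaction:
    timestamp: int
    action_type: str  # e.g. "clickout item"
    reference: str    # item id or other value, depending on action_type
    impressions: list = field(default_factory=list)  # items shown at a clickout
    prices: list = field(default_factory=list)       # prices aligned with impressions


@dataclass
class Session:
    session_id: str
    user_id: str
    interactions: list = field(default_factory=list)

    @property
    def start_time(self) -> int:
        # Sessions are ordered by their earliest interaction.
        return min(i.timestamp for i in self.interactions)


@dataclass
class UserProfile:
    user_id: str
    clicked_items: dict = field(default_factory=dict)  # item_id -> click count
    n_sessions: int = 0


@dataclass
class SessionData:
    session: Session
    profile: UserProfile  # the profile as it stood before this session
    features: dict = field(default_factory=dict)
```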

Following LogicAI's strategy for the 2019 RecSys Challenge, I computed user features on a rolling basis to prevent overfitting: a feature must never be computed from the interaction it describes, or from any later one. Specifically, I sorted sessions by their starting timestamp and added each interaction to the user profile and graph only after features had been extracted from that interaction.
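The ordering rule can be sketched as follows. This is a simplified illustration, not the code in load_trivago.py: it tracks only one hypothetical feature (how often the user previously clicked each impressed item), ignores the graph, and assumes each session is a dict with `user_id`, `start`, and a time-ordered list of `(impressions, clicked_item)` clickouts.

```python
from collections import defaultdict


def rolling_user_features(sessions):
    """Compute leak-free user features, one dict per clickout.

    Each session is a dict with "user_id", "start" (timestamp) and "clickouts",
    a time-ordered list of (impressions, clicked_item) pairs.
    """
    profile = defaultdict(lambda: defaultdict(int))  # user_id -> item_id -> past clicks
    rows = []
    for session in sorted(sessions, key=lambda s: s["start"]):
        user_clicks = profile[session["user_id"]]
        for impressions, clicked in session["clickouts"]:
            # 1) Extract features from the profile *before* this interaction is added.
            rows.append({item: user_clicks.get(item, 0) for item in impressions})
            # 2) Only afterwards fold the interaction into the profile.
            user_clicks[clicked] += 1
    return rows
```

Because the profile is updated after the features are read, the clicked item's own click never counts toward its features, which is exactly the leak the rolling scheme avoids.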

compute_ctr.py

This script computes the click-through rates (CTRs) for all items.
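A minimal sketch of the computation, assuming CTR is defined as clickouts on an item divided by the number of times it was impressed at a clickout. The column names (`action_type`, `reference`, `impressions` as a "|"-joined list) follow the public trivago logs, but this is not necessarily how compute_ctr.py does it.

```python
from collections import Counter


def compute_ctr(rows):
    """Return {item_id: clicks / impressions} over all "clickout item" rows.

    rows: dicts with "action_type", "reference" and "impressions"
    (e.g. from csv.DictReader over train.csv).
    """
    shown = Counter()
    clicked = Counter()
    for row in rows:
        if row["action_type"] != "clickout item":
            continue  # only clickouts carry an impression list
        for item in row["impressions"].split("|"):
            shown[item] += 1
        clicked[row["reference"]] += 1
    return {item: clicked[item] / shown[item] for item in shown}
```

Items with only a handful of impressions get very noisy CTRs, so in practice you may want a minimum-impression threshold or additive smoothing on top of this.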

extract_hotel_features.py

This script constructs hotel (item-based) features for learning.
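As an illustration of what item-based features can look like, the sketch below aggregates each hotel's impressions into a count, a mean list position, and a mean displayed price. The feature set is hypothetical and chosen for the example; the README does not say which hotel features extract_hotel_features.py actually builds.

```python
from collections import defaultdict
from statistics import mean


def hotel_features(clickouts):
    """Aggregate per-hotel statistics from clickout impression lists.

    clickouts: iterable of (impressions, prices) pairs of equal-length lists.
    """
    positions = defaultdict(list)
    prices = defaultdict(list)
    for items, item_prices in clickouts:
        for pos, (item, price) in enumerate(zip(items, item_prices)):
            positions[item].append(pos)  # 0 = top of the result list
            prices[item].append(price)
    return {
        item: {
            "n_impressions": len(positions[item]),
            "mean_position": mean(positions[item]),
            "mean_price": mean(prices[item]),
        }
        for item in positions
    }
```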

hash_user_id.py

We want to know what happens when we pretend that all users are new visitors to the site. We can't simply null out the user-based features, because one of our features comes from a graph-based model, which may still be useful even if edges only connect sessions and items (rather than users and items).
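One plausible way to make every user look new is to replace each user id with a value derived from the session id, so no history carries over between sessions. The sketch below is a hypothetical illustration of that idea (the salt and the 16-character truncation are arbitrary choices), not a copy of hash_user_id.py.

```python
import hashlib


def blind_user_id(session_id: str, salt: str = "blind") -> str:
    """Derive a pseudo user id that is unique to one session."""
    digest = hashlib.sha256(f"{salt}:{session_id}".encode("utf-8")).hexdigest()
    return digest[:16]


def blind_rows(rows):
    """Return copies of the rows with user_id replaced by a per-session id."""
    return [{**row, "user_id": blind_user_id(row["session_id"])} for row in rows]
```

Since every session now has its own "user", a user-item graph built on these rows degenerates into a session-item graph, which is the setting the blind experiments study.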

load_trivago_blind.py

This script is remarkably similar to load_trivago.py, but it excludes user profiles and computes fewer features. It also builds a session-item graph instead of a user-item graph, which is sparser.
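The difference in sparsity is easy to see with a toy example. This sketch builds a bipartite adjacency map keyed either by user or by session and compares edge density; the row format (`user_id`, `session_id`, `item_id`) is an assumption for illustration, and the real scripts may store the graph differently.

```python
from collections import defaultdict


def bipartite_graph(interactions, node_key):
    """Map each node (a user or a session) to the set of items it interacted with."""
    graph = defaultdict(set)
    for row in interactions:
        graph[row[node_key]].add(row["item_id"])
    return graph


def density(graph, n_items):
    """Fraction of possible node-item edges that are present."""
    n_edges = sum(len(items) for items in graph.values())
    return n_edges / (len(graph) * n_items)
```

With one user who clicks items a and b in one session and a and c in another, the user-item graph is fully connected (density 1.0), while the session-item graph splits the same activity over two nodes and has density 4/6.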

Future work/stuff I didn't cover this summer.

I tackled a limited range of features which were considered most important in the challenge, while using my best judgement to exclude ones which I considered "cheating". For example, the first-place submission had

Additional features might have helped boost our final test MRR (0.576).
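For reference, mean reciprocal rank averages 1/rank of the clicked item over all evaluated clickouts, MRR = (1/|Q|) Σ 1/rank_i, with a contribution of 0 when the clicked item is missing from the ranking. A minimal sketch, independent of how train_eval.py computes it:

```python
def mean_reciprocal_rank(rankings, targets):
    """rankings: ranked item lists; targets: the item actually clicked for each list."""
    if len(rankings) != len(targets):
        raise ValueError("rankings and targets must have the same length")
    total = 0.0
    for ranked, target in zip(rankings, targets):
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)  # ranks are 1-based
    return total / len(targets)
```

An MRR of 0.576 therefore means the clicked hotel sat, on average, close to the top two positions of the predicted list.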

In these experiments, I found that user-level information contributed little to recommendations over such a short period. People likely do not plan multiple, different vacations or work trips within one week.

c-koster/trivago-recsys-experiments
