
Commit

Plotter added
Former-commit-id: f93a1cb
awarebayes committed Aug 30, 2019
1 parent 31e0489 commit 827c9ad
Showing 7 changed files with 255 additions and 310 deletions.
143 changes: 52 additions & 91 deletions examples/1. Vanilla RL/2. DDPG.ipynb

Large diffs are not rendered by default.

188 changes: 82 additions & 106 deletions examples/1. Vanilla RL/3. TD3.ipynb

Large diffs are not rendered by default.

147 changes: 52 additions & 95 deletions examples/2. BCQ/1. BCQ PyTorch .ipynb

Large diffs are not rendered by default.

32 changes: 18 additions & 14 deletions readme.md
@@ -2,35 +2,38 @@
<img src="./res/logo.png">
</p>

This is my school project. It focuses on Reinforcement Learning, but I learned many other things during development. Key topics: time series analysis, static dataset optimization, data preparation, and EDA. It also features my code for the ML20M dataset that allows iterating through the dataset in about 3 minutes, as well as my custom movie embeddings. DDPG doesn't seem to be working because it exploits the Value Network by recommending the same movie over and over again, but TD3 seems to be working just fine! You can see the distance matrices for the generated actions [below](#td3-results)
This is my school project. It focuses on Reinforcement Learning for personalized news recommendation. I wrote a couple of articles explaining how it works.

First article, the code is under notes/1. Vanilla RL/; it covers the basic Reinforcement Learning approach:
First article, the code is under notes/1. Vanilla RL/; it's very beginner friendly and covers the basic Reinforcement Learning approach:

<p align="center">
<a href="https://towardsdatascience.com/reinforcement-learning-ddpg-and-td3-for-news-recommendation-d3cddec26011">
<img src="./res/Article.png">
</a>
</p>

I am working on the next article; the code for it is under notes/2. BCQ/. It is about applying RL to a static dataset (i.e., a dataset without exploration) and features adversarial reinforcement learning techniques. Right now the DDPG implementation is working.

| Algorithm                              | Paper                            | Code                        |
|----------------------------------------|----------------------------------|-----------------------------|
| Deep Q Learning                        | https://arxiv.org/abs/1312.5602  | WIP                         |
| Soft Actor-Critic                      | https://arxiv.org/abs/1801.01290 | WIP                         |
| Deep Deterministic Policy Gradients    | https://arxiv.org/abs/1509.02971 | examples/1. Vanilla RL/DDPG |
| Twin Delayed DDPG (TD3)                | https://arxiv.org/abs/1802.09477 | examples/1. Vanilla RL/TD3  |
| Batch Constrained Q-Learning           | https://arxiv.org/abs/1812.02900 | examples/2. BCQ/BCQ PyTorch |
| REINFORCE Top-K Off-Policy Correction  | https://arxiv.org/abs/1812.02353 | WIP                         |

Repos I used code from:
- Higgsfield's [RL Adventure 2](https://github.com/higgsfield/RL-Adventure-2)

- Sfujim's [BCQ](https://github.com/sfujim/BCQ)

- LiyuanLucasLiu's [RAdam](https://github.com/LiyuanLucasLiu/RAdam)

## Dataset Description
This project is built for the MovieLens 20M dataset, but support for other datasets is planned. I have parsed all the movies in '/links.csv' to get auxiliary data from TMDB/IMDB. Text information was fed into Google's BERT / OpenAI GPT-2 models to get text embeddings. If you want to download anything, the links are at the bottom of the description.

Here is an overview:
This project is built for the MovieLens 20M dataset, but you can use it with your own data. You will need:
1. Embeddings in {item_id: numpy.ndarray} format
2. A CSV dataset with columns: user_id, item_id, rating, timestamp

- State - [None, frame_size * (embed_size + 1)] - PCA-encoded embeddings of the previously watched movies plus their rewards (ratings), all flattened and concatenated together (a construction sketch follows this list)
- Action - [None, embed_size] - PCA-encoded embedding of the current action
- Reward - [None] - Integer, indicates whether the user liked the action or not
- Next state - same as State, but shifted one time step forward
- Done - [None] - Boolean, needed for TD(1)
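
Here is a minimal sketch (not from the repo) of how one such state row could be assembled; the variable names and the random stand-in embeddings are hypothetical:

```python
import numpy as np

# Minimal sketch of assembling one state row; the item ids, ratings and
# random embeddings below are stand-ins, not the repo's real preprocessing.
frame_size, embed_size = 10, 128
item_ids = list(range(frame_size))                # last frame_size watched movies
ratings = np.random.randint(1, 6, frame_size)     # their ratings (rewards)
embeddings = {i: np.random.randn(embed_size) for i in item_ids}  # stand-in for PCA embeddings

frames = np.stack([embeddings[i] for i in item_ids])             # [frame_size, embed_size]
state = np.concatenate([frames.flatten(), ratings]).astype(np.float32)
assert state.shape == (frame_size * (embed_size + 1),)           # matches the layout above
```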
If you don't want to bother generating embeddings, use Discrete Action models (e.g., DQN).
I have also parsed all the movies in '/links.csv' to get auxiliary data from TMDB/IMDB. Text information was fed into Google's BERT / OpenAI GPT-2 models to get text embeddings. If you want to download anything, the links are at the bottom of the description.

## Misc Data

@@ -64,7 +67,7 @@ Here is an example of what the movie information looks like:

## Getting started:

1. Download the static ml20m dataset and the movie embeddings
1. Download the ml20m dataset and the movie embeddings
2. Clone this repo
3. Put Infos_pca128.pytorch (the embeddings) into the RecNN/data folder (a quick loading sanity check is sketched after this list)
4. Run notes/3. DDPG and see the results
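
A quick sanity check after steps 1-3, as a sketch: it assumes the embeddings file is an {item_id: tensor} dict saved with torch.save and that the ML20M ratings CSV sits under data/; adjust the paths to your setup.

```python
import torch
import pandas as pd

# Hypothetical paths and format: the embeddings file is assumed to be an
# {item_id: tensor} dict saved with torch.save; the CSV is the standard ML20M ratings file.
embeddings = torch.load('data/infos_pca128.pytorch')
ratings = pd.read_csv('data/ml-20m/ratings.csv')  # userId, movieId, rating, timestamp

print(len(embeddings), 'movie embeddings,', len(ratings), 'ratings')
```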
@@ -95,6 +98,7 @@ It doesn't seem to overfit much. Here you can see the Kernel Density Estimation
</p>

# Downloads
- [MovieLens 20M](https://grouplens.org/datasets/movielens/20m/)
- [Movie Embeddings](https://drive.google.com/open?id=1kTyu05ZmtP2MA33J5hWdX8OyUYEDW4iI)
- [Misc Data](https://drive.google.com/open?id=1TclEmCnZN_Xkl3TfUXL5ivPYmLnIjQSu)
- [Metadata for predictions](https://drive.google.com/open?id=1xjVI4uVQGsQ7tjOJ3594ZXmAEC_6yX0e)
1 change: 1 addition & 0 deletions recnn/__init__.py
@@ -1,2 +1,3 @@
from . import data, plot, models, optim
from .debugger import Debugger
from .plot import Plotter
28 changes: 26 additions & 2 deletions recnn/debugger.py
@@ -5,9 +5,16 @@
from scipy import stats
import torch


class Debugger:
    def __init__(self):
        self.debug_dict = {'error': {}, 'obj': {}, 'mat': {}}
    def __init__(self, layout, testf):
        self.debug_dict = {'error': {}, 'obj': {}, 'mat': {}, 'loss': {}}
        self.step = 0
        assert type(layout['train']) == dict
        assert type(layout['test']) == dict
        self.debug_dict['loss'] = layout
        self.testf = testf
        self.layout = layout

    def log_error(self, name, x, test=False):
        if test:
@@ -28,6 +35,23 @@ def log_object(self, name, x, key='mat', test=False):
            name = 'test ' + name
        self.debug_dict[key][name] = x

    def log_loss(self, key, item, test=False):
        kind = 'train'
        if test:
            kind = 'test'
        self.debug_dict['loss'][kind][key].append(item)

    def log_losses(self, loss_dict, test=False):
        for key, val in loss_dict.items():
            self.log_loss(key, val, test)

    def log_step(self, step):
        self.step = step

    def test(self):
        test_loss = self.testf()
        self.log_losses(test_loss, test=True)

    def err_plot(self):
        for key, error in self.debug_dict['error'].items():
            sf = int(np.sqrt(len(error['mean'])))
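
For context, a minimal usage sketch of the new Debugger API (not part of this commit); the loss keys 'value' and 'policy' and the test callback are assumptions:

```python
from recnn import Debugger

# Hypothetical loss layout and evaluation callback; a real training loop
# would fill these with actual losses and steps.
layout = {
    'train': {'value': [], 'policy': [], 'step': []},
    'test':  {'value': [], 'policy': [], 'step': []},
}

def run_test():
    # stand-in for a real evaluation pass; returns one entry per logged key
    return {'value': 0.42, 'policy': -1.3, 'step': 100}

debugger = Debugger(layout, run_test)
debugger.log_step(100)
debugger.log_losses({'value': 0.57, 'policy': -1.1, 'step': 100})  # appends to the 'train' lists
debugger.test()                                                     # calls run_test and logs under 'test'
```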
26 changes: 24 additions & 2 deletions recnn/plot.py
@@ -42,5 +42,27 @@ def smooth_gauss(arr, var):


class Plotter:
    def __init__(self):
        assert NotImplementedError
    def __init__(self, debugger, style):
        self.debugger = debugger
        self.style = style
        self.smoothing = lambda x: smooth_gauss(x, 4)

    def set_smoothing_func(self, f):
        self.smoothing = f

    def plot_loss(self):
        for row in self.style:
            fig, axes = plt.subplots(1, len(row), figsize=(16, 6))
            if len(row) == 1: axes = [axes]
            for col in range(len(row)):
                key = row[col]
                axes[col].set_title(key)
                axes[col].plot(self.debugger.debug_dict['loss']['train']['step'],
                               self.smoothing(self.debugger.debug_dict['loss']['train'][key]), 'b-',
                               label='train')
                axes[col].plot(self.debugger.debug_dict['loss']['test']['step'],
                               self.debugger.debug_dict['loss']['test'][key], 'r-.',
                               label='test')
            plt.legend()
            plt.show()
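
And a matching sketch of how the new Plotter could be driven; again not part of the commit, with made-up loss values and a hypothetical style layout:

```python
from recnn import Debugger, Plotter

# Made-up losses already logged into the layout, just to have something to draw.
layout = {
    'train': {'value': [0.9, 0.7, 0.5], 'policy': [-0.2, -0.5, -0.8], 'step': [0, 100, 200]},
    'test':  {'value': [0.8, 0.6], 'policy': [-0.4, -0.7], 'step': [100, 200]},
}
debugger = Debugger(layout, testf=lambda: {'value': 0.6, 'policy': -0.7, 'step': 200})

# style is a list of rows; each row lists the loss keys drawn side by side.
plotter = Plotter(debugger, style=[['value', 'policy']])
plotter.set_smoothing_func(lambda x: x)  # identity smoothing for this tiny example
plotter.plot_loss()                      # train (solid) vs test (dash-dot) curves per key
```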
