This repository has been archived by the owner on Jul 21, 2020. It is now read-only.

README.md

File metadata and controls

38 lines (30 loc) · 2.17 KB

Open In Colab

Slides - here

Exploration and exploitation

  • [main] David Silver lecture on exploration and exploitation - video
  • Alternative lecture by J. Schulman - video
  • Alternative lecture by N. de Freitas (with Bayesian optimization) - video
  • Our lectures (in Russian)
    • "mathematical" lecture (by Alexander Vorobev) '17 - slides, video
    • "practical" lecture '18 - video
    • Seminar - video

More materials

  • Gittins Index - a less heuristic approach to bandit exploration - article
  • "Deep" version: variational information maximizing exploration - video
    • Same topics in Russian - video
  • Lecture covering intrinsically motivated reinforcement learning - video
    • Slides
    • Same topics in Russian - video
    • Note: UCB-1 is not limited to Bernoulli rewards. It works for arbitrary r in [0,1], so you can rescale any bounded reward to [0,1] and use it with peace of mind. It's derived directly from Hoeffding's inequality.
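
To make the note above concrete, here is a minimal sketch of UCB-1 with reward rescaling. It is not the seminar notebook's code: the names `scale_reward`, `ucb1`, `pull`, `r_min` and `r_max` are made up for illustration, and it assumes you know bounds on the raw reward in advance. The exploration bonus `sqrt(2 * ln(n) / n_a)` is the usual Hoeffding-style one for rewards in [0,1].

```python
import math
import random


def scale_reward(r, r_min, r_max):
    """Linearly map a reward from [r_min, r_max] to [0, 1]."""
    return (r - r_min) / (r_max - r_min)


def ucb1(pull, n_arms, n_steps, r_min=0.0, r_max=1.0):
    """Run UCB-1 for n_steps; pull(arm) returns a raw reward in [r_min, r_max].

    Returns per-arm pull counts and mean scaled rewards.
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for step in range(n_steps):
        if step < n_arms:
            arm = step  # play every arm once so each count is non-zero
        else:
            total = sum(counts)
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(total) / counts[a]),
            )
        r = scale_reward(pull(arm), r_min, r_max)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
    return counts, means


if __name__ == "__main__":
    # Hypothetical 3-armed bandit with rewards in [-5, 5].
    ranges = [(-5, -1), (-2, 2), (1, 5)]
    counts, means = ucb1(lambda a: random.uniform(*ranges[a]), 3, 3000, -5, 5)
    print(counts, means)
```

If the bounds are not known, clipping the reward to an assumed range before scaling is a common workaround, at the cost of biasing the estimates.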

Seminar

In this seminar, you'll be solving basic and contextual bandits with uncertainty-based exploration methods such as Bayesian UCB and Thompson Sampling.

You will also need Bayesian Neural Networks, which in this seminar require Theano and Lasagne:
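
As a rough idea of what Thompson Sampling looks like in the simplest case, below is a minimal sketch for a Bernoulli bandit with a Beta(1, 1) prior on each arm. It is an illustration only, not the notebook's implementation; `thompson_sampling` and `pull` are hypothetical names, and `pull(arm)` is assumed to return a truthy value on success.

```python
import random


def thompson_sampling(pull, n_arms, n_steps):
    """Thompson Sampling for Bernoulli rewards with Beta posteriors.

    Returns the Beta parameters (alpha, beta) for every arm.
    """
    alpha = [1] * n_arms  # 1 + number of observed successes
    beta = [1] * n_arms   # 1 + number of observed failures
    for _ in range(n_steps):
        # Draw one plausible success rate per arm from its posterior,
        # then act greedily with respect to the sampled rates.
        samples = [random.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        if pull(arm):
            alpha[arm] += 1
        else:
            beta[arm] += 1
    return alpha, beta


if __name__ == "__main__":
    # Hypothetical success probabilities for a 3-armed bandit.
    probs = [0.2, 0.5, 0.8]
    alpha, beta = thompson_sampling(lambda a: random.random() < probs[a], 3, 2000)
    pulls = [a + b - 2 for a, b in zip(alpha, beta)]
    print(pulls)
```

Exploration here comes entirely from posterior uncertainty: arms with few observations have wide Beta posteriors, so they occasionally produce the largest sample and get tried again.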

# either
conda install Theano
# or
pip install --upgrade https://github.com/Theano/Theano/archive/master.zip
# and then lasagne
pip install --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip

Everything else is in the notebook :)