Reinforcement Learning Baseline

We aim to implement stable versions of popular deep reinforcement learning algorithms and test them on a variety of problems.

Algorithms:

Cross-Entropy Method (CEM)

paper: The Differentiable Cross-Entropy Method
implementation: Ivan Pavelyev, Anton Plaksin

Deep Q-Network (DQN)

paper: Playing Atari with Deep Reinforcement Learning

Double Deep Q-Network (DDQN)

paper: Deep Reinforcement Learning with Double Q-learning

Deep Deterministic Policy Gradient (DDPG)

paper: Continuous Control with Deep Reinforcement Learning

Normalized Advantage Functions (NAF)

paper: Continuous Deep Q-Learning with Model-based Acceleration
implementation: Stepan Martyanov

Asynchronous Advantage Actor-Critic (A3C)

paper: Asynchronous Methods for Deep Reinforcement Learning
implementation: Alexander Chaikov

Continuous Value Iteration (CVI)

paper: Value Iteration in Continuous Actions, States and Time
implementation: Stepan Martyanov

Proximal Policy Optimization (PPO)

paper: Proximal Policy Optimization Algorithms
implementation: Viktor Sergeev

Soft Actor-Critic (SAC)

paper: Learning to Walk via Deep Reinforcement Learning
implementation: Anton Plaksin