We aim to implement stable versions of popular deep reinforcement learning algorithms and test them on a variety of problems.
Cross-Entropy Method (CEM)
- paper: The Differentiable Cross-Entropy Method
- implementation: Ivan Pavelyev, Anton Plaksin
Deep Q-Network (DQN)
Double Deep Q-Network (DDQN)
Deep Deterministic Policy Gradient (DDPG)
Normalized Advantage Functions (NAF)
- paper: Continuous Deep Q-Learning with Model-based Acceleration
- implementation: Stepan Martyanov
Asynchronous Advantage Actor-Critic (A3C)
- paper: Asynchronous Methods for Deep Reinforcement Learning
- implementation: Alexander Chaikov
Continuous Value Iteration (CVI)
- paper: Value Iteration in Continuous Actions, States and Time
- implementation: Stepan Martyanov
Proximal Policy Optimization (PPO)
- paper: Proximal Policy Optimization Algorithms
- implementation: Viktor Sergeev
Soft Actor-Critic (SAC)
- paper: Learning to Walk via Deep Reinforcement Learning
- implementation: Anton Plaksin