This repository contains code for Policy Gradient Methods in Reinforcement Learning:
- Stochastic Policy Gradients
- Deterministic Policy Gradients

Reference: Islam R., Lever G., Shawe-Taylor J. Improving Convergence of Deterministic Policy Gradient Methods in Reinforcement Learning. 2015.
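For orientation, the two gradient families listed above correspond to the standard policy gradient theorems. The notation ($J$, $\rho$, $Q$, $\pi_\theta$, $\mu_\theta$) is ours and is not taken from the repository code:

```latex
% Stochastic policy gradient (policy \pi_\theta(a \mid s)):
\nabla_\theta J(\theta) = \mathbb{E}_{s \sim \rho^{\pi},\, a \sim \pi_\theta}\left[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi}(s, a) \right]

% Deterministic policy gradient (policy a = \mu_\theta(s)):
\nabla_\theta J(\theta) = \mathbb{E}_{s \sim \rho^{\mu}}\left[ \nabla_\theta \mu_\theta(s)\, \nabla_a Q^{\mu}(s, a)\big|_{a = \mu_\theta(s)} \right]
```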
The methods are implemented as actor-critic algorithms, with the critic learned by least-squares temporal difference (LSTD) learning using a linear function approximator.
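As a rough illustration of the critic step, here is a minimal LSTD(0) sketch in NumPy that estimates linear value-function weights from a batch of transitions. The function name `lstd`, the ridge term `reg`, and the array layout are our assumptions, not the repository's interface; an actor-critic critic would apply the same closed-form solve to its own (e.g. state-action) features.

```python
import numpy as np


def lstd(features, rewards, next_features, gamma=0.99, reg=1e-6):
    """Least-squares TD(0): solve A w = b for linear value weights w.

    Hypothetical helper, not the repository's API.
    features:      (T, k) array, phi(s_t) for each transition
    rewards:       (T,)   array, r_{t+1}
    next_features: (T, k) array, phi(s_{t+1}); zeros for terminal states
    """
    phi = np.asarray(features, dtype=float)
    phi_next = np.asarray(next_features, dtype=float)
    r = np.asarray(rewards, dtype=float)
    # A = sum_t phi_t (phi_t - gamma * phi_{t+1})^T
    A = phi.T @ (phi - gamma * phi_next)
    # b = sum_t phi_t * r_{t+1}
    b = phi.T @ r
    # Small ridge term keeps A invertible when data is scarce.
    A += reg * np.eye(A.shape[0])
    return np.linalg.solve(A, b)
```

For example, on a two-state chain with one-hot features (state 0 moves to state 1 with reward 0, state 1 terminates with reward 1) and `gamma=0.9`, `lstd([[1, 0], [0, 1]], [0, 1], [[0, 1], [0, 0]], gamma=0.9)` returns weights close to `[0.9, 1.0]`, the true discounted values. The closed-form solve avoids choosing a TD step size, which is the usual motivation for LSTD over incremental TD.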
The algorithms we consider include:
- Episodic REINFORCE (Monte-Carlo) Actor-Critic Stochastic Policy Gradient
- Stochastic Off-Policy Actor-Critic Policy Gradient
- Deterministic Policy Gradients
- Deterministic Gradients with Stochastic Exploration
- Natural Stochastic Policy Gradients
- Natural Deterministic Policy Gradients
- Deterministic Gradients with Adaptive Step Size Gradient Ascent
- Deterministic Gradients with Momentum-Based Nesterov's Accelerated Gradient
- Stochastic Gradients with Momentum-Based Nesterov's Accelerated Gradient
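The momentum-based variants replace plain gradient ascent on the policy parameters with Nesterov's accelerated gradient. A minimal sketch, assuming a hypothetical `grad_fn(theta)` that returns a policy gradient estimate (from any of the stochastic or deterministic estimators above); the step size and momentum defaults are illustrative, not the repository's settings:

```python
import numpy as np


def nesterov_ascent(grad_fn, theta0, step_size=0.1, momentum=0.9, n_iters=100):
    """Nesterov's accelerated gradient *ascent* on policy parameters.

    grad_fn(theta) is a hypothetical callable returning an estimate of
    grad J(theta); any policy gradient estimator can be plugged in.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    velocity = np.zeros_like(theta)
    for _ in range(n_iters):
        # Evaluate the gradient at the look-ahead point theta + momentum * velocity.
        lookahead = theta + momentum * velocity
        velocity = momentum * velocity + step_size * grad_fn(lookahead)
        # Ascent: move the parameters along the accumulated velocity.
        theta = theta + velocity
    return theta
```

As a sanity check on a concave surrogate objective $J(\theta) = -\lVert \theta - \theta^\ast \rVert^2$, passing `grad_fn = lambda th: -2.0 * (th - target)` drives `theta` to `target`. Evaluating the gradient at the look-ahead point, rather than at `theta`, is what distinguishes Nesterov's method from classical momentum.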
We consider the following MDPs, each controlled by a parameterized controller (agent):
- Toy MDP
- Grid World (10x10) MDP
- Mountain Car MDP
- Cart Pole MDP
- Pendulum MDP
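To make "parameterized controller" concrete, here is a hedged sketch of a linear controller for a scalar continuous action, usable both deterministically ($\mu_\theta(s) = \theta^\top \phi(s)$) and stochastically (Gaussian exploration around that mean). The class name, the feature vector `phi`, and the fixed `sigma` are our assumptions; the repository's controllers may be parameterized differently.

```python
import numpy as np


class LinearGaussianController:
    """Hypothetical linear controller with mean action theta^T phi(s).

    Deterministic use: act with mean_action(phi).
    Stochastic use: sample a ~ N(theta^T phi, sigma^2) via sample_action(phi).
    """

    def __init__(self, theta, sigma=0.5, seed=0):
        self.theta = np.asarray(theta, dtype=float)
        self.sigma = float(sigma)
        self.rng = np.random.default_rng(seed)

    def mean_action(self, phi):
        return float(self.theta @ np.asarray(phi, dtype=float))

    def sample_action(self, phi):
        # Gaussian exploration around the deterministic action.
        return self.rng.normal(self.mean_action(phi), self.sigma)

    def grad_log_prob(self, phi, action):
        # Score function used by stochastic policy gradients:
        # grad_theta log pi(a|s) = (a - theta^T phi) * phi / sigma^2
        phi = np.asarray(phi, dtype=float)
        return (action - self.mean_action(phi)) * phi / self.sigma ** 2

    def grad_mean_action(self, phi):
        # Jacobian used by deterministic policy gradients:
        # grad_theta mu_theta(s) = phi(s) for a linear controller.
        return np.asarray(phi, dtype=float)
```

With `theta = [1.0, 2.0]`, `sigma = 0.5` and features `[0.5, 1.0]`, the mean action is `2.5`, and the score for action `3.0` is `[1.0, 2.0]`. The same parameters thus serve both the stochastic estimators (through `grad_log_prob`) and the deterministic ones (through `grad_mean_action`).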