In Progress
Implementation of the Proximal Policy Optimization algorithm.
PPO is an on-policy method that aims to solve the step size issue with policy gradients. Policy gradient algorithms are typically very sensitive to step size: too large a step and the agent can fall into an unrecoverable state, too small a step and the agent takes a very long time to train. PPO addresses this by ensuring that the agent's policy never deviates too far from the previous policy.
A ratio is taken of the new policy to the old policy, and this ratio is clipped to ensure policy changes remain within a bound.
See the experiments folder for example implementations.
- waiting on bug gorgonia/gorgonia#373