For an intuitive guide to the mechanics of actor-critic methods, check out the accompanying comic.
This notebook is designed for readability and exploration rather than production, and it uses a single GPU. For an industrial-strength PPO in PyTorch, check out ikostrikov's implementation. For the 'definitive' implementation of PPO, check out OpenAI Baselines (TensorFlow). For outstanding resources on RL, check out OpenAI's Spinning Up.
The notebook reproduces results on OpenAI's procedurally generated environments, as reported in the corresponding paper (Cobbe 2019). All hyperparameters are taken directly from the paper. Everything is built from scratch unless otherwise noted, to gain intuition.