Continuous Control project from the Udacity Deep Reinforcement Learning Nanodegree. It demonstrates how to teach a double-jointed arm to keep its hand in a moving target location.
- Clone the deep reinforcement learning repository
- Follow the instructions to install the necessary dependencies
- Download the environment for your system into this repository root:
  - Linux: click here
  - Mac OSX: click here
  - Windows (32-bit): click here
  - Windows (64-bit): click here
  - Headless: click here
- Unzip (or decompress) the archive
- Start the Jupyter server
- Open the Continuous_Control.ipynb notebook
- Change the kernel to drlnd
- You should be able to run all the cells
This project uses the Unity-based environment prepared by the Udacity team.
There are 20 agents interacting with the environment.
The action space is continuous with 4 dimensions, each in the range [-1, +1].
The state is represented as a vector of 33 dimensions.
The environment gives a reward in the range [0, 1] at each step when the reacher correctly positions its hand relative to the circling ball.
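As a minimal sketch of these dimensions (the variable names below are illustrative; only the sizes come from the description above), a random policy for the 20 agents could be drafted as:

```python
import numpy as np

num_agents = 20   # parallel copies of the arm
action_size = 4   # continuous action dimensions
state_size = 33   # dimensions of the state vector

# One observation vector per agent.
states = np.zeros((num_agents, state_size))

# A random policy: sample actions and clip every entry into [-1, +1],
# the range the environment expects.
actions = np.random.randn(num_agents, action_size)
actions = np.clip(actions, -1, 1)
```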
The directory saves contains saved weights for 4 different agents:

- 96_96_108_actor.pth & 96_96_108_critic.pth - Agent that learned from scratch in 108 episodes
- 96_96_80_actor.pth & 96_96_80_critic.pth - Agent that learned from the above agent's experience in 80 episodes
- 48_48_actor_71.pth & 48_48_critic_71.pth - Smaller agent that learned from the above agent's experience
- 96_96_2491_actor.pth & 96_96_2491_critic.pth - Agent that learned from scratch in the single-agent version of the environment
Naming convention: [fully connected layer 1 size]_[fully connected layer 2 size]_[episodes]_[actor|critic].pth
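The convention can be illustrated with a small, hypothetical helper (not part of the project code). Note it assumes the layer-layer-episodes-role order; the file 48_48_actor_71.pth swaps the last two fields and would need special handling:

```python
def parse_checkpoint(filename):
    """Split a checkpoint name such as '96_96_108_actor.pth' into its parts,
    following the <fc1>_<fc2>_<episodes>_<role>.pth naming convention."""
    stem = filename.rsplit(".", 1)[0]        # drop the .pth extension
    fc1, fc2, episodes, role = stem.split("_")
    return {"fc1": int(fc1), "fc2": int(fc2),
            "episodes": int(episodes), "role": role}
```

For example, parse_checkpoint("96_96_108_actor.pth") describes an actor network with two 96-unit hidden layers trained for 108 episodes.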
Most of the code is based on the Udacity code for DDPG. I've also adapted some code by akhiadber, which adds batch normalization and a training function.
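One detail of DDPG worth highlighting is the soft update of the target networks, theta_target = tau * theta_local + (1 - tau) * theta_target. A minimal numpy sketch (the parameter shapes and tau value below are illustrative, not taken from the project):

```python
import numpy as np

def soft_update(local_params, target_params, tau=1e-3):
    """Blend the local network's weights into the target network in place:
    theta_target <- tau * theta_local + (1 - tau) * theta_target."""
    for local, target in zip(local_params, target_params):
        target *= (1.0 - tau)
        target += tau * local

# Illustrative parameters: one weight matrix per "layer".
local = [np.ones((4, 4))]
target = [np.zeros((4, 4))]
soft_update(local, target, tau=0.5)
# every entry of target is now 0.5
```

A small tau keeps the target network changing slowly, which stabilizes the critic's learning targets.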