Done in collaboration with Megha Roy.
A Reinforcement Learning implementation that trains an agent, a car, to drive itself across uneven terrain and reach a destination object: a man randomly spawned in the region. This repository contains some of the essential files along with their descriptions. Since uploading every necessary file is not practical, we have tried to provide as much context as possible.
The concept behind reinforcement learning is quite simple to understand. Learning is modeled after the way humans calibrate good and bad behaviour in life. A child may touch a flame and realize that it burns; this discourages the child from putting a hand into the flame again. On the other hand, doing house chores may earn the child a few treats from the parents, which positively reinforces doing chores because the child knows a reward will follow. Fig 1.1 gives a general idea of reinforcement learning. The agent is an independent entity, standing in for a simple human being, that learns to perform a task. The environment is a model of the world in which the agent takes a particular action. Based on the action taken, the state (the current condition) of the environment changes into a new state, and the environment gives out a reward (feedback) that tells the agent whether the action led to a condition that is favourable to it or not.
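To make the loop concrete, here is a toy Python sketch of the cycle in Fig 1.1. The environment and the reward values here are invented purely for illustration and are unrelated to the actual project code.

```python
import random

# Toy agent-environment loop (illustration only, not project code).
# The "environment" is a number line; the agent is rewarded for reaching position 5.
state = 0
for step in range(20):
    action = random.choice([-1, +1])       # the agent picks an action in the current state
    state += action                        # the environment transitions to a new state
    reward = 1.0 if state == 5 else -0.1   # the environment sends feedback (a reward)
    print(f"step={step} action={action:+d} state={state} reward={reward}")
    if state == 5:                         # a favourable condition ends the episode
        break
```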
The environment was developed in Unity. Unity provides a flexible interface with a large set of assets and libraries that make the task easier. First we got hold of Unity's ML-Agents Toolkit; its GitHub repository has some of the best documentation to get started with. We used ML-Agents version 0.15, although the current version is Release 1.
Note: Please make sure to check the dependencies of the libraries and tools. Different versions have different support requirements, which can sometimes be confusing.
We used the Standard Assets pack from Unity's extensive Asset Store to build the terrain and the car; the human figure was imported from a separate asset. A little tinkering with the settings let us interact with the environment easily. The next part was writing the script that controls the reward function. The details are in the CarAgent.cs script, which is part of the set of scripts attached to the car asset. Without changing much of the controller function, we hopped straight into defining our reward function. The script is documented well enough to understand the functions and the rewards.
The reward function for the agent includes three main components:
- A reward of -0.04f for each step taken.
- A reward of -0.6f if the car falls off the platform.
- A final reward of 1.0f on reaching the target. The -0.04f penalty at each step forces the agent to keep moving at every instance instead of staying idle (see the sketch below).
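For quick reference, the scheme above can be restated as a small Python function. This is purely illustrative: the real logic is implemented in the C# CarAgent.cs script, and the names below, as well as the exact way the per-step penalty combines with the terminal rewards, are our own shorthand rather than the script's API.

```python
STEP_PENALTY  = -0.04   # small penalty on every step, so the agent keeps moving
FALL_PENALTY  = -0.6    # falling off the platform is strongly discouraged
TARGET_REWARD =  1.0    # reaching the randomly spawned man ends the episode successfully

def reward_for_step(reached_target: bool, fell_off_platform: bool):
    """Return (reward, episode_done) for one step of the car agent (illustrative only)."""
    if reached_target:
        return TARGET_REWARD, True
    if fell_off_platform:
        return FALL_PENALTY, True
    return STEP_PENALTY, False
```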
We first trained the agent using the PPO implementation available with the ML-Agents Toolkit. In the next part we defined an A2C model that trains the agent by interacting with the application through the Gym interface; Gym from OpenAI is a great resource for anyone actively working in Reinforcement Learning. Most of the results are detailed in our paper, which is uploaded in the repository. Some interesting training scenarios are shown below, after a short illustrative sketch.
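Below is a minimal, self-contained sketch of the kind of Gym-driven A2C loop we mean. It is not the project's training code: CartPole-v1 stands in for the Unity car build, the network sizes and hyperparameters are illustrative, and the snippet follows the older Gym reset/step API that matches the ML-Agents 0.15 era.

```python
import gym
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    """Tiny shared-body actor-critic: policy logits plus a state-value estimate."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.pi = nn.Linear(64, n_actions)   # actor head (action logits)
        self.v = nn.Linear(64, 1)            # critic head (value baseline)

    def forward(self, x):
        h = self.body(x)
        return self.pi(h), self.v(h)

env = gym.make("CartPole-v1")                # stand-in for the Unity car environment
net = ActorCritic(env.observation_space.shape[0], env.action_space.n)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
gamma = 0.99

for episode in range(500):
    obs, done = env.reset(), False           # older Gym API: reset() returns only the observation
    log_probs, values, rewards, entropies = [], [], [], []
    while not done:
        logits, value = net(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        obs, reward, done, _ = env.step(action.item())
        log_probs.append(dist.log_prob(action))
        values.append(value.squeeze())
        rewards.append(reward)
        entropies.append(dist.entropy())

    # Discounted returns for the episode, then advantage = return - critic baseline.
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.insert(0, running)
    returns = torch.tensor(returns)
    values = torch.stack(values)
    advantages = returns - values.detach()

    policy_loss = -(torch.stack(log_probs) * advantages).mean()
    value_loss = F.mse_loss(values, returns)
    entropy_bonus = torch.stack(entropies).mean()
    loss = policy_loss + 0.5 * value_loss - 0.01 * entropy_bonus

    opt.zero_grad()
    loss.backward()
    opt.step()
```

The loop collects one episode of transitions, computes discounted returns, and uses the return minus the critic's value estimate as the advantage for the policy update.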
The agent is able to reach the destination; this run was trained using the PPO algorithm:
A scene from an episode during training with our A2C algorithm:
One drawback we faced during training was that the agent sometimes decided that falling off the platform yielded a better return than exploring the terrain for the target; it converged to a local minimum. We tried adjusting the reward function in various ways, but the agent kept on learning this anomaly. Here is a scene from the training:
- Due to the unavailability of powerful hardware, we were constrained to a CPU-only environment. Though this affected our final performance, it laid the base for further improvement and training.
- We tried tweaking the reward function but still faced the problem of the car agent jumping off the platform, even after several epochs of training.
- Next up: adding a few more barriers within the terrain for the car to avoid, and opting for a more powerful computer for better and faster training.