Implementation of the TD3 (Twin Delayed DDPG) algorithm for reinforcement learning (original publication link), particularly useful for problems with continuous state and action spaces.
The algorithm was tested on the BipedalWalker-v3 environment. To evaluate the variability of the algorithm, we trained 15 different agents for 550 episodes on a high-performance GPU with CUDA. We recorded the reward obtained by each agent, with the following results:
The learning process can be observed in the following video:
Technical details about the algorithm can be found in the accompanying report.
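
For a quick orientation before reading the report, the three ideas that distinguish TD3 from DDPG (target policy smoothing, the clipped double-Q target computed from twin critics, and delayed actor updates) can be sketched roughly as below. This is a minimal NumPy illustration, not the code of this repository: the function names are hypothetical, and the default values (`noise_std=0.2`, `noise_clip=0.5`, `policy_delay=2`, `gamma=0.99`, `max_action=1.0`) are assumed from commonly used TD3 settings rather than taken from this implementation.

```python
import numpy as np


def smoothed_target_action(target_action, rng, noise_std=0.2,
                           noise_clip=0.5, max_action=1.0):
    """Target policy smoothing: add clipped Gaussian noise to the target action.

    All default values are assumptions for illustration only.
    """
    noise = rng.normal(0.0, noise_std, size=target_action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    # Keep the perturbed action inside the valid action range.
    return np.clip(target_action + noise, -max_action, max_action)


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller of the twin critics."""
    min_q = np.minimum(q1_next, q2_next)
    # No bootstrapping on terminal transitions (done == 1).
    return reward + gamma * (1.0 - done) * min_q


def should_update_actor(step, policy_delay=2):
    """Delayed updates: the actor and target networks are updated
    only once every `policy_delay` critic updates."""
    return step % policy_delay == 0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical batch of two transitions with 4-dimensional actions.
    actions = smoothed_target_action(np.zeros((2, 4)), rng)
    targets = td3_target(
        reward=np.array([1.0, 2.0]),
        done=np.array([0.0, 1.0]),
        q1_next=np.array([10.0, 5.0]),
        q2_next=np.array([8.0, 7.0]),
        gamma=0.5,
    )
    print(actions.shape, targets)
```

Taking the minimum of the two critic estimates counteracts the overestimation bias of a single critic, while the noise and the update delay are meant to keep the actor from exploiting sharp errors in the value estimate.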