- Finish the game completely
- Numpy ✔️
- Torch (torch.nn, torch.optim, torch.multiprocessing) ✔️
- matplotlib.pyplot ✔️
- tkinter, pygame ✔️
- Collections (namedtuple, deque) ✔️
- Enum ✔️
- Time ✔️
- Parallel computing using multiprocessing ✔️
- Reinforcement Learning: Deep Q-Learning ✔️
- Reinforcement Learning: Q-Learning ✔️
- Rendering graphics ✔️
- Rendering plots ✔️
- BFS (Breadth-First Search) ✔️
- Parallel computing using multithreading
- Compute on GPU
- The branch "DQN+MultiEnv+MultiProcessing+Tkinter" is a program that uses multiprocessing to run 4 environments, 1 NN trainer and 1 graphics displayer in parallel. A complex structure with dedicated shared memory is used:
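A minimal sketch of how such a layout can be wired with torch.multiprocessing (the function names, queue layout and placeholder environment loop are illustrative assumptions, not the exact code of the branch):

```python
import random
import torch
import torch.nn as nn
import torch.multiprocessing as mp

STEPS_PER_WORKER, N_WORKERS = 50, 4

class QNet(nn.Module):
    """Small Q-network whose weights live in shared memory."""
    def __init__(self, n_in=12, n_hidden=256, n_actions=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_actions),
        )

    def forward(self, x):
        return self.net(x)

def env_worker(worker_id, model, transitions, scores):
    # Placeholder loop: the real branch steps a Snake environment here and
    # pushes (state, action, reward, next_state, done) transitions.
    for _ in range(STEPS_PER_WORKER):
        state = [random.random() for _ in range(12)]
        with torch.no_grad():
            action = model(torch.tensor([state])).argmax(dim=1).item()
        transitions.put((state, action, 0.0, [random.random() for _ in range(12)], False))
    scores.put((worker_id, 0))

def trainer(model, transitions, gamma=0.9):
    # Consumes every transition the workers produce and updates the shared weights.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(STEPS_PER_WORKER * N_WORKERS):
        state, action, reward, next_state, done = transitions.get()
        q_sa = model(torch.tensor([state]))[0, action]
        with torch.no_grad():
            bootstrap = 0.0 if done else gamma * model(torch.tensor([next_state])).max().item()
        loss = (q_sa - (reward + bootstrap)) ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def displayer(scores):
    # Stand-in for the Tkinter / matplotlib display process.
    for _ in range(N_WORKERS):
        worker_id, score = scores.get()
        print(f"worker {worker_id} finished with score {score}")

if __name__ == "__main__":
    model = QNet()
    model.share_memory()  # make the weights visible to every process
    transitions, scores = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=env_worker, args=(i, model, transitions, scores))
             for i in range(N_WORKERS)]
    procs += [mp.Process(target=trainer, args=(model, transitions)),
              mp.Process(target=displayer, args=(scores,))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```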
-
The result is impressive: a score of 11 after about 20 games, whereas the others need about 80 games (1st: efficient but slow) and about 100 games (2nd, from the book: a bit less efficient but much faster). I used a 10x10 grid.
-
"From book" means that I used the DQN of the book and "from video" means that I used the DQN of the video (see references at the end).
-
"Basic state" means 11 inputs (4 directions, 4 food flags, 3 dangers (straight, right, left)) and "BFS state" means 12 inputs (4 directions, 4 food flags, 4 recommended directions, i.e. the directions where the number of reachable cells is the highest); a rough sketch of this encoding is shown after this paragraph. Computation time is higher but the result is better: a score of 149 vs 60 for the same DQN. It still cannot finish the game, so I gave up this method.
The DQN from the book was much faster, so I could try using the whole grid as state. It works, but not well enough with this input (one night of training, 2 hidden layers of 512, 250k games, and a score of only 13, too far from 8x8 = 64), so this neural network will not be enough to finish the game.
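For illustration, here is a rough sketch of the 12-input BFS state described above (the feature order and helper names are assumptions made for this example; the basic 11-input state simply replaces the 4 recommended-direction flags with 3 danger flags):

```python
from collections import deque

GRID_W, GRID_H = 10, 10  # example grid size

def reachable_cells(snake, start):
    """Count the free cells reachable from `start` with a BFS (flood fill)."""
    blocked = set(snake)
    if start in blocked or not (0 <= start[0] < GRID_W and 0 <= start[1] < GRID_H):
        return 0
    seen, queue = {start}, deque([start])
    while queue:
        x, y = queue.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < GRID_W and 0 <= nxt[1] < GRID_H
                    and nxt not in blocked and nxt not in seen):
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)

def bfs_state(snake, food, direction):
    """12 inputs: 4 current-direction flags, 4 food-direction flags and
    4 'recommended direction' flags (directions with the most reachable cells)."""
    head = snake[0]
    dirs = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
    dir_flags = [int(direction == d) for d in dirs]
    food_flags = [int(food[1] < head[1]), int(food[1] > head[1]),
                  int(food[0] < head[0]), int(food[0] > head[0])]
    counts = {d: reachable_cells(snake, (head[0] + dx, head[1] + dy))
              for d, (dx, dy) in dirs.items()}
    best = max(counts.values())
    recommended = [int(counts[d] == best and best > 0) for d in dirs]
    return dir_flags + food_flags + recommended

# Example: a 3-cell snake moving right, food below and to the right of its head.
print(bfs_state([(5, 5), (4, 5), (3, 5)], food=(7, 8), direction="right"))
```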
-
day 30/11/2022: DQN does not seem to be efficient; after many tries I was not able to improve it. In fact the snake focused on eating the apple, but by the time it grew long, the random-exploration and learning phase was over, so it was never able to understand the recommended direction (at the beginning this value was useless, so the snake never improved on this part).
-
day 04/12/2022: the algorithm using BFS completely overtook the basic one (up to around 141). I didn't expect Q-Learning to be better than the theoretical algorithm (BFS only): (Deep) Q-Learning finds strategies to deal with the issues of pure BFS.
-
day 14/12/2022: the network with the whole grid as input finally works, thanks to the book's version of DQN, which is more efficient. However, it may never work for a single reason: DQN itself. Indeed, the DQN algorithm only updates the network based on one state at a time (many states, but each treated independently), so for a given state the algorithm tries to find the best move. With this algorithm the network cannot learn a strategy; it can only learn a state and its best action. Snake cannot be solved this way because there are 2*128^3 moves, so the network cannot learn all of them perfectly. Also, we saw that a single hidden layer of 256 neurons reached an average score of 1.5 vs 0.26 for random play, so it learnt something, but such a network cannot learn much more. Moreover, 2 hidden layers of 512 gave a better result, with an average of 16.5 vs 1.5. By the way, on this 8x8 grid this is my best result, but it is still far from 8x8 = 64.
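For reference, a bare-bones version of this kind of per-transition DQN update (the 2 hidden layers of 512 match the setup above; everything else, such as the missing target network and replay buffer, is a simplification compared to the book's version):

```python
import torch
import torch.nn as nn

class GridDQN(nn.Module):
    """Takes the flattened 8x8 grid as input, two hidden layers of 512."""
    def __init__(self, grid_cells=8 * 8, n_actions=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(grid_cells, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, x):
        return self.net(x)

def dqn_update(model, optimizer, state, action, reward, next_state, done, gamma=0.9):
    # Each transition is handled independently: the update only moves Q(state, action)
    # toward a bootstrapped target, which is why the network learns a
    # state -> best-action mapping rather than a multi-step strategy.
    q_sa = model(state)[0, action]
    with torch.no_grad():
        target = reward + (0.0 if done else gamma * model(next_state).max().item())
    loss = nn.functional.mse_loss(q_sa, torch.tensor(target))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy usage with random 8x8 grids standing in for real game states.
model = GridDQN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
state, next_state = torch.rand(1, 64), torch.rand(1, 64)
print(dqn_update(model, optimizer, state, action=1, reward=1.0, next_state=next_state, done=False))
```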
-
day 15/12/2022: I implemented multiprocessing in my DQN so that I could use multiple agents, and I switched from Pygame to Tkinter.
- Video used to build my first neural network on Snake: https://www.youtube.com/watch?v=L8ypSXwyBds&ab_channel=freeCodeCamp.org
- Deep Reinforcement Learning Hands-On by Maxim Lapan, Packt Publishing
- https://pytorch.org/tutorials/
- Official documentation of the libraries used