This starter kit contains two example policies to get you started with this challenge:
- a simple single-agent DQN method
- a more robust multi-agent DQN method that you can submit out of the box to the challenge 🚀
🔗 Train the single-agent DQN policy
🔗 Train the multi-agent DQN policy
The single-agent example is meant as a minimal demonstration of how to use DQN. The multi-agent example is a better starting point for building your own solution.
You can fully train the multi-agent policy in Colab for free!
Train the multi-agent policy for 150 episodes:

```bash
python reinforcement_learning/multi_agent_training.py -n 150
```
The multi-agent policy training can be tuned using command-line arguments:
```
usage: multi_agent_training.py [-h] [-n N_EPISODES] [-t TRAINING_ENV_CONFIG]
                               [-e EVALUATION_ENV_CONFIG]
                               [--n_evaluation_episodes N_EVALUATION_EPISODES]
                               [--checkpoint_interval CHECKPOINT_INTERVAL]
                               [--eps_start EPS_START] [--eps_end EPS_END]
                               [--eps_decay EPS_DECAY]
                               [--buffer_size BUFFER_SIZE]
                               [--buffer_min_size BUFFER_MIN_SIZE]
                               [--restore_replay_buffer RESTORE_REPLAY_BUFFER]
                               [--save_replay_buffer SAVE_REPLAY_BUFFER]
                               [--batch_size BATCH_SIZE] [--gamma GAMMA]
                               [--tau TAU] [--learning_rate LEARNING_RATE]
                               [--hidden_size HIDDEN_SIZE]
                               [--update_every UPDATE_EVERY]
                               [--use_gpu USE_GPU] [--num_threads NUM_THREADS]
                               [--render RENDER]

optional arguments:
  -h, --help            show this help message and exit
  -n N_EPISODES, --n_episodes N_EPISODES
                        number of episodes to run
  -t TRAINING_ENV_CONFIG, --training_env_config TRAINING_ENV_CONFIG
                        training config id (eg 0 for Test_0)
  -e EVALUATION_ENV_CONFIG, --evaluation_env_config EVALUATION_ENV_CONFIG
                        evaluation config id (eg 0 for Test_0)
  --n_evaluation_episodes N_EVALUATION_EPISODES
                        number of evaluation episodes
  --checkpoint_interval CHECKPOINT_INTERVAL
                        checkpoint interval
  --eps_start EPS_START
                        max exploration
  --eps_end EPS_END     min exploration
  --eps_decay EPS_DECAY
                        exploration decay
  --buffer_size BUFFER_SIZE
                        replay buffer size
  --buffer_min_size BUFFER_MIN_SIZE
                        min buffer size to start training
  --restore_replay_buffer RESTORE_REPLAY_BUFFER
                        replay buffer to restore
  --save_replay_buffer SAVE_REPLAY_BUFFER
                        save replay buffer at each evaluation interval
  --batch_size BATCH_SIZE
                        minibatch size
  --gamma GAMMA         discount factor
  --tau TAU             soft update of target parameters
  --learning_rate LEARNING_RATE
                        learning rate
  --hidden_size HIDDEN_SIZE
                        hidden size (2 fc layers)
  --update_every UPDATE_EVERY
                        how often to update the network
  --use_gpu USE_GPU     use GPU if available
  --num_threads NUM_THREADS
                        number of threads PyTorch can use
  --render RENDER       render 1 episode in 100
```
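For example, several of these flags can be combined in a single run. The values below are illustrative placeholders, not tuned recommendations:

```bash
# Illustrative only: a longer run on a harder environment with a slower
# exploration decay and a wider network (values are not tuned recommendations).
python reinforcement_learning/multi_agent_training.py \
    -n 2500 -t 1 -e 1 \
    --eps_decay 0.998 \
    --hidden_size 256 \
    --use_gpu True
```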
📈 Training performance in environments of various sizes