These are good questions and, to be honest, difficult to answer: PPO's behavior is highly variable, and its hyperparameters can sometimes be very sensitive.
The initial values came primarily from some guesswork on my part, from the reference parameters shared in the original Isaac Gym paper, and from the parameters we used to use for ManiSkill2. To some extent my original thought process was to first pick a number of parallel environments and a number of steps to run per parallel environment (e.g. 4096 envs, 4 steps per env), and then choose the number of minibatches such that the minibatch size is at least 512 (since our old PPO usually had a minibatch size of 512 as well).
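To make that arithmetic concrete, here is a minimal sketch of the sizing calculation (variable names are illustrative, not necessarily the exact ones in the training script):

```python
# Sizing arithmetic for the PPO update batch (illustrative names).
num_envs = 4096         # parallel environments
num_steps = 4           # rollout steps per environment
num_minibatches = 32    # value used for most tasks in examples.sh

batch_size = num_envs * num_steps               # 16384 transitions per update
minibatch_size = batch_size // num_minibatches  # 512, matching the old PPO default

print(batch_size, minibatch_size)  # 16384 512
```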
You have to experiment with it. A minibatch size that is either too small or too large will certainly yield lower performance, however.
Hi!
I'm a student learning ManiSkill2. While reading the PPO implementation, I noticed the code uses minibatch training:
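Roughly, the update loop follows a CleanRL-style pattern like the one below (a simplified, self-contained sketch with placeholder data and a stand-in loss, not the exact code):

```python
import torch

# Hyperparameters in the style of the PPO script (illustrative values).
num_envs, num_steps = 4096, 4
num_minibatches, update_epochs = 32, 8
batch_size = num_envs * num_steps               # 16384
minibatch_size = batch_size // num_minibatches  # 512

b_obs = torch.randn(batch_size, 10)   # placeholder flattened rollout data
b_adv = torch.randn(batch_size)       # placeholder advantages
policy = torch.nn.Linear(10, 1)       # stand-in for the actor network
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

for epoch in range(update_epochs):
    b_inds = torch.randperm(batch_size)          # fresh shuffle each epoch
    for start in range(0, batch_size, minibatch_size):
        mb_inds = b_inds[start:start + minibatch_size]
        # Stand-in for the PPO clipped surrogate loss on this minibatch.
        loss = (policy(b_obs[mb_inds]).squeeze(-1) * b_adv[mb_inds]).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```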
I have several questions about this:
1. What's the rationale behind splitting the full batch into multiple minibatches for training, instead of using the complete batch at once?
2. I noticed in examples.sh that most tasks use `num_minibatches=32`. How was this value determined?
3. How does the minibatch size affect training performance?
I appreciate your help!
Best regards