These are good questions and, to be honest, difficult to answer: PPO's behavior is highly variable, and its hyperparameters can sometimes be very sensitive.
The initial values came primarily from some guesswork on my part, from the reference parameters shared in the original Isaac Gym paper, and from the parameters we used to use for ManiSkill2. To some extent my original thought process was to first pick a number of parallel environments and a number of steps to run per parallel environment (e.g. 4096 envs, 4 steps per env), and then choose the number of minibatches such that the minibatch size is at least 512 (since our old PPO usually had a minibatch size of 512 as well).
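To make that arithmetic concrete, here is a minimal sketch of the sizing calculation (variable names are illustrative, not necessarily the exact ones in the training script):

```python
# Sizing arithmetic for the PPO update batch (illustrative names).
num_envs = 4096         # parallel environments
num_steps = 4           # rollout steps per environment
num_minibatches = 32    # value used for most tasks in examples.sh

batch_size = num_envs * num_steps               # 16384 transitions per update
minibatch_size = batch_size // num_minibatches  # 512, matching the old PPO default

print(batch_size, minibatch_size)  # 16384 512
```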
You have to experiment with it. A minibatch size that is either too small or too large will certainly yield lower performance, however.
Hi!
I'm a student learning ManiSkill2. While reading the PPO implementation, I noticed the code uses minibatch training:
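Roughly, the update loop follows a CleanRL-style pattern like the one below (a simplified, self-contained sketch with placeholder data and a stand-in loss, not the exact code):

```python
import torch

# Hyperparameters in the style of the PPO script (illustrative values).
num_envs, num_steps = 4096, 4
num_minibatches, update_epochs = 32, 8
batch_size = num_envs * num_steps               # 16384
minibatch_size = batch_size // num_minibatches  # 512

b_obs = torch.randn(batch_size, 10)   # placeholder flattened rollout data
b_adv = torch.randn(batch_size)       # placeholder advantages
policy = torch.nn.Linear(10, 1)       # stand-in for the actor network
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

for epoch in range(update_epochs):
    b_inds = torch.randperm(batch_size)          # fresh shuffle each epoch
    for start in range(0, batch_size, minibatch_size):
        mb_inds = b_inds[start:start + minibatch_size]
        # Stand-in for the PPO clipped surrogate loss on this minibatch.
        loss = (policy(b_obs[mb_inds]).squeeze(-1) * b_adv[mb_inds]).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```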
I have several questions about this:
1. What's the rationale behind splitting the full batch into multiple minibatches for training, instead of using the complete batch at once?
2. I noticed in examples.sh that most tasks use `num_minibatches=32`. How was this value determined?
3. How does the minibatch size affect training performance?
I appreciate your help!
Best regards