
Questions about the design philosophy of minibatch in PPO implementation #765

songyuc opened this issue Dec 23, 2024 · 1 comment

@songyuc (Contributor) commented Dec 23, 2024

Hi!
I'm a student learning ManiSkill2. While reading the PPO implementation, I noticed the code uses minibatch training:

args.batch_size = int(args.num_envs * args.num_steps)
args.minibatch_size = int(args.batch_size // args.num_minibatches)
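
For context, here is my rough understanding of where these values end up being used, sketched from the usual CleanRL-style PPO update loop (the args.update_epochs name and the b_* rollout buffers are my assumptions, not necessarily the exact code):

import numpy as np

b_inds = np.arange(args.batch_size)  # indices into the flattened rollout buffer
for epoch in range(args.update_epochs):
    np.random.shuffle(b_inds)  # reshuffle the transitions every epoch
    for start in range(0, args.batch_size, args.minibatch_size):
        end = start + args.minibatch_size
        mb_inds = b_inds[start:end]  # one minibatch of transitions
        # compute the PPO clipped loss on b_obs[mb_inds], b_actions[mb_inds], ...
        # then optimizer.zero_grad(); loss.backward(); optimizer.step()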

I have several questions about this:

  1. What's the rationale behind splitting the full batch into multiple minibatches for training, instead of using the complete batch at once?

  2. I noticed in examples.sh that most tasks use num_minibatches=32. How was this value determined?

  3. How does the minibatch size affect training performance?

I appreciate your help!

Best regards

@StoneT2000 (Member) commented

These are good questions and, to be honest, difficult to answer, since PPO is highly variable in behavior and its hyperparameters can sometimes be very sensitive.

  1. This is a standard trick used in most deep learning training setups: mini-batch gradient descent (or ascent, in the case of RL). See https://machinelearningmastery.com/gentle-introduction-mini-batch-gradient-descent-configure-batch-size/ for more details.
  2. The initial values were determined mostly by guesswork on my part, looking at the reference parameters shared in the original Isaac Gym paper and at the parameters we used for ManiSkill2. To some extent my original thought process was to first pick a number of parallel environments and a number of steps to run per parallel environment (e.g. 4096 envs, 4 steps per env), and then choose the number of minibatches such that the minibatch size is at least 512 (since our old PPO usually had a minibatch size of 512 as well); the arithmetic is sketched below this list.
  3. You have to experiment with it. Going too small or too large, however, will for sure yield lower performance.
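
To make the sizing arithmetic in point 2 concrete, here is a rough sketch of that thought process (the numbers are the illustrative 4096 envs / 4 steps example from above; target_minibatch_size is just a name used here for the sketch, not a real config field):

num_envs = 4096  # number of parallel environments
num_steps = 4    # rollout steps per environment
batch_size = num_envs * num_steps  # 16384 transitions collected per update
target_minibatch_size = 512        # keep each minibatch at least this large
num_minibatches = batch_size // target_minibatch_size  # 16384 // 512 = 32
minibatch_size = batch_size // num_minibatches          # 512, matching the old ManiSkill2 PPO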
