Nice experiments on PPO tricks. I've been trying to use PPO on the PyBullet envs, but I find that many of the tricks used in this repo are actually detrimental there. (I have created my own minimal version that works: https://github.com/arthur-x/SimplyPPO. An important discrepancy is whether to clamp the sampled action before computing its log_prob: I find that clamping works better for BipedalWalker but hurts performance on PyBullet.)
Is this because the tricks are mainly tuned for MuJoCo? It would be nice if the author could do a study on this.
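To make the clamping discrepancy concrete, here is a minimal sketch of the two orderings, assuming a diagonal Gaussian policy with actions bounded in [-1, 1]; the variable names and shapes are illustrative, not taken from either repo:

```python
import torch
from torch.distributions import Normal

# Illustrative policy output for a 6-dim continuous action space.
mu, std = torch.zeros(1, 6), torch.ones(1, 6)
dist = Normal(mu, std)
raw_action = dist.sample()

# Ordering A (what I found helps BipedalWalker): clamp first, then score.
# log_prob is evaluated at the clipped action, so any probability mass
# outside [-1, 1] is scored as if it sat exactly on the boundary.
clamped_action = raw_action.clamp(-1.0, 1.0)
logp_a = dist.log_prob(clamped_action).sum(-1)

# Ordering B (what I found helps PyBullet): score the raw sample, and
# clamp only the copy sent to the environment. The importance ratio in
# the PPO objective then matches the distribution actually sampled from.
logp_b = dist.log_prob(raw_action).sum(-1)
env_action = raw_action.clamp(-1.0, 1.0)
```

The two orderings give different log_probs (and hence different PPO ratios) whenever the sample lands outside the action bounds, which happens more often as the policy's std grows.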