The mean and var in the NormalizeVecObservation wrapper located here are shaped as (NUM_ENVS, ) + obs.shape. I think they should be shaped like a single observation, that is, obs[0].shape, considering they're supposed to calculate a running average across NUM_ENVS. This approach would match the shape used in the reward normalization wrapper found here.
While this doesn't seem to affect performance, since the mean and variance are correctly computed across the batch, it unnecessarily increases memory use and has caused some unexpected issues for me, especially when saving the normalization state along with train_state.
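To illustrate the shape I have in mind, here is a minimal sketch of a running mean/variance update whose statistics are shaped like a single observation. This is not the wrapper's actual code: the names `RunningStats`, `init_stats`, `update_stats` and `normalize` are hypothetical, and I'm assuming the batched observations are shaped `(NUM_ENVS, ...)` and that the statistics are merged with the usual parallel mean/variance update.

```python
from typing import NamedTuple

import jax.numpy as jnp


class RunningStats(NamedTuple):
    # Hypothetical container; all fields are shaped like ONE observation,
    # except count, which is a scalar.
    mean: jnp.ndarray
    var: jnp.ndarray
    count: jnp.ndarray


def init_stats(obs_batch: jnp.ndarray) -> RunningStats:
    # Drop the leading NUM_ENVS axis so the state matches a single observation.
    single_shape = obs_batch.shape[1:]
    return RunningStats(
        mean=jnp.zeros(single_shape),
        var=jnp.ones(single_shape),
        count=jnp.asarray(1e-4),  # small prior count avoids division by zero
    )


def update_stats(stats: RunningStats, obs_batch: jnp.ndarray) -> RunningStats:
    # Reduce over the environment axis, then merge with the running statistics.
    batch_mean = jnp.mean(obs_batch, axis=0)
    batch_var = jnp.var(obs_batch, axis=0)
    batch_count = obs_batch.shape[0]

    delta = batch_mean - stats.mean
    total = stats.count + batch_count
    new_mean = stats.mean + delta * batch_count / total
    m2 = (
        stats.var * stats.count
        + batch_var * batch_count
        + jnp.square(delta) * stats.count * batch_count / total
    )
    return RunningStats(mean=new_mean, var=m2 / total, count=total)


def normalize(stats: RunningStats, obs_batch: jnp.ndarray, eps: float = 1e-8) -> jnp.ndarray:
    # The single-observation statistics broadcast over the NUM_ENVS axis.
    return (obs_batch - stats.mean) / jnp.sqrt(stats.var + eps)
```

With this layout the saved normalization state no longer depends on NUM_ENVS, because broadcasting takes care of applying the same statistics to every environment.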