NormalizeVecObservation Wrapper Shape Mismatch for Mean and Var #21

bheijden · 2024-04-06T07:58:37Z

Hi,

The mean and var in the NormalizeVecObservation wrapper located here are shaped as (NUM_ENVS, ) + obs.shape. I think they should be shaped like a single observation, that is, obs[0].shape, considering they're supposed to calculate a running average across NUM_ENVS. This approach would match the shape used in the reward normalization wrapper found here.

While this doesn't seem to affect performance, since the mean and variance are correctly computed across the batch, it unnecessarily increases memory use and has caused some unexpected issues for me, especially when saving the normalization state along with train_state.
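
To make the shape difference concrete, here is a minimal JAX sketch of running statistics that are stored with the shape of a single observation and updated from a batch of `NUM_ENVS` observations. It is not the repository's wrapper code: `RunningStats`, `init_stats`, `update_stats`, and `normalize` are hypothetical names used only for illustration, and the parallel mean/variance update and the small initial count are assumptions about how such a wrapper is typically written.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class RunningStats(NamedTuple):
    """Hypothetical container for the normalization state."""
    mean: jnp.ndarray   # shape of ONE observation, e.g. (OBS_DIM,)
    var: jnp.ndarray    # shape of ONE observation
    count: jnp.ndarray  # scalar number of samples seen so far


def init_stats(obs_batch: jnp.ndarray) -> RunningStats:
    # Drop the leading NUM_ENVS axis so the state matches obs_batch[0].shape.
    single_shape = obs_batch.shape[1:]
    return RunningStats(
        mean=jnp.zeros(single_shape, dtype=obs_batch.dtype),
        var=jnp.ones(single_shape, dtype=obs_batch.dtype),
        count=jnp.asarray(1e-4, dtype=obs_batch.dtype),  # assumed small prior count
    )


def update_stats(stats: RunningStats, obs_batch: jnp.ndarray) -> RunningStats:
    # Batch statistics are reduced over the environment axis (axis 0).
    batch_mean = jnp.mean(obs_batch, axis=0)
    batch_var = jnp.var(obs_batch, axis=0)
    batch_count = obs_batch.shape[0]

    # Combine the running and batch moments (parallel variance update).
    delta = batch_mean - stats.mean
    tot_count = stats.count + batch_count
    new_mean = stats.mean + delta * batch_count / tot_count
    m2 = (
        stats.var * stats.count
        + batch_var * batch_count
        + jnp.square(delta) * stats.count * batch_count / tot_count
    )
    return RunningStats(mean=new_mean, var=m2 / tot_count, count=tot_count)


def normalize(stats: RunningStats, obs_batch: jnp.ndarray, eps: float = 1e-8) -> jnp.ndarray:
    # (NUM_ENVS, OBS_DIM) broadcasts against (OBS_DIM,), so no per-env copy is needed.
    return (obs_batch - stats.mean) / jnp.sqrt(stats.var + eps)


NUM_ENVS, OBS_DIM = 8, 3
obs = jax.random.normal(jax.random.PRNGKey(0), (NUM_ENVS, OBS_DIM))
stats = update_stats(init_stats(obs), obs)
normed = normalize(stats, obs)
```

With this layout the saved normalization state holds `OBS_DIM` values per statistic rather than `NUM_ENVS * OBS_DIM`, and broadcasting keeps the normalized output at the batched shape.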

luchris429 · 2024-04-11T13:58:13Z

Ahh, good point! Do you think you could submit a PR? It should be a quick fix. I'll do it if you don't have time.
