Add test that Gymnasium and MO-Gymnasium envs match #90

pseudo-rnd-thoughts · 2024-03-14T12:40:25Z

This PR adds a test for every MO-Gymnasium env that also exists in Gymnasium, checking that the two are still equivalent using the check_environments_match function.

pseudo-rnd-thoughts · 2024-03-14T12:46:56Z

@ffelten or @LucasAlegre It looks like most of the mo-gymnasium reward vectors do not by default match the gymnasium versions. Is this expected?

ffelten · 2024-03-14T12:50:59Z

> @ffelten or @LucasAlegre It looks like most of the mo-gymnasium reward vectors do not by default match the gymnasium versions. Is this expected?

Lol, just what I said in my comment. Some environments are indeed different. For example, we usually remove the scaling factors, which are unnecessary because the rewards are not aggregated into a single scalar. Another example is environments that add an extra component to the reward, e.g. mountaincar penalizes changing directions.

EDIT: still, I think these kinds of tests are relevant, I'd just ignore the reward tests.
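To make the "ignore the reward tests" idea concrete, here is a minimal sketch of a lockstep comparison between a scalar-reward env and its vector-reward counterpart. It does not use Gymnasium's check_environments_match itself: `ToyEnv`, `envs_match`, and the `skip_rew` flag are hypothetical stand-ins written for illustration, so check the real function's signature before relying on any of these names.

```python
import random


class ToyEnv:
    """Hypothetical env with a Gymnasium-like reset/step interface."""

    def __init__(self, vector_reward=False):
        self.vector_reward = vector_reward
        self._rng = random.Random()
        self._state = 0.0
        self._t = 0

    def reset(self, seed=None):
        if seed is not None:
            self._rng.seed(seed)
        self._state = self._rng.uniform(-1.0, 1.0)
        self._t = 0
        return self._state, {}

    def step(self, action):
        self._state += 0.1 * action
        self._t += 1
        terminated = abs(self._state) > 2.0
        truncated = self._t >= 20
        # Same dynamics either way; only the reward format differs.
        reward = [-1.0, -abs(action)] if self.vector_reward else -1.0
        return self._state, reward, terminated, truncated, {}


def envs_match(env_a, env_b, num_steps, seed=0, skip_rew=False):
    """Step both envs with identical actions and compare their outputs."""
    rng = random.Random(seed)
    obs_a, _ = env_a.reset(seed=seed)
    obs_b, _ = env_b.reset(seed=seed)
    if obs_a != obs_b:
        return False
    for _ in range(num_steps):
        action = rng.choice([-1, 0, 1])
        obs_a, rew_a, term_a, trunc_a, _ = env_a.step(action)
        obs_b, rew_b, term_b, trunc_b, _ = env_b.step(action)
        if obs_a != obs_b or term_a != term_b or trunc_a != trunc_b:
            return False
        if not skip_rew and rew_a != rew_b:
            return False
        if term_a or trunc_a:
            # Restart both envs from the same seed and keep comparing.
            obs_a, _ = env_a.reset(seed=seed)
            obs_b, _ = env_b.reset(seed=seed)
            if obs_a != obs_b:
                return False
    return True
```

With the reward comparison skipped, the two toy envs match on observations and episode endings; with it enabled, they fail immediately because one returns a scalar and the other a vector.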

pseudo-rnd-thoughts · 2024-03-14T13:17:37Z

> Lol, just what I said in my comment. Some environments are indeed different, for example, we usually remove the scaling factors because they are unnecessary since the rewards are not aggregated into a single scalar. Another example is environments that add an extra component to the reward, e.g. mountaincar penalizes for changing directions.
>
> EDIT: still, I think these kinds of tests are relevant, I'd just ignore the reward tests.

Amazing, I'm glad this isn't actually an issue then.
Is there a way of knowing the reward weightings to make it equivalent to the Gymnasium version?

LucasAlegre · 2024-03-14T13:28:14Z

> Amazing, I'm glad this isn't actually an issue then. Is there a way of knowing the reward weightings to make it equivalent to the Gymnasium version?

Yes, I believe that is indeed possible for most of them (if not all). I will try to document this somewhere. Any suggestions?
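As a rough illustration of what a "reward weighting" means here: a weight vector maps the vector reward back to a scalar via a dot product. The sketch below uses a hypothetical `scalarize` helper and made-up reward components and weights; the actual weights for each environment are whatever the maintainers document, not the values shown.

```python
def scalarize(reward_vector, weights):
    """Linear scalarization: weighted sum of the reward components."""
    if len(reward_vector) != len(weights):
        raise ValueError("reward_vector and weights must have the same length")
    return sum(w * r for w, r in zip(weights, reward_vector))


# Illustrative only: a three-component vector reward in which the first
# component is a per-step time penalty and the other two are extra
# penalties that a scalar-reward version would not include.
example_vector = (-1.0, -1.0, 0.0)
example_weights = (1.0, 0.0, 0.0)  # keep the time penalty, drop the extras
example_scalar = scalarize(example_vector, example_weights)
```

If the scalar env only ever emitted the time penalty, zero weights on the extra components would recover its reward exactly; removed scaling factors would instead reappear as non-unit weights.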

pseudo-rnd-thoughts · 2024-03-14T13:47:33Z

In the documentation?

Are you aware that the Reacher action space is different from the Gymnasium one?

LucasAlegre · 2024-03-14T13:52:51Z

> In the documentation?

I will add to the pydoc of each environment class.

> Are you aware that Reacher action space is different from the Gymnasium one?

Yes, Reacher is probably the environment with the most changes from the original. We also changed the .xml to add more targets to the environment. It is not possible to map this one to Gymnasium's Reacher.

pseudo-rnd-thoughts · 2024-03-14T14:00:20Z

> I will add to the pydoc of each environment class.

Thanks, this will be really helpful for me. I want to learn the reward-vector equivalent of the standard agent, which requires the MO version to be equivalent to the standard version.

LucasAlegre · 2024-03-14T19:43:12Z

@pseudo-rnd-thoughts see:
#92

I could not map the rewards for Humanoid and Walker2d because we added the healthy_reward to all objectives, so we cannot scale it separately. What we could do is create versions of them with the healthy_reward modeled as a separate objective. What do you think? @ffelten
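A small worked example of why a shared healthy_reward blocks a linear mapping. It assumes, purely for illustration, that the scalar reward has the form forward + healthy - ctrl_weight * energy and that the vector reward has two objectives with the healthy term added to each; the function names and numbers are hypothetical, not taken from either library.

```python
def dot(weights, rewards):
    return sum(w * r for w, r in zip(weights, rewards))


def scalar_reward(forward, energy, healthy, ctrl_weight):
    # Assumed form of the single-objective reward.
    return forward + healthy - ctrl_weight * energy


def vector_shared(forward, energy, healthy):
    # healthy is added to every objective, so it cannot be weighted alone.
    return (forward + healthy, -energy + healthy)


def vector_separate(forward, energy, healthy):
    # healthy is its own objective and gets its own weight.
    return (forward, -energy, healthy)


forward, energy, healthy, ctrl_weight = 2.0, 0.5, 1.0, 0.25
target = scalar_reward(forward, energy, healthy, ctrl_weight)

# Matching the forward and energy terms forces the weights (1, ctrl_weight),
# which then counts healthy (1 + ctrl_weight) times instead of once.
shared = dot((1.0, ctrl_weight), vector_shared(forward, energy, healthy))
gap = shared - target  # equals ctrl_weight * healthy under these assumptions

# With a separate healthy objective, weights (1, ctrl_weight, 1) recover it.
separate = dot((1.0, ctrl_weight, 1.0), vector_separate(forward, energy, healthy))
```

Under these assumptions the shared version overshoots by ctrl_weight * healthy, while the three-objective version reproduces the scalar reward exactly.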

ffelten · 2024-03-14T21:56:42Z

I find it odd to create environments for the sake of testing lol

LucasAlegre · 2024-03-14T22:03:21Z

> I find it odd to create environments for the sake of testing lol

It is not only for testing, since I believe the relative weighting of the healthy_reward might also induce new trade-offs. I saw some papers using it. The reason I did not include it was to focus on the velocity/energy trade-off, which is clearer.

LucasAlegre · 2024-03-15T14:49:47Z

I figured out how to scale the healthy_reward so that Humanoid and Walker2d can be mapped to the original Gymnasium env. But this will make the current results not reproducible, so I will implement these changes in v5 (PR #85).

pseudo-rnd-thoughts · 2024-03-18T16:58:17Z

@ffelten or @LucasAlegre can we merge this PR and handle the reward vector equivalence in another PR?

Add test that Gymnasium and MO-Gymnasium envs match

5ac89f0

ffelten reviewed Mar 14, 2024

pseudo-rnd-thoughts closed this Mar 14, 2024

pseudo-rnd-thoughts reopened this Mar 14, 2024

Ignore rewards

386a4c6

Remove reacher from list

534fe1c

ffelten approved these changes Mar 18, 2024

ffelten merged commit a30e5c0 into Farama-Foundation:main Mar 18, 2024
10 checks passed
