
Add test that Gymnasium and MO-Gymnasium envs match #90

Conversation

pseudo-rnd-thoughts (Member)

This PR adds a test that runs every MO-Gymnasium environment that is also contained in Gymnasium against its Gymnasium counterpart, using the `check_environments_match` function to check that the two are still equivalent.
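
For context, a minimal sketch of what such a matching test could look like. The import path and keyword arguments of `check_environments_match` and the environment IDs in `ENV_PAIRS` are assumptions for illustration, not the actual test in this PR:

```python
import gymnasium as gym
import mo_gymnasium as mo_gym
from gymnasium.utils.env_match import check_environments_match  # assumed import path

# Hypothetical mapping from MO-Gymnasium env IDs to their Gymnasium counterparts.
ENV_PAIRS = {
    "mo-mountaincar-v0": "MountainCar-v0",
    "mo-hopper-v4": "Hopper-v4",
}


def test_envs_match():
    for mo_id, gym_id in ENV_PAIRS.items():
        mo_env = mo_gym.make(mo_id)
        gym_env = gym.make(gym_id)
        # Roll both envs forward from the same seed with the same actions and
        # compare observations, terminations, truncations, etc. Rewards are
        # expected to differ, since MO envs return a vector reward.
        check_environments_match(mo_env, gym_env, num_steps=100)
```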

@pseudo-rnd-thoughts (Member, Author)

@ffelten or @LucasAlegre It looks like most of the MO-Gymnasium reward vectors do not, by default, match the Gymnasium versions. Is this expected?

@ffelten (Collaborator) commented Mar 14, 2024

> @ffelten or @LucasAlegre It looks like most of the MO-Gymnasium reward vectors do not, by default, match the Gymnasium versions. Is this expected?

Lol, just what I said in my comment. Some environments are indeed different; for example, we usually remove the scaling factors because they are unnecessary when the rewards are not aggregated into a single scalar. Another example is environments that add an extra component to the reward, e.g. MountainCar penalizes changing direction.

EDIT: still, I think these kinds of tests are relevant, I'd just ignore the reward tests.

@pseudo-rnd-thoughts (Member, Author)

> @ffelten or @LucasAlegre It looks like most of the MO-Gymnasium reward vectors do not, by default, match the Gymnasium versions. Is this expected?
>
> Lol, just what I said in my comment. Some environments are indeed different; for example, we usually remove the scaling factors because they are unnecessary when the rewards are not aggregated into a single scalar. Another example is environments that add an extra component to the reward, e.g. MountainCar penalizes changing direction.
>
> EDIT: still, I think these kinds of tests are relevant, I'd just ignore the reward tests.

Amazing, I'm glad this isn't actually an issue then.
Is there a way of knowing the reward weightings to make it equivalent to the Gymnasium version?

@LucasAlegre (Member)

> @ffelten or @LucasAlegre It looks like most of the MO-Gymnasium reward vectors do not, by default, match the Gymnasium versions. Is this expected?
>
> Lol, just what I said in my comment. Some environments are indeed different; for example, we usually remove the scaling factors because they are unnecessary when the rewards are not aggregated into a single scalar. Another example is environments that add an extra component to the reward, e.g. MountainCar penalizes changing direction.
>
> EDIT: still, I think these kinds of tests are relevant, I'd just ignore the reward tests.
>
> Amazing, I'm glad this isn't actually an issue then. Is there a way of knowing the reward weightings to make it equivalent to the Gymnasium version?

Yes, I believe for most of them (if not all) that is indeed possible. I will try to document this somewhere; any suggestions?
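
To illustrate what such documentation would enable: with the right weight vector, a linear scalarization of the MO reward should reproduce the original Gymnasium scalar reward. A rough sketch, where the weights are placeholders rather than the real mapping for any environment, and `reward_space` is assumed to expose the reward dimensionality:

```python
import numpy as np
import mo_gymnasium as mo_gym

env = mo_gym.make("mo-mountaincar-v0")
# Placeholder weights; the documented values would replace these.
weights = np.ones(env.unwrapped.reward_space.shape[0])

obs, info = env.reset(seed=42)
obs, vec_reward, terminated, truncated, info = env.step(env.action_space.sample())
# With the correct weights, this should equal the reward returned by the
# corresponding Gymnasium environment for the same transition.
scalar_reward = np.dot(vec_reward, weights)
```

If your MO-Gymnasium version ships a LinearReward wrapper, it applies essentially this dot product at every step, so documented weights could be dropped straight into it.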

@pseudo-rnd-thoughts (Member, Author) commented Mar 14, 2024

In the documentation?

Are you aware that the Reacher action space is different from the Gymnasium one?

@LucasAlegre (Member)

> In the documentation?

I will add it to the pydoc of each environment class.

> Are you aware that the Reacher action space is different from the Gymnasium one?

Yes, Reacher is probably the environment with the most changes from the original. We also changed the .xml to add more targets to the environment, so this one cannot be mapped to the Reacher from Gymnasium.

@pseudo-rnd-thoughts (Member, Author)

> I will add it to the pydoc of each environment class.

Thanks, this will be really helpful for me: I want to learn the reward-vector equivalent of the standard agent, which requires that the MO version be equivalent to the standard version.

@LucasAlegre (Member)

@pseudo-rnd-thoughts see #92

I could not map the rewards for Humanoid and Walker2d because we added the healthy_reward to all objectives, and then we cannot scale it separately. What we could do is create versions of them with the healthy_reward modeled as a separate objective. What do you think, @ffelten?
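
A rough way to see the problem, assuming the healthy bonus h is added to each of the k components of the reward vector v: any linear scalarization with weights w gives w · (v + h·1) = w · v + h · (w₁ + … + w_k). The coefficient on h is therefore forced to equal the sum of the weights, which is already fixed by the weights needed to recover the other reward terms, while in the original scalar reward the healthy_reward appears with coefficient exactly 1; in general the two do not coincide.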

@ffelten (Collaborator) commented Mar 14, 2024

I find it odd to create environments for the sake of testing lol

@LucasAlegre (Member)

> I find it odd to create environments for the sake of testing lol

It is not only for testing, since I believe the relative weighting of the healthy_reward might also induce new trade-offs. I saw some papers using it. The reason I did not include it was to focus on the velocity/energy trade-off, which is clearer.

@LucasAlegre (Member)

I figured out how to scale the healthy_reward so that Humanoid and Walker2d can be mapped to the original Gymnasium envs. But this would make the current results non-reproducible, so I will implement these changes in v5 (PR #85).

@pseudo-rnd-thoughts (Member, Author)

@ffelten or @LucasAlegre, can we merge this PR and handle the reward-vector equivalence in another PR?

@ffelten merged commit a30e5c0 into Farama-Foundation:main on Mar 18, 2024 (10 checks passed).