Update MuJoCo to 3.X #493

Merged: 5 commits into Farama-Foundation:master on Nov 5, 2024
Conversation

@rainx0r (Contributor) commented Aug 10, 2024:

A key part missing from the recent Metaworld renovation work is dropping the dependency on MuJoCo 2.X. Upgrading to MuJoCo 3.X opens up useful features such as:

  • The ability to use MJX in the future
  • The repository properly installing and working on the latest macOS (MuJoCo 2's library path for OpenGL is wrong there)

The main reason for not using MuJoCo 3.X before was that the benchmark did not work as intended and appeared to be broken (i.e. tests would fail).

Over the past few months, I did a deep dive on why this is and how to fix it. You can find relatively in-depth documentation of the process and the findings in this issue on the MuJoCo repo.

In the end I had to make a single change to get it working:

  • Adjust the mocap weld constraint so that the relative pose between the hand and the mocap bodies is the one we want (that is what changing 1.0 to -1.0 achieves; it corresponds to the first component of the quaternion in the rotational part of the weld constraint's relpose attribute), and weight the rotational part of the constraint more heavily (raising torquescale, the last entry of the constraint data, from 1.0 to 3.75; see the sketch at the end of this comment).

(For an explanation of why this change was necessary, please see the discussion on this issue in the MuJoCo repo.)

I tried different values for torquescale but found that at 3.5 the soccer environment breaks, while at 3.75 and up everything works.
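
As a concrete reference, here is a minimal sketch of where these two values live when accessed through the MuJoCo Python bindings. The model path and equality-constraint index are assumptions for illustration; in Metaworld itself the change is made in the MJCF XML (the relpose and torquescale attributes of the weld element):

```python
import mujoco

model = mujoco.MjModel.from_xml_path("sawyer_model.xml")  # hypothetical model path

# For a weld constraint, each row of model.eq_data is laid out as:
#   [0:3]   anchor point
#   [3:10]  relpose: 3 position components + 4 quaternion components (w, x, y, z)
#   [10]    torquescale: relative weight of the rotational part of the constraint
eq_id = 0                        # index of the hand-mocap weld constraint (assumed)
model.eq_data[eq_id, 6] = -1.0   # first quaternion component: 1.0 -> -1.0
model.eq_data[eq_id, 10] = 3.75  # torquescale: 1.0 -> 3.75
```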

@rainx0r (Contributor, author) commented Aug 11, 2024:

I initially got it working and visually verified that the arm position looked like MuJoCo 2 using torquescale="5". Before making the PR, I tried lower / more conservative values, ending on 3.75 as the minimum value that had tests passing on my machine. Apparently 3.75, and even 4, is still brittle, as CI was failing, so I suppose let's go with 5.

@Kallinteris-Andreas commented:

Does this update break the environments with mujoco<3.0.0?

Besides the visual validation of the rendered arm, has any other validation been done? Can you collect a trajectory of 1k steps with the old version of Metaworld and MuJoCo and check how it compares with the new version of Metaworld and MuJoCo (with the same actions)?

Thanks

@rainx0r (Contributor, author) commented Aug 12, 2024:

With regards to validation, I did not run much before opening this pull request, as another maintainer asked me to open it as soon as possible so that RL experiments could be run against it to verify that algorithm performance is not impacted.

That said, I ran some comparisons today.

  1. The new weld constraints do not seem to break the environments if I downgrade MuJoCo to <3.0.0; the tests still pass. That said, I did see soccer-v2 fail once in my runs, with a 0.40/0.50 success rate (right on the verge of the 80% threshold). In general, I think the previous weld parameters should be kept for MuJoCo <3.0.0.
  2. To validate consistency with the "old version" (old weld parameters + mujoco 2.3.7), I wrote this script to collect a 1000-timestep trajectory and a video using random actions. I ran it across two environments: one with the new weld parameters + MuJoCo 3, and one with the old weld parameters + MuJoCo 2. I then wrote this script to compare the two trajectories and compute the mean squared error and mean absolute error of their observations and rewards, which I report below for reach-v2 (a sketch of the procedure follows the results). I'm also attaching the two videos in MP4 and WebM; to me they look more or less identical, and the losses seem very low as well.
Observation loss (MSE): 4.0295320364805266e-08
Observation loss (MAE): 6.401240239811621e-05
Reward loss (MSE): 0.0003537485342146791
Reward loss (MAE): 0.012771805151261308
Attached videos: reach-v2_mujoco2.mp4, reach-v2_mujoco3.mp4, reach-v2_mujoco2.webm, reach-v2_mujoco3.webm
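
For reference, a minimal sketch of the collect-and-compare procedure described above; the actual scripts are the ones linked in this comment, and the Gymnasium-style step API, file paths, and seeding are assumptions:

```python
import numpy as np

def collect(env, out_path: str, steps: int = 1000, seed: int = 42) -> None:
    # Seeding both the reset and the action space makes the random action
    # sequence identical across the MuJoCo 2 and MuJoCo 3 runs.
    env.action_space.seed(seed)
    obs, _ = env.reset(seed=seed)
    observations, rewards = [obs], []
    for _ in range(steps):
        obs, reward, terminated, truncated, _ = env.step(env.action_space.sample())
        observations.append(obs)
        rewards.append(reward)
    np.savez(out_path, observations=np.asarray(observations), rewards=np.asarray(rewards))

def compare(path_a: str, path_b: str) -> None:
    # MSE / MAE over observations and rewards, as reported above.
    a, b = np.load(path_a), np.load(path_b)
    for key in ("observations", "rewards"):
        diff = a[key] - b[key]
        print(f"{key} loss (MSE): {np.mean(diff ** 2)}")
        print(f"{key} loss (MAE): {np.mean(np.abs(diff))}")
```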

In case this is not satisfactory, the main thing I can think of to better match the old behaviour is to run a hyperparameter tuner on the weld constraint parameters, using these losses against the mujoco 2 / old-weld-parameter trajectory as the objective. I might do that anyway out of curiosity now that I have this set up.

Validation RL experiments are also still planned to check for behavioural similarity (which in my opinion matters more than the raw error between observation values).

@rainx0r (Contributor, author) commented Aug 13, 2024:

I ran some hyperparameter tuning, and it looks like with torquescale set to around 8.2 we can get an even closer match (a sketch of the sweep follows the numbers):

Observation loss (MSE): 1.9481068874279174e-08
Observation loss (MAE): 4.892414725179199e-05
Reward loss (MSE): 6.343976866590461e-05
Reward loss (MAE): 0.005374891027247032
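
A minimal sketch of what such a sweep can look like, reusing collect() from the earlier comment; the candidate range and the rollout helper are hypothetical, and the actual tuner behind the report may differ:

```python
import numpy as np

reference = np.load("reach-v2_mujoco2.npz")  # assumed path to the MuJoCo 2 trajectory

best_ts, best_mse = None, float("inf")
for torquescale in np.linspace(4.0, 10.0, 25):
    # rollout_with_torquescale is a hypothetical helper: it sets the weld's
    # torquescale (model.eq_data[eq_id, 10]) and rolls out the same seeded
    # random-action trajectory as collect(), returning the observations.
    obs = rollout_with_torquescale(torquescale)
    mse = float(np.mean((obs - reference["observations"]) ** 2))
    if mse < best_mse:
        best_ts, best_mse = torquescale, mse

print(f"best torquescale: {best_ts:.2f} (observation MSE {best_mse:.2e})")
```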

@Kallinteris-Andreas commented:

  1. There is no need to try to match the behavior of mujoco==2.3.7 and mujoco==3.0.0; on top of that, I am not sure fine-tuning for one environment would achieve it anyway.

  2. Also, does the change in the mocap weld's sign break the environments with mujoco<3.0.0?

  3. @reginald-mclean, do you want to maintain support for both mujoco<3 and mujoco>=3? If not, I would recommend a version bump.

@reginald-mclean (Collaborator) commented Aug 21, 2024:

@Kallinteris-Andreas

> There is no need to try to match the behavior of mujoco==2.3.7 and mujoco==3.0.0; on top of that, I am not sure fine-tuning for one environment would achieve it anyway.

There's a slight degradation in performance when training a multi-task algorithm. The simulation is more accurate with MuJoCo >= 3, which is likely why. We're digging into it to make sure that's the case.

> Also, does the change in the mocap weld's sign break the environments with mujoco<3.0.0? & @reginald-mclean, do you want to maintain support for both mujoco<3 and mujoco>=3? If not, I would recommend a version bump.

Yes, this will break for MuJoCo <= 2.3.7. We do not want to support MuJoCo <= 2.3.7, so I agree that a version bump is necessary. We are also working on getting the docs and a few other things ready for a polished V3 release, so I think this can be included in that release.

@rainx0r (Contributor, author) commented Aug 22, 2024:

> There is no need to try to match the behavior of mujoco==2.3.7 and mujoco==3.0.0; on top of that, I am not sure fine-tuning for one environment would achieve it anyway.

I don't think it matters which environment it's fine-tuned on, as long as the difference in rewards isn't used as the objective to minimise: the parameters here only affect the robot arm itself and how it responds to actions sampled uniformly at random from its action space.

That said, the difference is infinitesimal to begin with, and as the tuning report shows, the difference for values between 5 and 9 is almost within the margin of error, so I don't think the exact value matters as long as it's within that range.

I will say that, anecdotally, I have never seen tests fail for torquescale=5 (my initial value), but I have seen them fail (at 0.39/0.5 or 0.4/0.5, as in this latest CI run) for 8.22, which is what the tuner found. I might revert it back to 5.

Also, I definitely agree that there is no need to match mujoco==2.3.7 down to arbitrary precision, but I thought it would be interesting to try.

> Also, does the change in the mocap weld's sign break the environments with mujoco<3.0.0?

The entire rotational part of the weld constraint is ignored in mujoco==2.3.7 due to a bug, so any changes to those values shouldn't actually matter. To verify, I ran the tests with the quat set to (-1, 0, 0, 0) and torquescale=1 under mujoco==2.3.7, and they pass.

But as @reginald-mclean said, I don't think supporting mujoco<=2.3.7 is desirable anyway, and a version bump for supporting MuJoCo 3 would make the most sense.


Also, I have finished running a multi-task algorithm (MTMHSAC) on the MT10 benchmark for 3 seeds, and the results seem in line with previous ones obtained on MuJoCo 2.3.7.

An MT50 run is still planned but I am currently facing some technical difficulties getting experiments running on my university's compute cluster, which is the only compute I have access to that could tractably handle MT50 at the moment.

@reginald-mclean (Collaborator) commented:

@rainx0r have we confirmed that everything looks good from a performance standpoint using torquescale 5?

@rainx0r (Contributor, author) commented Sep 5, 2024:

I'd say so. I've more or less got the MT50 runs done (one of the seeds got cut off at 96M timesteps, and my compute is out of commission at the moment, so I can't resume it), which can be seen in this report. MT10 runs can be found in the report posted previously for torquescale 8.2, and for torquescale 5 there are some newer runs here which seem to replicate the 8.2 results.

Since the MT10 and MT50 runs work and reproduce existing results, this should indicate that the benchmark does indeed work on MuJoCo 3.

Below are also some comments on the MuJoCo 3-related settings, after some more thinking and research into them. tl;dr: I think the relpose quat could be reverted to (1,0,0,0), but it doesn't really matter; torquescale should stay at 5, even though it also doesn't seem to matter much. I'm open to suggestions for ways to find a better value if anyone has any.

P.S. 1: On the torquescale value

I mostly switched from torquescale 8.2 to 5 in an attempt to fix my failing MT50 runs, but it seems the problem was elsewhere. After debugging my algorithm implementation a bit more, I was also able to more or less replicate the successful torquescale=5 MT50 runs with torquescale=8.2 on local compute for 1 seed. So overall the value doesn't really seem to matter to the RL algorithm. My opinion would be to stick with 5 for simplicity, unless there are suggestions for how to find a potentially better value.

The optimisation metric I used for the earlier experiment that found the value of 8.2 was very noisy, so I don't have much faith in that value, especially as there's no clear trend in the hyperparameter sweep. Still, the fact that the distance between observations across the two versions is small enough to be noisy, and that neither value matters to the RL algorithm, should indicate that the value ultimately isn't that important past a certain point (roughly 4.0), unless I'm missing something.

P.S. 2: On the relpose quat

As for the relpose quat value, it's important that it's not (0,0,0,0), which would set relpose based on qpos0. Beyond that, both (1,0,0,0) and (-1,0,0,0) represent the same rotation, so both should work, and indeed at the same torquescale both have tests passing (a quick check of this equivalence is sketched below).

I used (-1,0,0,0) when I was trying out different things to get MuJoCo 3 working (frankly, I hadn't realised it was the same (lack of) rotation as (1,0,0,0), as I only read more into quaternions later; I'm sadly neither an engineering major nor a game dev). We could revert it to (1,0,0,0) for simplicity and to avoid changing things unnecessarily, or keep it as is, since it represents the same rotation and we have confirmed that (-1,0,0,0) works for RL. Personally, I think it shouldn't matter.
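
A quick way to convince yourself of this equivalence, as a minimal sketch using the MuJoCo Python bindings:

```python
import numpy as np
import mujoco

# A unit quaternion q and its negation -q encode the same rotation, so
# (1,0,0,0) and (-1,0,0,0) both describe "no rotation" (the identity).
for quat in ([1.0, 0.0, 0.0, 0.0], [-1.0, 0.0, 0.0, 0.0]):
    mat = np.zeros(9)
    mujoco.mju_quat2Mat(mat, np.asarray(quat))
    print(mat.reshape(3, 3))  # identity matrix in both cases
```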

@reginald-mclean (Collaborator) commented:

@rainx0r thank you for the update and the hard work! Results look good to me

@reginald-mclean merged commit cca35cf into Farama-Foundation:master on Nov 5, 2024 (2 of 5 checks passed).