[Bug Report] with rsl_rl, actor's std becomes "nan" during PPO training #673

Closed
mitsu3291 opened this issue Jul 11, 2024 · 18 comments

Comments

@mitsu3291

mitsu3291 commented Jul 11, 2024

I am training a robot with reinforcement learning using rsl_rl and Isaac Lab. It works fine with simple settings, but when I switch to more complex settings (such as domain randomization), the following error occurs during training (after some progress), indicating that the actor's standard deviation does not satisfy the condition std ≥ 0. Has anyone experienced a similar error?
num_envs is 3600.

Traceback (most recent call last):
  File "/root/IsaacLab/source/standalone/workflows/rsl_rl/train.py", line 131, in <module>
    main()
  File "/root/IsaacLab/source/standalone/workflows/rsl_rl/train.py", line 123, in main
    runner.learn(num_learning_iterations=agent_cfg.max_iterations, init_at_random_ep_len=True)
  File "/isaac-sim/kit/python/lib/python3.10/site-packages/rsl_rl/runners/on_policy_runner.py", line 153, in learn
    mean_value_loss, mean_surrogate_loss = self.alg.update()
  File "/isaac-sim/kit/python/lib/python3.10/site-packages/rsl_rl/algorithms/ppo.py", line 121, in update
    self.actor_critic.act(obs_batch, masks=masks_batch, hidden_states=hid_states_batch[0])
  File "/isaac-sim/kit/python/lib/python3.10/site-packages/rsl_rl/modules/actor_critic.py", line 105, in act
    
  File "/isaac-sim/exts/omni.isaac.ml_archive/pip_prebundle/torch/distributions/normal.py", line 74, in sample
    return torch.normal(self.loc.expand(shape), self.scale.expand(shape))  
RuntimeError: normal expects all elements of std >= 0.0

I investigated the value of std (self.scale) and found that the std values for one particular environment are NaN. (The number of columns corresponds to the robot's action dimensions.)

self.scale: tensor([[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
...,
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500],
[0.1926, 0.2051, 0.1785, ..., 0.7033, 0.8655, 0.8500]],
device='cuda:0')
env_id: 1111, row: tensor([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       device='cuda:0')
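
A minimal sketch of how one might locate the offending environments, assuming std is the (num_envs, num_actions) standard-deviation tensor taken from the actor's distribution (the names are illustrative):

import torch

def find_nan_envs(std: torch.Tensor) -> list[int]:
    # std: (num_envs, num_actions) standard-deviation tensor from the actor
    nan_rows = torch.isnan(std).any(dim=1)   # True for every env whose std contains a NaN
    return nan_rows.nonzero(as_tuple=False).flatten().tolist()

# e.g. call find_nan_envs(self.scale) right before the distribution is sampled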

@ksiegall

I encountered this bug when implementing a custom reward function for my robot. It turned out that I was returning NaN as part of my reward term, which was causing this issue. Double-check your reward and other functions and ensure that they aren't outputting NaN.
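
A minimal helper along these lines (the names are illustrative), called at the end of each custom reward or observation term, fails fast instead of letting NaN propagate into PPO:

import torch

def assert_finite(name: str, tensor: torch.Tensor) -> torch.Tensor:
    # Raise immediately if a reward/observation term contains NaN or Inf values.
    mask = ~torch.isfinite(tensor)
    if mask.any():
        raise RuntimeError(f"{name} is non-finite at indices {mask.nonzero(as_tuple=False).tolist()}")
    return tensor

# usage inside a custom reward term:
#   return assert_finite("my_reward_term", reward)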

@Lr-2002

Lr-2002 commented Aug 10, 2024

@ksiegall Hello, I'm facing a similar problem with a hand-crafted reward function, but it doesn't seem easy to locate the problem. I've found that env.scene[asset_cfg.name].data.joint_pos is all NaN, but I haven't found a way to trace the cause.

The scene was modified based on Open-Drawer-Franka-v0. Could you give me some advice?

@ozhanozen
Contributor

@ksiegall Hello, I'm facing a similar problem with a hand-crafted reward function, but it doesn't seem easy to locate the problem. I've found that env.scene[asset_cfg.name].data.joint_pos is all NaN, but I haven't found a way to trace the cause.

The scene was modified based on Open-Drawer-Franka-v0. Could you give me some advice?

@Lr-2002, is there a way for you to confirm whether the specific asset with NaN in .data.joint_pos is correctly spawned in the scene? I have seen before that in one of my cloned environments the robot was not spawned successfully (it was colliding with another asset), and hence I was getting similar problems.
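
A rough way to narrow down which cloned environment is affected, assuming a manager-based Isaac Lab env and an articulation registered under the name "robot" (adjust to your asset name):

import torch

robot = env.scene["robot"]                  # the articulation, as in the snippet quoted above
joint_pos = robot.data.joint_pos            # shape: (num_envs, num_joints)
bad_envs = torch.isnan(joint_pos).any(dim=1).nonzero(as_tuple=False).flatten()
if len(bad_envs) > 0:
    print(f"NaN joint positions in envs: {bad_envs.tolist()}")
    # inspect these env indices in the viewer/livestream to check whether the robot spawned correctly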

@Lr-2002

Lr-2002 commented Aug 12, 2024

How can I check whether the robot/articulation is colliding with other assets? Also, I created the env from a single USD built on another platform. Any suggestions?

@Lr-2002

Lr-2002 commented Aug 12, 2024

Could we have a short meeting about the problem? I'm currently in the GMT+8 time zone.

@ozhanozen
Contributor

How can I check whether the robot/articulation is colliding with other assets? Also, I created the env from a single USD built on another platform. Any suggestions?

You can visualize the scene with the livestream option and check whether the objects spawn and move correctly in all of your environments. In my case, the robot didn't spawn in one specific env index.

Maybe check whether you can visualize the problem; otherwise you can write me a PM so we can arrange a short call. I am in GMT+2, though.

@AricLau07

@ksiegall Hello, I'm facing a similar problem with a hand-crafted reward function, but it doesn't seem easy to locate the problem. I've found that env.scene[asset_cfg.name].data.joint_pos is all NaN, but I haven't found a way to trace the cause.

The scene was modified based on Open-Drawer-Franka-v0. Could you give me some advice?

I have the same problem. I think it is caused by improper rendering that causes joints to exceed their joint range, and then the invalid joint positions continue to cause problems. What puzzles me is why the wrong joint angles appear in the observed data. I suspect the problem might be in a lower-level function.
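
One way to test that hypothesis is to compare the observed joint positions against the joint limits, assuming a recent Isaac Lab version where ArticulationData exposes soft_joint_pos_limits with shape (num_envs, num_joints, 2); the asset name "robot" is illustrative:

import torch

robot = env.scene["robot"]
joint_pos = robot.data.joint_pos                    # (num_envs, num_joints)
limits = robot.data.soft_joint_pos_limits           # (num_envs, num_joints, 2): lower, upper
out_of_range = (joint_pos < limits[..., 0]) | (joint_pos > limits[..., 1])
bad_envs = out_of_range.any(dim=1).nonzero(as_tuple=False).flatten()
print(f"envs with joints outside their limits: {bad_envs.tolist()}")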

@Lr-2002

Lr-2002 commented Aug 14, 2024

I noticed that the initial state seems wrong (my cabinet falls into the ground). Did you face the same problem?

@AricLau07

I noticed that the initial state seems wrong (my cabinet falls into the ground). Did you face the same problem?

No, my cabinets seem fine.

@Lr-2002

Lr-2002 commented Aug 14, 2024

All right. Does it matter if the mass is 0?

@AricLau07

All right. Does it matter if the mass is 0?

I have another env where all the link parts have mass > 0, and there is still NaN data during the training process, which causes the unexpected failure. At the beginning of training everything is fine, but the problem appears as training progresses.

@weifeng-lt

Adding actions = torch.clip(actions, min=-6.28, max=6.28) before env.step(actions) seems to help. It is also better to add a penalty on the actions to prevent the actor from outputting excessively large values.
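
A minimal sketch of both suggestions; the ±6.28 bounds are the ones quoted above, and the penalty is a generic L2 term on the last applied action (assuming a manager-based env where it is available via env.action_manager.action, similar in spirit to Isaac Lab's built-in action_l2 reward term):

import torch

# clip the policy output before stepping the environment
actions = torch.clip(actions, min=-6.28, max=6.28)
env.step(actions)

# L2 penalty on the action magnitude, to be registered with a negative weight in the reward config
def action_l2_penalty(env) -> torch.Tensor:
    return torch.sum(torch.square(env.action_manager.action), dim=1)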

@AricLau07

Adding actions = torch.clip(actions, min=-6.28, max=6.28) before env.step(actions) seems to help. It is also better to add a penalty on the actions to prevent the actor from outputting excessively large values.

It does not seem to be a problem with the input actions: I printed the input actions as well as the observed joint positions, and the actions look fine (within the joint limits), but the observed joint positions exceed the joint range. Maybe the joint controller is not working well?

@AricLau07

I found that the issue may be caused by a bad URDF file.
My old env used a KUKA iiwa robot arm and a Robotiq 2F gripper. Although this URDF runs well in PyBullet and Isaac Sim 3.0, the NaN data problem still exists and cannot be fixed in Isaac Lab (Isaac Sim 4.0).
I switched to another robot arm in our lab with a new URDF file, and this kind of NaN data no longer appears.
So I think this issue may be related to the URDF model (even though the URDF model works fine in other simulators).

@weifeng-lt

weifeng-lt commented Aug 17, 2024 via email

@AricLau07

Does the problematic URDF contain different joints?

Sure, the URDF was converted to a USD file (with the converter tool provided by Isaac Sim) and used in the RL training.

@RandomOakForest
Collaborator

This new discussion addresses a similar problem. If you think the original problem above merits a new discussion and you still need further help, please do start one. Try using Isaac Sim 4.2 and the latest Isaac Lab 1.4.0. We will close this issue for now. Thank you.
