I'm trying to write a pick policy for an Allegro robot to pick up a randomly selected and randomly positioned YCB object, and I was wondering if I could get some feedback on potential optimizations. My current implementation uses RGBD + state observations and PPO to learn manipulation of YCB objects.
I haven't been getting successful results, as shown in the videos below.
In the 40th video, the robot is able to at least grasp the object but not yet lift it up:
40.mp4
In the 62nd video (last checkpoint), the robot is still unable to pick up an object and just awkwardly pushes it across the table:
62.mp4
My CNN processes both RGB and depth information together. I'm using standard convolutional layers with ReLU activations, but I'm not sure if this architecture is optimal for depth processing. I wrote an issue about it here: #794
```python
import torch
import torch.nn as nn


class NatureCNN(nn.Module):
    def __init__(self, sample_obs):
        super().__init__()

        extractors = {}
        self.out_features = 0
        feature_size = 256

        if "rgbd" in sample_obs:
            # split last dimension (first 3 channels for RGB, last channel for depth)
            # CNN for RGBD input
            cnn = nn.Sequential(
                nn.Conv2d(
                    in_channels=16,  # Full RGBD channels
                    out_channels=32,
                    kernel_size=8,
                    stride=4,
                    padding=0,
                ),
                nn.ReLU(),
                nn.Conv2d(
                    in_channels=32, out_channels=64, kernel_size=4, stride=2, padding=0
                ),
                nn.ReLU(),
                nn.Conv2d(
                    in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=0
                ),
                nn.ReLU(),
                nn.Flatten(),
            )
            # RGBD dimension calculation
            with torch.no_grad():
                n_flatten = cnn(
                    sample_obs["rgbd"].float().permute(0, 3, 1, 2).cpu()
                ).shape[1]
            fc = nn.Sequential(nn.Linear(n_flatten, feature_size), nn.ReLU())
            extractors["rgbd"] = nn.Sequential(cnn, fc)
            self.out_features += feature_size

        if "state" in sample_obs:
            # for state data we simply pass it through a single linear layer
            state_size = sample_obs["state"].shape[-1]
            extractors["state"] = nn.Linear(state_size, 256)
            self.out_features += 256
        ...  # (rest of __init__ omitted)
```
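(Note: the "split last dimension" comment above is a leftover; the CNN actually consumes all RGBD channels jointly. If I were to split RGB from depth, I imagine it would look roughly like the sketch below, where the 3+1 channel ordering is just an assumption, since my observation actually stacks to 16 channels.)

```python
# Illustrative only: assumes the channel dimension is ordered RGB then depth.
rgbd = sample_obs["rgbd"].float().permute(0, 3, 1, 2)  # (N, C, H, W)
rgb = rgbd[:, :3] / 255.0    # scale RGB to [0, 1]
depth = rgbd[:, 3:4]         # depth could be clipped/normalized on its own
```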
My reward function combines multiple components covering lifting, finger control, and distance-based terms. It is supposed to incentivize lifting the object above the table (lift_reward), maintaining a proper finger spread (finger_spread_reward), reducing the distance between the fingertips and the target object (hand_close_reward), centering the hand's approach (center_close_reward), and reaching a sufficient lift height (height_reward).
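Roughly, the combination looks like the sketch below. This is simplified and not my exact code: self.obj, self.fingertip_links, self.palm_link, self.table_height, and the scaling constants are placeholder names/values.

```python
import torch

def compute_dense_reward(self, obs, action, info):
    obj_pos = self.obj.pose.p                                   # (N, 3)
    tip_pos = torch.stack(
        [link.pose.p for link in self.fingertip_links], dim=1
    )                                                           # (N, num_tips, 3)

    # hand_close_reward: pull the fingertips toward the object
    tip_to_obj = torch.linalg.norm(tip_pos - obj_pos[:, None, :], dim=-1).mean(dim=1)
    hand_close_reward = 1.0 - torch.tanh(5.0 * tip_to_obj)

    # center_close_reward: keep the palm centered over the object
    palm_to_obj = torch.linalg.norm(self.palm_link.pose.p - obj_pos, dim=-1)
    center_close_reward = 1.0 - torch.tanh(5.0 * palm_to_obj)

    # finger_spread_reward: discourage the fingertips from bunching together
    spread = torch.linalg.norm(
        tip_pos - tip_pos.mean(dim=1, keepdim=True), dim=-1
    ).mean(dim=1)
    finger_spread_reward = torch.tanh(5.0 * spread)

    # lift_reward / height_reward: reward height gained above the table surface
    lift = torch.clamp(obj_pos[:, 2] - self.table_height, min=0.0)
    lift_reward = 2.0 * torch.tanh(10.0 * lift)

    # sparse success/failure terms (+5.0 / -1.25) come from evaluate(), described below
    return hand_close_reward + center_close_reward + finger_spread_reward + lift_reward
```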
In my evaluate function, success requires lifting the target object above a height threshold of 1.25 m over the table's surface. The agent fails if the robot's fingers collide with the table (below table_height + 0.02 m) or if the object falls below the table surface (table_height - 0.02 m). The function assigns sparse rewards: +5.0 for success and -1.25 for failure.
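Roughly, it looks like this (again a simplified sketch with placeholder attribute names, not my exact code):

```python
import torch

def evaluate(self):
    obj_z = self.obj.pose.p[:, 2]                               # (N,)
    tip_z = torch.stack(
        [link.pose.p[:, 2] for link in self.fingertip_links], dim=1
    )                                                           # (N, num_tips)

    # success: the object has been lifted 1.25 m above the table surface
    success = obj_z - self.table_height > 1.25

    # failure: the fingers hit the table, or the object drops below the table surface
    finger_collision = (tip_z < self.table_height + 0.02).any(dim=1)
    obj_dropped = obj_z < self.table_height - 0.02
    fail = (finger_collision | obj_dropped) & ~success

    # sparse terms: +5.0 on success, -1.25 on failure
    sparse_reward = torch.zeros_like(obj_z)
    sparse_reward[fail] = -1.25
    sparse_reward[success] = 5.0
    return dict(success=success, fail=fail, sparse_reward=sparse_reward)
```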
Due to limited VRAM (8192 MiB total), I trained with total_timesteps=10,000,000, num_envs=64, num_eval_envs=2, num_steps=100, update_epochs=4, and no reconfiguration or eval reconfiguration frequency (both set to None). Enabling a reconfiguration frequency significantly increased my VRAM usage: every couple of reconfigurations, GPU memory would spike by another 500-1000 MiB. I'm not sure whether this is the memory leak described in #467.
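For reference, the training configuration written out; the field names below are illustrative and may not match the actual argument definitions in my ppo_rgbd.py:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrainConfig:
    total_timesteps: int = 10_000_000
    num_envs: int = 64
    num_eval_envs: int = 2
    num_steps: int = 100
    update_epochs: int = 4
    reconfiguration_freq: Optional[int] = None       # disabled to keep VRAM flat
    eval_reconfiguration_freq: Optional[int] = None  # periodic reconfiguration spiked GPU memory by 500-1000 MiB
```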
I'm mainly concerned about my reward structure, particularly compute_dense_reward in my ycb_env.py file, as well as my policy architecture for RGBD + state processing in NatureCNN's __init__ in my ppo_rgbd.py file. I'm still learning, so any feedback would be greatly appreciated!
Thank you!