Hi, thanks for your great work. I tried to test it on the Redwood bedroom dataset (http://redwood-data.org/indoor_lidar_rgbd/index.html) with downsampled RGB-D images (from 21930 to 219 frames, resolution 640x480), using both the original (~5M points) and a downsampled (~100k points) point cloud, but I cannot get reasonable outputs. After filtering, it seems that only the first frame's result remains; I verified the camera pose by reprojecting the first frame's depth into the scene scan point cloud. The log reports 580 prompts from the 3D proposal stage, 51 remaining after the 2D-guided filter, and 15 after prompt consolidation.
There is one point I am not sure I got right: utils/main_utils.py:transform_pt_depth_scannet_torch() requires bx and by from the camera intrinsics. I don't know what they mean, so I set them to 0.
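For reference, this is the back-projection I am assuming (a minimal pinhole-model sketch, not the repo's actual code; I am treating bx/by as an extra XY offset between the depth and color sensors, which would be 0 when depth is already registered to the RGB frames):

```python
import numpy as np

def backproject_depth(depth, K, bx=0.0, by=0.0):
    """Back-project a depth map (H, W), in meters, to camera-space points.
    Pinhole model; bx/by treated as an extra XY offset (e.g. a depth-to-color
    baseline), which should be 0 if depth is already registered to RGB."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    v, u = np.indices(depth.shape)        # pixel rows (v) and columns (u)
    z = depth
    x = (u - cx) * z / fx + bx
    y = (v - cy) * z / fy + by
    return np.stack([x, y, z], axis=-1)   # (H, W, 3) in camera coordinates
```

Is that the intended meaning of bx and by?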
Could you provide any insights on refining the results, e.g. lowering the image resolution for SAM, changing the filter parameters, etc.?
First frame RGB image:
Final segmented point cloud; the floor is segmented well, but the other parts seem to cover only the area around the first frame's viewpoint:
Try to input more points (not just 100k, maybe 1,000k or even more), since providing adequate initial prompts is very important for generating enough confident masks for the later filtering and consolidation.
Try to use more frames (e.g. 10% of the original frames).
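A rough sketch of both adjustments (assuming Open3D; the file paths and target counts below are placeholders, adapt them to your data):

```python
import open3d as o3d

# Keep far more points than 100k, e.g. roughly 1M via uniform subsampling.
pcd = o3d.io.read_point_cloud("bedroom.ply")           # placeholder path
every_k = max(1, len(pcd.points) // 1_000_000)
pcd_down = pcd.uniform_down_sample(every_k)
o3d.io.write_point_cloud("bedroom_1m.ply", pcd_down)

# Keep ~10% of the RGB-D frames (every 10th) instead of ~1%.
frame_ids = list(range(0, 21930, 10))
```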
If those do not help:
Since I assume you have successfully run 3D_prompt_proposal.py, the problem is probably in main.py. I recommend visualizing the 2D results after prompt filtering and consolidation to see what is happening there.
Also, check the camera poses to make sure that the masked area of each frame is correctly projected into 3D space.
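A rough sketch of such a pose check (assuming Open3D, a pinhole intrinsic matrix K, and a 4x4 camera-to-world pose; invert the matrix first if your poses are world-to-camera):

```python
import numpy as np
import open3d as o3d

def check_pose_alignment(depth, K, pose_c2w, scan_pcd, mask=None):
    """Back-project one frame's (optionally masked) depth, move it into world
    space with the camera-to-world pose, and overlay it on the scan point cloud
    so you can eyeball whether the intrinsics/pose match the scan geometry."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    v, u = np.indices(depth.shape)
    valid = depth > 0
    if mask is not None:
        valid = valid & mask
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)   # homogeneous coords
    pts_world = (pose_c2w @ pts_cam.T).T[:, :3]

    frame_pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(pts_world))
    frame_pcd.paint_uniform_color([1.0, 0.0, 0.0])           # frame points in red
    o3d.visualization.draw_geometries([scan_pcd, frame_pcd])
```

If the red points drift off the scan for later frames, the pose convention (or the bx/by handling) is the likely culprit.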