3D reconstruction from multiple images - scale mismatch in depth and pose #213

Open
alexander-born opened this issue Feb 14, 2022 · 5 comments


@alexander-born

alexander-born commented Feb 14, 2022

Thanks for this great repository!

I tried to modify the camviz demo to create a point cloud not only from a single image but from multiple images.

First, I downloaded the pretrained KITTI checkpoint (PackNet01_HR_velsup_CStoK.ckpt).

I modified ./scripts/infer.py to also calculate the pose, like this:

# pose inference
pose = model_wrapper.pose(image, [image_tminus1, image_tplus1]) 

and saved these relative poses (t -> t_minus_1), in addition to the depth, in an npz file.
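Roughly, the per-frame dump looks like this (the filename and key names are just my own convention; depth_np and pose_np stand for the numpy versions of the network outputs):

import numpy as np

# Hypothetical per-frame dump; keys and filename are my own choices.
# depth_np: [H,W] depth map, pose_np: 6-DoF vector (tx, ty, tz, rx, ry, rz)
# for the relative transform t -> t_minus_1, both already converted to numpy.
np.savez("frame_000010.npz", depth=depth_np, pose_t_to_tm1=pose_np)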

To create a Pose object from the pose net output, I built a transformation matrix with:

import numpy as np
from scipy.spatial.transform import Rotation as R

def pose_vec2mat(posenet_output):  # posenet_output is the pose from t -> t_minus_1
    """Convert a 6-DoF vector (translation + Euler angles) to a 4x4 transformation matrix."""
    trans, rot = posenet_output[:3], posenet_output[3:]
    rot_mat = R.from_euler("zyx", rot).as_matrix()                 # [3,3] rotation
    mat = np.concatenate((rot_mat, trans[:, np.newaxis]), axis=1)  # [3,4]
    padding = np.array([0, 0, 0, 1])
    mat = np.concatenate((mat, padding[np.newaxis, :]), axis=0)    # [4,4]
    return mat

I accumulated these poses with __matmul__ to get all the camera poses. Is the pose calculation correct? (It looks good when visualized.)
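For completeness, the accumulation is roughly this (a sketch; relative_pose_vecs is the list of saved 6-DoF vectors, and whether the relative pose needs to be inverted first is exactly the convention I am unsure about):

# Chain the per-frame relative poses (t -> t_minus_1) into global camera
# poses, starting from the identity for the first frame.
global_poses = [np.eye(4)]
for rel_vec in relative_pose_vecs:
    rel_mat = pose_vec2mat(rel_vec)                  # 4x4 relative transform
    global_poses.append(global_poses[-1] @ rel_mat)  # accumulate with __matmul__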

Then I used Camera.i2w() to project the point clouds of multiple images into world coordinates (additionally, I filtered each point cloud with a maximum distance threshold).
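The projection step is equivalent to the following manual sketch (written with plain numpy and the intrinsics K instead of Camera.i2w, just to make explicit what I mean by projecting to world coordinates and filtering by distance):

def depth_to_world_points(depth, K, cam_to_world, max_dist=30.0):
    """Unproject a depth map to 3D world points, dropping points beyond max_dist."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project every pixel into camera coordinates using the pinhole model.
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    pts_cam = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    pts_cam = pts_cam[np.linalg.norm(pts_cam, axis=1) < max_dist]  # distance filter
    # Move the points into the world frame with the accumulated camera pose.
    pts_h = np.concatenate([pts_cam, np.ones((len(pts_cam), 1))], axis=1)
    return (cam_to_world @ pts_h.T).T[:, :3]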

It seems like there is a scale mismatch between the outputs of the depth network and the pose network. This can be seen in the screenshot below, where I am visualizing point clouds from multiple images (KITTI Tiny). The coordinate systems in the screenshot are the camera poses. You can see the scale mismatch in the duplicated vehicle and in how far the camera poses move compared to the point cloud. Shouldn't the point clouds of multiple images fit together really well when using the pose and depth nets, which are trained together?
[Screenshot: fused point clouds from multiple KITTI Tiny images with the camera poses; the duplicated vehicle shows the scale mismatch]

Only if I scale either the depth or the poses do the resulting point clouds overlap (not perfectly). Scaled pose with factor 0.14, no scaling factor on depth:
[Screenshot: fused point clouds with the pose translation scaled by 0.14]
Another KITTI example (scaled pose with factor 0.14, no scaling factor on depth):
[Screenshot: second KITTI example with the pose translation scaled by 0.14]
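To be precise, "scaled pose" here means scaling only the translation part of each relative pose before accumulating (the rotation is left untouched), along the lines of:

POSE_SCALE = 0.14  # empirically chosen factor that roughly aligns pose and depth scale

def scale_pose(mat, scale=POSE_SCALE):
    """Scale only the translation of a 4x4 relative pose; the rotation stays as-is."""
    scaled = mat.copy()
    scaled[:3, 3] *= scale
    return scaled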

@hhhharold

Have you tried 3D reconstruction from multi-view images, just like the one shown in the camviz README?

@alexander-born
Author

alexander-born commented Feb 17, 2022

Yes, this is a 3D reconstruction from multiple views in a global frame. I generated one joint 3D point cloud from multiple images by projecting them into the global frame via the relative poses between the images.

The camviz README GIFs show something different: there, only the point clouds from single images are shown one after another in the camera frame, not in a global frame (only the output of the depth net is used; the pose net outputs are not used there). That's how I understood it; please correct me if this is wrong.

The problem I am facing is that the scale of the pose net output does not match the scale of the depth net output (the projections in the global frame do not overlap).

@hhhharold

I used multi-view images from the DDAD dataset to reconstruct the 3D scene in world coordinates, but the result was not good. I am checking whether there is an error in my code or whether the extrinsics are inaccurate.

@VitorGuizilini-TRI
Collaborator

Can you try using ground-truth information, just to check whether the transformations are correct?

@hhhharold

hhhharold commented Feb 18, 2022

Can you try using ground-truth information, just to check whether the transformations are correct?

Yes, there were some mistakes in my code, and the extrinsics are correct. By the way, how should the pose parameter of the draw.add3Dworld function be set to get a better initial viewing angle in world coordinates? The default setting in the demo is draw.add3Dworld('wld', luwh=(0.33, 0.00, 1.00, 1.00), pose=(7.25323, -3.80291, -5.89996, 0.98435, 0.07935, 0.15674, 0.01431)).
