About the .ply file from the generalizable model #42

Open
ZhenyuSun-Walker opened this issue Aug 27, 2024 · 15 comments

@ZhenyuSun-Walker

Hello, Sir! I noticed that when I apply the generalizable method to my own pictures, the generated point cloud is quite planar. Why does the depth estimator work badly on my own dataset?

@TQTQliu (Owner) commented Aug 27, 2024

Hi, can you provide more information, like some visualizations?

@ZhenyuSun-Walker (Author)

Sure, I'll send you the awful result.
(screenshots of the resulting point cloud)

@TQTQliu (Owner) commented Aug 27, 2024

Can you also provide the input image?

@ZhenyuSun-Walker (Author)

My input is a dataset of multiview images; part of it is shown below.

(views 90, 95, and 100)

In total there are 20 images.

@TQTQliu (Owner) commented Aug 27, 2024

Thanks for the information. Did you run the following command to get novel views? If so, are the novel views equally bad?

python run.py --type evaluate --cfg_file configs/mvsgs/colmap_eval.yaml test_dataset.data_root <your scene>

@ZhenyuSun-Walker (Author)

Actually, I did generate the novel views, and their metrics are excellent, as is the visual quality when I inspect the novel-view images.

@TQTQliu (Owner) commented Aug 27, 2024

If you can get good novel views, the corresponding estimated depths should also be good.
You can check the depth maps in the folder <path to save ply>/<your scene>/depth.
As for the point cloud, zoom in to view it. Since this is an indoor scene, the large flat area below may be the floor, so you need to zoom in to get a closer look.
If there is still a problem, it may be an issue with the filtering hyperparameters. Adjust them here.
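For a quick sanity check of those saved depth maps, something like the following works (a minimal sketch; it assumes the depth files are stored as NumPy .npy arrays and the filename is hypothetical, so adjust the loading step to whatever the depth folder actually contains):

import numpy as np
import matplotlib.pyplot as plt

# Load one estimated depth map (hypothetical path and format; adjust as needed)
depth = np.load("<path to save ply>/<your scene>/depth/000000.npy")

# Display it with a colorbar so flat or degenerate regions stand out
plt.imshow(depth, cmap="viridis")
plt.colorbar(label="depth")
plt.title("estimated depth")
plt.show()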

@ZhenyuSun-Walker (Author)

The depth map of examples/scene2 looks like this:
(screenshot)
The depth map of my scene looks like this:
(screenshot)

Could you explain the filtering hyperparameters in more detail: their meaning, and their impact on the final quality when they are changed?

@ZhenyuSun-Walker (Author)

And I would like to verify an assumption about your pipeline.
(figure from the paper)
In your workflow you combine the features to get f_v, so after the generalizable model there is only one target-view image rendered from many source-view images.
In this case, I wonder whether the point cloud generated by the command python run.py --type evaluate --cfg_file configs/mvsgs/colmap_eval.yaml test_dataset.data_root examples/scene1 save_ply True dir_ply <path to save ply> comes only from that last single target-view image, or whether it is a combined point cloud from the 4 target-view images. However, even if the point cloud comes from a single rendered target-view image, with your pipeline's depth estimation for that target view it should still succeed in unprojecting the image into a point cloud, just as it does on examples/scene2.

Looking forward to your earliest reply!
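For reference, the unprojection step mentioned above is essentially the following; a minimal sketch assuming a pinhole intrinsic matrix K and a camera-to-world pose (R, t), not the repo's actual fusion code:

import numpy as np

def unproject(depth, rgb, K, R, t):
    # depth: (H, W), rgb: (H, W, 3), K: (3, 3), R: (3, 3), t: (3,)
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T            # camera-space rays with z = 1
    pts_cam = rays * depth.reshape(-1, 1)      # scale each ray by its depth
    pts_world = pts_cam @ R.T + t              # camera -> world
    return pts_world, rgb.reshape(-1, 3)       # per-point positions and colors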

@TQTQliu (Owner) commented Aug 27, 2024

  1. The depth map you're showing doesn't seem particularly good. You can modify the number of sampling points volume_planes in the config file; commonly used settings are [64,8], [48,8] and [16,8].
  2. The process of point cloud fusion filters out unreliable depths by checking consistency across multiple views (see the sketch after this list). The filter hyperparameter settings are:
s = 1
dist_base = 1/8
rel_diff_base = 1/10

Refer here. dist_base and rel_diff_base are thresholds on the reprojection error: if the reprojection error is smaller than the threshold, the depth is considered reliable. A larger threshold means a more relaxed condition, and a smaller one a more stringent condition. s means a depth is reliable when at least s views meet the conditions (not perfectly precise, but a reasonable way to understand it). A larger s means a stricter condition, and a smaller s a looser one.

It is possible that the current hyperparameter settings are too strict for your scene, filtering out too many points and resulting in a poor point cloud. You can adjust them according to the meanings above. One extreme setting is

s = 1
dist_base = 100
rel_diff_base = 100

In this case almost all points are considered reliable, i.e. no points are filtered out; you can try it.

  3. The generated point cloud is a combination of the 4 target-view images: the 4 point clouds corresponding to the 4 target views are fused into the final point cloud by fusion.py.
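The sketch referenced in point 2 above: a minimal illustration of the kind of multi-view consistency filter that s, dist_base and rel_diff_base control. It is not the repo's fusion.py; the per-view reprojection errors and relative depth differences are assumed to be computed elsewhere.

import numpy as np

def consistent(dist, rel_diff, dist_base=1/8, rel_diff_base=1/10):
    # dist: per-pixel reprojection error (in pixels) against one source view
    # rel_diff: per-pixel relative depth difference |d_reproj - d_ref| / d_ref
    return np.logical_and(dist < dist_base, rel_diff < rel_diff_base)

def keep_mask(dists, rel_diffs, s=1, dist_base=1/8, rel_diff_base=1/10):
    # dists, rel_diffs: lists of (H, W) arrays, one pair per source view.
    # A reference-view depth is kept if at least s source views agree with it.
    votes = sum(consistent(d, r, dist_base, rel_diff_base)
                for d, r in zip(dists, rel_diffs))
    return votes >= s

Raising dist_base and rel_diff_base (or lowering s) loosens the filter and keeps more points; the 100/100 setting above effectively disables it.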

@ZhenyuSun-Walker (Author)

Thank you, I'll check it ASAP! By the way, would you mind explaining the meaning of the volume_planes configuration, like [64, 8] or [48, 8]?

@TQTQliu (Owner) commented Aug 27, 2024

You're welcome. We use a cascaded (two-stage) structure and the plane-sweep algorithm for depth estimation.
As shown in the figure below, given the near and far bounds of the scene (far - near = R1), we first define N1 depth planes (e.g., sampled at equal intervals), i.e., the pink lines. In the coarse stage (stage 1), these predefined depth hypothesis planes give us a coarse depth, i.e., the yellow line.
In the fine stage (stage 2), we sample again around the coarse depth obtained in the previous stage to obtain N2 depth planes. Based on these N2 depth hypothesis planes, we can predict a finer depth.

volume_planes is actually [N1, N2], i.e., the number of depth samples in the two stages.

(figure: depth hypothesis planes for the two stages)
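A minimal sketch of what volume_planes = [N1, N2] corresponds to, written from the explanation above; the width of the stage-2 band around the coarse depth is an assumption for illustration, not the repo's exact rule.

import numpy as np

def stage1_planes(near, far, n1=64):
    # Stage 1: N1 depth hypotheses spread uniformly over [near, far]
    return np.linspace(near, far, n1)

def stage2_planes(coarse_depth, near, far, n1=64, n2=8):
    # Stage 2: N2 hypotheses in a narrow band around the coarse depth.
    # Here the band is one stage-1 interval on each side (an assumption).
    interval = (far - near) / n1
    return np.linspace(coarse_depth - interval, coarse_depth + interval, n2)

With volume_planes = [64, 8], stage 1 sweeps 64 planes over the full depth range and stage 2 refines with 8 planes around the coarse estimate.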

@ZhenyuSun-Walker (Author)

OK, that makes sense! I have now run the experiments with dist_base and rel_diff_base both set to 100, but the point cloud and the Gaussian view are still not good enough, as shown below.

(top-down view screenshot)

(front view screenshot)

I sincerely hope the volume_planes adjustment will work!

@ZhenyuSun-Walker (Author)

Sir, I tried the volume_planes configuration [64, 8] together with dist_base and rel_diff_base set to 100, but the result is not as good as I hoped. Pretty tricky and strange!

@zhangshuoneu

Based on your images, I think your photos were captured as panoramas, which may not fit the pinhole camera model the paper uses. You could try pinhole images instead!
