Yuxin Hou · Juho Kannala · Arno Solin
Code for the paper:
- Yuxin Hou, Arno Solin, and Juho Kannala (2019). Multi-view stereo by temporal nonparametric fusion. International Conference on Computer Vision (ICCV). Seoul, Korea. [arXiv] [video] [project page]
We propose a novel approach to depth estimation from unstructured multi-view image-pose pairs, in which the model can leverage information from previous latent-space encodings of the scene. The model takes pairs of images and poses and passes them through an encoder-decoder network for disparity estimation. The novelty lies in soft-constraining the bottleneck layer with a nonparametric Gaussian process prior.
Example depth estimation result running in real-time on an iPad.
- Python3
- Numpy
- Pytorch 0.4.0
- CUDA 9 (you can also run without CUDA, but then you need to remove all .cuda() calls in the code)
- opencv
- tensorboardX
- imageio
- path.py
- blessings
- progressbar2
As mentioned in our paper, training uses the split pretrained MVDepthNet model as a starting point. Check the link to get the pretrained model.
python train.py train_dataset_path --pretrained-dict pretrained_mvdepthnet --log-output
For testing, run:
python test.py formatted_seq_path --savepath disparity.npy --encoder encoder_path --gp gp_path --decoder decoder_path
Our pretrained model can be downloaded here.
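
The array written by --savepath can be inspected with NumPy. The following is a minimal sketch, not part of this repository: it assumes the saved values are inverse depths (1/depth) with no extra scale factor, which this README does not state, and the helper name disparity_to_depth is hypothetical.

```python
import numpy as np


def disparity_to_depth(disparity, min_disparity=1e-6):
    """Convert inverse-depth values to depth.

    Assumption: disparity = 1 / depth. Values at or below
    min_disparity are mapped to infinity instead of dividing by zero.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > min_disparity
    depth[valid] = 1.0 / disparity[valid]
    return depth


# Example usage (assumes test.py has already written disparity.npy):
# disparity = np.load("disparity.npy")
# depth = disparity_to_depth(disparity)
# print(disparity.shape, depth[np.isfinite(depth)].mean())
```

If the network output is scaled differently, the reciprocal has to be adjusted accordingly.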
The formatted sequence has the following folder structure:
- K.txt: The text file that stores the camera intrinsic matrix.
- poses.txt: The text file that stores the extrinsic matrices for all frames in the sequence, in order.
- images: The folder containing all RGB images (.png), ordered by name.
- depth: The folder containing all ground-truth depth maps (.npy), with names matching those of the images.
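
A sequence in this layout could be read along the following lines. This is a hedged sketch rather than the loader used by test.py: the function name load_formatted_sequence is hypothetical, K.txt is assumed to hold 9 whitespace-separated values, and poses.txt is assumed to hold 16 values per frame (either one 4x4 block per frame or one flattened row per frame). Adjust the reshape if your files differ.

```python
from pathlib import Path

import numpy as np


def load_formatted_sequence(root):
    """Collect intrinsics, poses, and file paths of a formatted sequence."""
    root = Path(root)

    # Assumption: K.txt stores a 3x3 intrinsic matrix.
    K = np.loadtxt(root / "K.txt").reshape(3, 3)

    # Assumption: poses.txt stores 16 values per frame, in frame order.
    poses = np.loadtxt(root / "poses.txt").reshape(-1, 4, 4)

    # Images are ordered by name, so a lexicographic sort gives frame order.
    image_paths = sorted((root / "images").glob("*.png"))

    # Assumption: each depth map shares the file stem of its image.
    depth_paths = [root / "depth" / (p.stem + ".npy") for p in image_paths]

    if len(poses) != len(image_paths):
        raise ValueError(
            f"{len(poses)} poses but {len(image_paths)} images in {root}"
        )
    missing = [p for p in depth_paths if not p.exists()]
    if missing:
        raise FileNotFoundError(f"missing depth maps, e.g. {missing[0]}")

    return K, poses, image_paths, depth_paths
```

Returning paths instead of decoded arrays keeps the sketch independent of the image library; the depth maps can then be read with np.load and the images with imageio or opencv.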
We also provide one example sequence: redkitchen seq-01-formatted.
The encoder/decoder code builds on MVDepthNet. Some useful utility functions used during training are from SfmLearner. Most of the training data were collected by DeMoN. We appreciate their work!
Copyright Yuxin Hou, Juho Kannala, and Arno Solin.
This software is provided under the MIT License. See the accompanying LICENSE file for details.