AaltoML/GP-MVS

Code: Multi-view Stereo by Temporal Nonparametric Fusion

Yuxin Hou · Juho Kannala · Arno Solin

Code for the paper:

  • Yuxin Hou, Arno Solin, and Juho Kannala (2019). Multi-view stereo by temporal nonparametric fusion. International Conference on Computer Vision (ICCV). Seoul, Korea. [arXiv] [video] [project page]

Summary

We propose a novel idea for depth estimation from unstructured multi-view image-pose pairs, where the model can leverage information from previous latent-space encodings of the scene. The model takes pairs of images and poses, which are passed through an encoder-decoder network for disparity estimation. The novelty lies in soft-constraining the bottleneck layer by a nonparametric Gaussian process prior.
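To illustrate the idea, here is a minimal NumPy sketch of fusing per-frame bottleneck encodings with a GP prior over camera-pose distance. This is a toy batch GP-regression version with a squared-exponential kernel; the function names, kernel choice, and hyperparameters are illustrative assumptions, not the paper's exact (recursive) formulation.

```python
import numpy as np

def se_kernel(d, lengthscale=1.0):
    # Squared-exponential covariance as a function of pose distance
    # (illustrative choice of kernel, not necessarily the paper's).
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_fuse_latent(latents, pose_dists, lengthscale=1.0, noise=0.1):
    """Fuse a sequence of latent codes under a GP prior over pose distance.

    latents:    (T, D) array of bottleneck encodings, one per frame.
    pose_dists: (T, T) matrix of pairwise camera-pose distances.
    Returns the GP posterior mean for every frame, shape (T, D).
    """
    T = latents.shape[0]
    K = se_kernel(pose_dists, lengthscale)   # prior covariance between frames
    A = K + noise * np.eye(T)                # add observation noise on each encoding
    return K @ np.linalg.solve(A, latents)   # posterior mean: smoothed latents

# Toy usage: three frames along a camera trajectory.
rng = np.random.default_rng(0)
lat = rng.standard_normal((3, 8))
d = np.abs(np.arange(3)[:, None] - np.arange(3)[None, :]).astype(float)
fused = gp_fuse_latent(lat, d)
```

Nearby frames (small pose distance) get strongly correlated latents, so each frame's encoding is pulled towards its temporal neighbours before decoding.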

Example depth estimation result running in real-time on an iPad.

Prerequisites

  • Python3
  • Numpy
  • Pytorch 0.4.0
  • CUDA 9 (You can also run without CUDA, but then you need to remove all .cuda() calls in the code)
  • opencv
  • tensorboardX
  • imageio
  • path.py
  • blessings
  • progressbar2

Training

As mentioned in the paper, training uses the split pretrained MVDepthNet model as a starting point. Follow the link to get the pretrained model.

python train.py train_dataset_path --pretrained-dict pretrained_mvdepthnet --log-output

Testing

For testing, run:

python test.py formatted_seq_path --savepath disparity.npy --encoder encoder_path --gp gp_path --decoder decoder_path
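The command above saves the predicted disparities to the path given by --savepath. A small sketch for post-processing that output, assuming the saved array holds disparities (inverse depths); the array layout depends on the sequence, so inspect .shape first:

```python
import numpy as np

def disparity_to_depth(disp, eps=1e-6):
    # Invert disparity to depth, clamping tiny values to avoid division by zero.
    return 1.0 / np.clip(disp, eps, None)

# e.g. after running test.py:
# disp = np.load("disparity.npy")
# depth = disparity_to_depth(disp)
```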

Our pretrained model can be downloaded here.

Use your own data for testing

A formatted sequence has the following folder structure:

  • K.txt: Text file storing the camera intrinsic matrix.
  • poses.txt: Text file storing the extrinsic matrices for all frames in the sequence, in order.
  • images: Folder containing all RGB images (.png), ordered by name.
  • depth: Folder containing all ground-truth depth maps (.npy), with names matching the images' names.
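A minimal loader sketch for this layout. The reshape of poses.txt into 4x4 matrices (one flattened row per frame) is an assumption about how the extrinsics are serialized, and the function name is hypothetical; adapt both to your own data.

```python
import os
import numpy as np

def load_formatted_seq(seq_path):
    """Load one formatted sequence (folder layout as described above).

    Assumes poses.txt stores one flattened 4x4 extrinsic per frame.
    Returns (K, poses, image_paths, depth_paths).
    """
    K = np.loadtxt(os.path.join(seq_path, "K.txt"))                    # 3x3 intrinsics
    poses = np.loadtxt(os.path.join(seq_path, "poses.txt")).reshape(-1, 4, 4)
    img_dir = os.path.join(seq_path, "images")
    depth_dir = os.path.join(seq_path, "depth")
    # Sorting by filename matches the README's "ordered by name" convention.
    images = sorted(os.path.join(img_dir, f)
                    for f in os.listdir(img_dir) if f.endswith(".png"))
    depths = sorted(os.path.join(depth_dir, f)
                    for f in os.listdir(depth_dir) if f.endswith(".npy"))
    return K, poses, images, depths
```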

We also provide one example sequence: redkitchen seq-01-formatted.

Acknowledgements

The encoder/decoder code builds on MVDepthNet. Some useful utility functions used during training are from SfmLearner. Most of the training data were collected by DeMoN. We appreciate their work!

License

Copyright Yuxin Hou, Juho Kannala, and Arno Solin.

This software is provided under the MIT License. See the accompanying LICENSE file for details.