This is the code for the paper
Unsupervised Learning of Depth Estimation and Visual Odometry for Sparse Light Field Cameras,
S. Tejaswi Digumarti, Joseph Daniel, Ahalya Ravendran, Ryan Griffiths, and Donald G. Dansereau,
IROS 2021.
Authors: Tejaswi Digumarti, Joseph Daniel
Other Contributors: Ahalya Ravendran, Ryan Griffiths, Donald G. Dansereau
Maintainer: Tejaswi Digumarti
For further information please see the Project Website.
The following packages are required for running the training and inference scripts.
- pytorch
- numpy
- cv2 (OpenCV for Python)
- blessings - for handy printing in the terminal
- progressbar2 - for a progressbar in the terminal
And the dependencies of these libraries.
Optional dependencies, needed only for the other tools in the repository.
- matplotlib
- tensorboard
- imageio - Used by DEPRECATED code and not necessary.
- evo - provided as a submodule. After cloning the repository, run the following in the repository folder:
git submodule update --init --recursive
Note: If using PyCharm as your IDE, add external/evo to sources.
An Anaconda environment for training can be set up using the following commands. Please follow the sequence; otherwise there may be conflicts between inter-dependent libraries.
conda create -n epienv python=3.7
conda activate epienv
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
conda install -c conda-forge blessings progressbar2
conda install -c conda-forge opencv
conda install -c conda-forge matplotlib
conda install -c conda-forge tensorboard
The Python file to run for training is train_multiwarp.py. Example training scripts with configuration parameters are provided in the training_scripts folder.
The training process consists of the following steps.
- Parse the input arguments to determine the path to the dataset, the save folder, the light field format to use for training, and a few other configuration parameters. These functions are defined in parser.py. A full list of arguments can be found here.
- Data loaders (from epimodule.py) specific to the light field format are then initialized. The light field formats are the following.
  - focalstack: A focal stack image formed by layering images from all the cameras of the multi-aperture camera and computing the average intensity at each pixel. The plane of focus is determined by the amount by which each image is shifted (vertically or horizontally) before it is added to the image from the central camera. The function load_multiplane_focalstack does this.
  - stack: A concatenation of all the images from all the cameras, giving a 3*N-channel (colour) or N-channel (grayscale) image, where N is the number of cameras. The function load_stacked_epi does this.
  - epi: An epipolar plane image (EPI), formed by taking horizontal or vertical slices of the images from the multi-array camera and concatenating these slices together. If vertical slices are taken and stacked horizontally, the result is an image of size (height x N*width); if horizontal slices are taken and stacked vertically, the result is an image of size (N*height x width). The function load_tiled_epi does this.
- Prepare PyTorch dataloaders for training and validation using the above loaders.
- If the light field format is epi, then the encoders RelativeEpiEncoder and EpiEncoder are loaded to encode the light field image into an image that forms the input to the pose estimation network and the disparity network respectively.
- Load the disparity and pose estimation networks, with pre-trained weights if available.
- Train and validate over the specified number of epochs.
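The three light field formats above can be sketched in a few lines of numpy. This is an illustrative sketch only, not the repository's load_multiplane_focalstack / load_stacked_epi / load_tiled_epi implementations: the function names (make_stack, make_focalstack, make_epi_vertical) are hypothetical, and it assumes a horizontal camera array with purely horizontal, integer per-camera shifts.

```python
import numpy as np

def make_stack(views):
    # views: list of N (H, W, C) images -> one (H, W, N*C) channel-stacked image.
    return np.concatenate(views, axis=-1)

def make_focalstack(views, shifts):
    # Shift each view horizontally by its per-camera offset, then average
    # intensities. A different set of shifts selects a different plane of focus.
    # (Hypothetical sketch: integer shifts via np.roll, horizontal axis only.)
    shifted = [np.roll(v, s, axis=1) for v, s in zip(views, shifts)]
    return np.mean(shifted, axis=0)

def make_epi_vertical(views):
    # views: list of N (H, W) grayscale images from a horizontal camera array.
    # For each image column x, take that column from every view and place the
    # N columns side by side, tiling vertical slices into an (H, N*W) image.
    v = np.stack(views, axis=0)        # (N, H, W)
    n, h, w = v.shape
    return v.transpose(1, 2, 0).reshape(h, w * n)
```

For example, three 4x6x3 views produce a (4, 6, 9) stack, a (4, 6, 3) focal stack, and (as grayscale) a (4, 18) tiled EPI.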
Use the script infer_multiwarp.py to perform inference. Examples using this script with parameters set are in the validation_scripts folder.
To infer just depth and not pose (e.g. for a single input image), use infer_depth.
A detailed description of all the files can be found here.