FoundationStereo: Zero-Shot Stereo Matching

This is the official implementation of our paper, accepted at CVPR 2025 (all strong accepts).

Authors: Bowen Wen, Matthew Trepte, Joseph Aribido, Jan Kautz, Orazio Gallo, Stan Birchfield

Abstract

Tremendous progress has been made in deep stereo matching to excel on benchmark datasets through per-domain fine-tuning. However, achieving strong zero-shot generalization — a hallmark of foundation models in other computer vision tasks — remains challenging for stereo matching. We introduce FoundationStereo, a foundation model for stereo depth estimation designed to achieve strong zero-shot generalization. To this end, we first construct a large-scale (1M stereo pairs) synthetic training dataset featuring large diversity and high photorealism, followed by an automatic self-curation pipeline to remove ambiguous samples. We then design a number of network architecture components to enhance scalability, including a side-tuning feature backbone that adapts rich monocular priors from vision foundation models to mitigate the sim-to-real gap, and long-range context reasoning for effective cost volume filtering. Together, these components lead to strong robustness and accuracy across domains, establishing a new standard in zero-shot stereo depth estimation.

TLDR: Our method takes as input a pair of stereo images and outputs a dense disparity map, which can be converted to a metric-scale depth map or 3D point cloud.
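The conversion mentioned above follows the standard rectified pinhole stereo relation, depth = fx * baseline / disparity. The snippet below is a minimal sketch of that relation, not code from this repository. The function names (`disparity_to_depth`, `depth_to_points`) and the intrinsics `fx`, `fy`, `cx`, `cy` and `baseline` are illustrative assumptions; take the real values from your own camera calibration.

```python
import numpy as np


def disparity_to_depth(disparity, fx, baseline):
    """Convert a disparity map (pixels) to depth (same unit as `baseline`).

    Hypothetical helper: pixels with non-positive disparity are treated as
    invalid and assigned a depth of +inf.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    depth[valid] = fx * baseline / disparity[valid]
    return depth


def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map to an (N, 3) point cloud in the left camera frame.

    Hypothetical helper: only pixels with finite depth are kept.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # column and row indices
    valid = np.isfinite(depth)
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)


if __name__ == "__main__":
    # Toy 2x2 disparity map; the intrinsics below are made-up example values.
    disp = np.array([[10.0, 20.0], [0.0, 40.0]])
    depth = disparity_to_depth(disp, fx=500.0, baseline=0.1)
    points = depth_to_points(depth, fx=500.0, fy=500.0, cx=1.0, cy=1.0)
    print(depth)
    print(points.shape)
```

Depth comes out in whatever unit the baseline is given in (a baseline in meters yields depth in meters), and `fx` must be expressed in pixels at the resolution of the disparity map.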

Leaderboards 🏆

We ranked 1st on both the worldwide Middlebury leaderboard and the ETH3D leaderboard.

Comparison with Monocular Depth Estimation

Our method outperforms existing approaches in zero-shot stereo matching tasks across different scenes.

Installation

conda env create -f environment.yml
conda activate foundation_stereo

Model Weights

Download the foundation model for zero-shot inference on your data from here. Put the entire folder (e.g. 23-51-11) under ./pretrained_models/.

Run demo

python scripts/run_demo.py --left_file ./assets/left.png --right_file ./assets/right.png --ckpt_dir ./pretrained_models/23-51-11/model_best_bp2.pth --out_dir ./test_outputs/

Tips:

The input left and right images should be rectified and undistorted: there should be no fisheye-style lens distortion, and the epipolar lines should be horizontal between the left and right images. Stereo cameras such as the ZED usually handle this for you.
We recommend using PNG files with no lossy compression.
Our method works best on stereo RGB images. However, we have also tested it on grayscale and IR images, and it works well on those too.
For all options and instructions, run python scripts/run_demo.py --help.
For high-resolution images (>1000px), you can run with --hiera 1 to enable hierarchical inference for better performance.
For faster inference, you can reduce the input image resolution with e.g. --scale 0.5, and reduce the number of refinement iterations with e.g. --valid_iters 16.
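One practical consequence of reducing the input resolution: if the disparity map is produced at the reduced resolution (an assumption here, so check the demo script's output size), the focal length used for depth conversion has to be scaled by the same factor. The sketch below illustrates this with the standard pinhole model. The helper names and all numeric values are hypothetical and are not part of this repository.

```python
def scale_intrinsics(fx, fy, cx, cy, scale):
    """Return pinhole intrinsics for an image resized by `scale`.

    Hypothetical helper. Scaling cx and cy linearly is a common
    approximation that ignores half-pixel offset conventions.
    """
    return fx * scale, fy * scale, cx * scale, cy * scale


def depth_from_scaled_disparity(disparity_scaled, fx_full, baseline, scale):
    """Depth from a disparity measured on an image resized by `scale`.

    Uses depth = (fx_full * scale) * baseline / disparity_scaled.
    """
    fx_scaled = fx_full * scale
    return fx_scaled * baseline / disparity_scaled


if __name__ == "__main__":
    # Made-up example: full-resolution fx = 1000 px, baseline = 0.12 m.
    fx, fy, cx, cy = scale_intrinsics(1000.0, 1000.0, 640.0, 360.0, 0.5)
    print(fx, fy, cx, cy)

    # A 30 px disparity at half resolution corresponds to 60 px at full
    # resolution; both give the same depth.
    print(depth_from_scaled_disparity(30.0, 1000.0, 0.12, 0.5))
    print(depth_from_scaled_disparity(60.0, 1000.0, 0.12, 1.0))
```

Skipping the focal-length scaling would overestimate depth by a factor of 1/scale, which is easy to miss because the disparity map itself still looks plausible.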

BibTeX

@article{wen2025stereo,
  title={FoundationStereo: Zero-Shot Stereo Matching},
  author={Bowen Wen and Matthew Trepte and Joseph Aribido and Jan Kautz and Orazio Gallo and Stan Birchfield},
  journal={arXiv},
  year={2025}
}

Acknowledgement

We would like to thank Gordon Grigor, Jack Zhang, Karsten Patzwaldt, Hammad Mazhar and other NVIDIA Isaac team members for their tremendous engineering support and valuable discussions. Finally, we would also like to thank the CVPR reviewers and AC for their appreciation of this work and their constructive feedback.

Contact

For questions, please reach out to Bowen Wen ([email protected]).
