Initial commit.
shalinidemello committed Mar 27, 2023
1 parent c84a405 commit 49c0378
Showing 283 changed files with 38,225 additions and 1 deletion.
68 changes: 68 additions & 0 deletions GETTING_STARTED.md
@@ -0,0 +1,68 @@
## Getting Started with ODISE

This document provides a brief introduction to using ODISE.

Please see [Getting Started with Detectron2](https://github.com/facebookresearch/detectron2/blob/master/GETTING_STARTED.md) for full usage.

The [Stable Diffusion v1.3 checkpoint](https://huggingface.co/CompVis/stable-diffusion-v-1-3-original/resolve/main/sd-v1-3.ckpt) and [CLIP checkpoint](https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt) will be automatically downloaded to `~/.torch/` and `~/.cache/clip` respectively.
Users should follow the license of the official releases of Stable Diffusion and CLIP.
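
If your machine cannot reach the internet at run time, you can pre-fetch both checkpoints into those directories yourself. The sketch below is only a convenience, assuming the default cache locations; the exact file names or subdirectories the loaders expect may differ, in which case the automatic download is the safer path.

```bash
# Optional pre-download sketch; demo/demo.py normally fetches these automatically.
mkdir -p ~/.torch ~/.cache/clip
wget -P ~/.torch https://huggingface.co/CompVis/stable-diffusion-v-1-3-original/resolve/main/sd-v1-3.ckpt
wget -P ~/.cache/clip https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt
```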

### Inference Demo with Pre-trained Models

1. Pick a model and its config file from
[model zoo](README.md#model-zoo),
for example, `configs/Panoptic/odise_label_coco_50e.py`.
2. We provide `demo/demo.py`, which can run inference with the built-in configs. Run it with:
```bash
python demo/demo.py --config-file configs/Panoptic/odise_label_coco_50e.py \
--input input1.jpg input2.jpg \
--init-from /path/to/checkpoint_file
[--other-options]
```
This command will run the inference and show visualizations in an OpenCV window.

For details of the command-line arguments, see `demo.py -h` or look at its source code
to understand its behavior. Some common arguments are (a combined example follows this list):
* To run __on your webcam__, replace `--input files` with `--webcam`.
* To run __on a video__, replace `--input files` with `--video-input video.mp4`.
* To run __on CPU__, add `train.device=cpu` at the end.
* To save outputs to a directory (for images) or a file (for webcam or video), use `--output`.
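
For instance, a hypothetical invocation combining the options above, running the demo on a video on the CPU and saving the result to a file (the checkpoint path and file names are placeholders):

```bash
# Hypothetical example combining the options above; all paths are placeholders.
python demo/demo.py --config-file configs/Panoptic/odise_label_coco_50e.py \
  --video-input video.mp4 \
  --output output.mp4 \
  --init-from /path/to/checkpoint_file \
  train.device=cpu
```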


### Training & Evaluation in Command Line

We provide the script `tools/train_net.py`, which can train all of the configs provided in ODISE.

To train a model with `train_net.py`, first
set up the COCO datasets following
[datasets/README.md](./datasets/README.md#expected-dataset-structure-for-coco),
then run the following for single-node AMP training:
```bash
(node0)$ ./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --num-gpus 8 --amp
```
For multi-node AMP training:
```bash
(node0)$ ./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --machine-rank 0 --num-machines 2 --dist-url tcp://node_addr:29500 --num-gpus 8 --amp
(node1)$ ./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --machine-rank 1 --num-machines 2 --dist-url tcp://node_addr:29500 --num-gpus 8 --amp
```

The configs are designed for 16-GPU training.
Since we use the AdamW optimizer, there is no established rule for scaling the learning rate with batch size.
However, we provide automatic scaling of the learning rate and batch size by passing `--ref $REFERENCE_WORLD_SIZE`.
For example, if you set `$REFERENCE_WORLD_SIZE=16` and run on 8 GPUs, the batch size and learning rate will be halved (8/16 = 0.5).

```bash
(node0)$ ./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --num-gpus 8 --amp --ref 16
```

To evaluate a model's performance, run on a single node:
```bash
(node0)$ ./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --num-gpus 8 --eval-only --init-from /path/to/checkpoint
```
or for multi-node inference:
```bash
(node0)$ ./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --machine-rank 0 --num-machines 2 --dist-url tcp://node0_addr:29500 --num-gpus 8 --eval-only --init-from /path/to/checkpoint
(node1)$ ./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --machine-rank 1 --num-machines 2 --dist-url tcp://node0_addr:29500 --num-gpus 8 --eval-only --init-from /path/to/checkpoint
```

To use our `odise://` model zoo, pass `--config-file configs/Panoptic/odise_label_coco_50e.py --init-from odise://Panoptic/odise_label_coco_50e` or `--config-file configs/Panoptic/odise_caption_coco_50e.py --init-from odise://Panoptic/odise_caption_coco_50e` to `./tools/train_net.py`, respectively.
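
For example, to evaluate both released checkpoints directly from the model zoo on a single node (the `--eval-only` and `--num-gpus` flags below simply mirror the evaluation command above):

```bash
# ODISE (label)
./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --num-gpus 8 --eval-only --init-from odise://Panoptic/odise_label_coco_50e
# ODISE (caption)
./tools/train_net.py --config-file configs/Panoptic/odise_caption_coco_50e.py --num-gpus 8 --eval-only --init-from odise://Panoptic/odise_caption_coco_50e
```
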
84 changes: 84 additions & 0 deletions LICENSE
@@ -0,0 +1,84 @@
Copyright (c) 2022-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

NVIDIA Source Code License for ODISE: Open-Vocabulary Panoptic
Segmentation with Text-to-Image Diffusion Models

=======================================================================

1. Definitions

“Licensor” means any person or entity that distributes its Work.

“Work” means (a) the original work of authorship made available under
this license, which may include software, documentation, or other files,
and (b) any additions to or derivative works thereof that are made
available under this license.

The terms “reproduce,” “reproduction,” “derivative works,” and “distribution”
have the meaning as provided under U.S. copyright law; provided, however,
that for the purposes of this license, derivative works shall not include works
that remain separable from, or merely link (or bind by name) to the
interfaces of, the Work.

Works are “made available” under this license by including in or with the Work
either (a) a copyright notice referencing the applicability of
this license to the Work, or (b) a copy of this license.

2. License Grant

2.1 Copyright Grant. Subject to the terms and conditions of this license, each
Licensor grants to you a perpetual, worldwide, non-exclusive, royalty-free,
copyright license to use, reproduce, prepare derivative works of, publicly display,
publicly perform, sublicense and distribute its Work and any resulting derivative
works in any form.

3. Limitations

3.1 Redistribution. You may reproduce or distribute the Work only if (a) you do so under
this license, (b) you include a complete copy of this license with your distribution,
and (c) you retain without modification any copyright, patent, trademark, or
attribution notices that are present in the Work.

3.2 Derivative Works. You may specify that additional or different terms apply to the use,
reproduction, and distribution of your derivative works of the Work (“Your Terms”) only
if (a) Your Terms provide that the use limitation in Section 3.3 applies to your derivative
works, and (b) you identify the specific derivative works that are subject to Your Terms.
Notwithstanding Your Terms, this license (including the redistribution requirements in
Section 3.1) will continue to apply to the Work itself.

3.3 Use Limitation. The Work and any derivative works thereof only may be used or
intended for use non-commercially. Notwithstanding the foregoing, NVIDIA Corporation
and its affiliates may use the Work and any derivative works commercially.
As used herein, “non-commercially” means for research or evaluation purposes only.

3.4 Patent Claims. If you bring or threaten to bring a patent claim against any Licensor
(including any claim, cross-claim or counterclaim in a lawsuit) to enforce any patents that
you allege are infringed by any Work, then your rights under this license from
such Licensor (including the grant in Section 2.1) will terminate immediately.

3.5 Trademarks. This license does not grant any rights to use any Licensor’s or its
affiliates’ names, logos, or trademarks, except as necessary to reproduce
the notices described in this license.

3.6 Termination. If you violate any term of this license, then your rights under
this license (including the grant in Section 2.1) will terminate immediately.

4. Disclaimer of Warranty.

THE WORK IS PROVIDED “AS IS” WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR NON-INFRINGEMENT.
YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER THIS LICENSE.

5. Limitation of Liability.

EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY,
WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE SHALL ANY LICENSOR
BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL,
OR CONSEQUENTIAL DAMAGES ARISING OUT OF OR RELATED TO THIS LICENSE, THE USE OR
INABILITY TO USE THE WORK (INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS
INTERRUPTION, LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY
OTHER DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

=======================================================================
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -0,0 +1 @@
include odise/data/datasets/openseg_labels/*.txt
181 changes: 180 additions & 1 deletion README.md
@@ -1 +1,180 @@
# ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

**ODISE**: **O**pen-vocabulary **DI**ffusion-based panoptic **SE**gmentation exploits pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation.
It leverages the frozen representations of both of these models to perform panoptic segmentation of any category in the wild.

This repository is the official implementation of ODISE introduced in the paper:

[**Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models**](https://arxiv.org/abs/2303.04803)
[*Jiarui Xu*](https://jerryxu.net),
[*Sifei Liu**](https://research.nvidia.com/person/sifei-liu),
[*Arash Vahdat**](http://latentspace.cc/),
[*Wonmin Byeon*](https://wonmin-byeon.github.io/),
[*Xiaolong Wang*](https://xiaolonw.github.io/),
[*Shalini De Mello*](https://research.nvidia.com/person/shalini-de-mello)
CVPR 2023 Highlight. (*equal contribution)

![teaser](figs/github_arch.gif)

## Visual Results

<div align="center">
<img src="figs/github_vis_coco_0.gif" width="32%">
<img src="figs/github_vis_ade_0.gif" width="32%">
<img src="figs/github_vis_ego4d_0.gif" width="32%">
</div>
<div align="center">
<img src="figs/github_vis_coco_1.gif" width="32%">
<img src="figs/github_vis_ade_1.gif" width="32%">
<img src="figs/github_vis_ego4d_1.gif" width="32%">
</div>


## Links
* [Jiarui Xu's Project Page](https://jerryxu.net/ODISE/) (with additional visual results)
* [HuggingFace 🤗 Demo](https://huggingface.co/spaces/xvjiarui/ODISE)
* [arXiv Page](https://arxiv.org/abs/2303.04803)

## Citation

If you find our work useful in your research, please cite:

```BibTeX
@article{xu2022odise,
  author  = {Xu, Jiarui and Liu, Sifei and Vahdat, Arash and Byeon, Wonmin and Wang, Xiaolong and De Mello, Shalini},
  title   = {{ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models}},
  journal = {arXiv preprint arXiv:2303.04803},
  year    = {2023},
}
```

## Environment Setup

Install dependencies by running:

```bash
conda create -n odise python=3.9
conda activate odise
conda install pytorch=1.13.1 torchvision=0.14.1 pytorch-cuda=11.6 -c pytorch -c nvidia
conda install -c "nvidia/label/cuda-11.6.1" libcusolver-dev
git clone [email protected]:NVlabs/ODISE.git
cd ODISE
pip install -e .
```
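
As an optional sanity check (not part of the official instructions), you can verify that the key packages import cleanly and that CUDA is visible before moving on:

```bash
# Optional sanity check; assumes the conda environment created above is active.
python -c "import torch, detectron2, odise; print(torch.__version__, torch.cuda.is_available())"
```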

(Optional) Install [xformers](https://github.com/facebookresearch/xformers) for a more efficient transformer implementation.
You can either install the pre-built version

```bash
pip install xformers==0.0.16
```

or build it from the latest source:

```bash
# (Optional) Makes the build much faster
pip install ninja
# Set TORCH_CUDA_ARCH_LIST if running and building on different GPU types
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
# (this can take dozens of minutes)
```
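
If you build xformers on a machine whose GPU differs from the one you will run on, set `TORCH_CUDA_ARCH_LIST` first, as the comment above suggests; the architectures below are only examples (7.0 = V100, 8.0 = A100, 8.6 = RTX 30xx):

```bash
# Example only: list the compute capabilities of the GPUs you will deploy to.
export TORCH_CUDA_ARCH_LIST="7.0;8.0;8.6"
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
```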

## Model Zoo

We provide two pre-trained models for ODISE trained with label or caption
supervision on [COCO's](https://cocodataset.org/#home) entire training set.
ODISE's pre-trained models are subject to the [Creative Commons — Attribution-NonCommercial-ShareAlike 4.0 International — CC BY-NC-SA 4.0 License](https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode) terms.
Each model contains 28.1M trainable parameters.
The download links for these models are provided in the table below.
When you run the `demo/demo.py` script for the very first time, it will also automatically download ODISE's pre-trained model to your local folder `$HOME/.torch/iopath_cache/NVlabs/ODISE/releases/download/v1.0.0/`.

<table>
<thead>
<tr>
<th align="center"></th>
<th align="center" style="text-align:center" colspan="3">ADE20K(A-150)</th>
<th align="center" style="text-align:center" colspan="3">COCO</th>
<th align="center" style="text-align:center">ADE20K-Full <br> (A-847)</th>
<th align="center" style="text-align:center">Pascal Context 59 <br> (PC-59)</th>
<th align="center" style="text-align:center">Pascal Context 459 <br> (PC-459)</th>
<th align="center" style="text-align:center">Pascal VOC 21 <br> (PAS-21) </th>
<th align="center" style="text-align:center">download </th>
</tr>
</thead>
<tbody>
<tr>
<td align="center"></td>
<td align="center">PQ</td>
<td align="center">mAP</td>
<td align="center">mIoU</td>
<td align="center">PQ</td>
<td align="center">mAP</td>
<td align="center">mIoU</td>
<td align="center">mIoU</td>
<td align="center">mIoU</td>
<td align="center">mIoU</td>
<td align="center">mIoU</td>
</tr>
<tr>
<td align="center"><a href="configs/Panoptic/odise_label_coco_50e.py"> ODISE (label) </a></td>
<td align="center">22.6</td>
<td align="center">14.4</td>
<td align="center">29.9</td>
<td align="center">55.4</td>
<td align="center">46.0</td>
<td align="center">65.2</td>
<td align="center">11.1</td>
<td align="center">57.3</td>
<td align="center">14.5</td>
<td align="center">84.6</td>
<td align="center"><a href="https://github.com/NVlabs/ODISE/releases/download/v1.0.0/odise_label_coco_50e-b67d2efc.pth"> checkpoint </a></td>
</tr>
<tr>
<td align="center"><a href="configs/Panoptic/odise_caption_coco_50e.py"> ODISE (caption) </a></td>
<td align="center">23.4</td>
<td align="center">13.9</td>
<td align="center">28.7</td>
<td align="center">45.6</td>
<td align="center">38.4</td>
<td align="center">52.4</td>
<td align="center">11.0</td>
<td align="center">55.3</td>
<td align="center">13.8</td>
<td align="center">82.7</td>
<td align="center"><a href="https://github.com/NVlabs/ODISE/releases/download/v1.0.0/odise_caption_coco_50e-853cc971.pth"> checkpoint </a></td>
</tr>
</tbody>
</table>
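
As a quick sanity check, you can also download a checkpoint from the table above manually and point the demo at it; the input and output paths below are just examples (`demo/demo.py` can also fetch ODISE checkpoints automatically, as noted above):

```bash
# Manual download of the ODISE (label) checkpoint listed in the table above.
wget https://github.com/NVlabs/ODISE/releases/download/v1.0.0/odise_label_coco_50e-b67d2efc.pth
python demo/demo.py --config-file configs/Panoptic/odise_label_coco_50e.py \
  --input demo/examples/coco.jpg --output demo/coco_pred.jpg \
  --init-from odise_label_coco_50e-b67d2efc.pth
```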

## Get Started
See [Preparing Datasets for ODISE](datasets/README.md).

See [Getting Started with ODISE](GETTING_STARTED.md) for detailed instructions on training and inference with ODISE.

## Demo

* Integrated into [Huggingface Spaces 🤗](https://huggingface.co/spaces) using [Gradio](https://github.com/gradio-app/gradio). Try out the web demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/xvjiarui/ODISE)

* Run the demo on Google Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NVlabs/ODISE/blob/master/demo/demo.ipynb)


**Important Note**: ODISE links to the original pre-trained models for [Stable Diffusion v1.3](https://huggingface.co/CompVis/stable-diffusion-v-1-3-original/resolve/main/sd-v1-3.ckpt) and [CLIP](https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt). When you run the `demo/demo.py` script for the very first time, besides ODISE's pre-trained models, it will also automatically download the pre-trained models for Stable Diffusion v1.3 and CLIP from their original sources to your local directories `$HOME/.torch/` and `$HOME/.cache/clip`, respectively.
These pre-trained models are subject to their authors' original license terms at [Stable Diffusion](https://github.com/CompVis/stable-diffusion) and [CLIP](https://github.com/openai/CLIP), respectively.

* To run ODISE's demo from the command line:

```shell
python demo/demo.py --input demo/examples/coco.jpg --output demo/coco_pred.jpg --vocab "black pickup truck, pickup truck; blue sky, sky"
```
The output is saved to `demo/coco_pred.jpg`. For more detailed options for `demo/demo.py`, see [Getting Started with ODISE](GETTING_STARTED.md).


* To run the Gradio demo locally:
```shell
python demo/app.py
```

## Acknowledgement

Code is largely based on [Detectron2](https://github.com/facebookresearch/detectron2), [Stable Diffusion](https://github.com/CompVis/stable-diffusion), [Mask2Former](https://github.com/facebookresearch/Mask2Former), [OpenCLIP](https://github.com/mlfoundations/open_clip) and [GLIDE](https://github.com/openai/glide-text2im).

Thank you all for the great open-source projects!