Initial commit.
shalinidemello committed Mar 27, 2023
1 parent c84a405 commit 49c0378
Showing 283 changed files with 38,225 additions and 1 deletion.
68 changes: 68 additions & 0 deletions GETTING_STARTED.md
@@ -0,0 +1,68 @@
## Getting Started with ODISE

This document provides a brief introduction to using ODISE.

Please see [Getting Started with Detectron2](https://github.com/facebookresearch/detectron2/blob/master/GETTING_STARTED.md) for full usage.

The [Stable Diffusion v1.3 checkpoint](https://huggingface.co/CompVis/stable-diffusion-v-1-3-original/resolve/main/sd-v1-3.ckpt) and [CLIP checkpoint](https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt) will be automatically downloaded to `~/.torch/` and `~/.cache/clip` respectively.
Users should follow the license of the official releases of Stable Diffusion and CLIP.
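
If your machine cannot reach the internet at run time, you can pre-fetch both checkpoints into those directories yourself. The sketch below is only a convenience, assuming the default cache locations; the exact file names or subdirectories the loaders expect may differ, in which case the automatic download is the safer path.

```bash
# Optional pre-download sketch; demo/demo.py normally fetches these automatically.
mkdir -p ~/.torch ~/.cache/clip
wget -P ~/.torch https://huggingface.co/CompVis/stable-diffusion-v-1-3-original/resolve/main/sd-v1-3.ckpt
wget -P ~/.cache/clip https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt
```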

### Inference Demo with Pre-trained Models

1. Pick a model and its config file from
[model zoo](README.md#model-zoo),
for example, `configs/Panoptic/odise_label_coco_50e.py`.
2. We provide `demo/demo.py`, which can run inference with the built-in configs. Run it with:
```bash
python demo/demo.py --config-file configs/Panoptic/odise_label_coco_50e.py \
--input input1.jpg input2.jpg \
--init-from /path/to/checkpoint_file
[--other-options]
```
This command will run the inference and show visualizations in an OpenCV window.

For details of the command-line arguments, see `demo.py -h` or look at its source code
to understand its behavior. Some common arguments are (a combined example follows this list):
* To run __on your webcam__, replace `--input files` with `--webcam`.
* To run __on a video__, replace `--input files` with `--video-input video.mp4`.
* To run __on CPU__, add `train.device=cpu` at the end.
* To save outputs to a directory (for images) or a file (for webcam or video), use `--output`.
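
For instance, a hypothetical invocation combining the options above, running the demo on a video on the CPU and saving the result to a file (the checkpoint path and file names are placeholders):

```bash
# Hypothetical example combining the options above; all paths are placeholders.
python demo/demo.py --config-file configs/Panoptic/odise_label_coco_50e.py \
  --video-input video.mp4 \
  --output output.mp4 \
  --init-from /path/to/checkpoint_file \
  train.device=cpu
```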


### Training & Evaluation in Command Line

We provide the script `tools/train_net.py`, which can train all of the configs provided in ODISE.

To train a model with `train_net.py`, first
set up the COCO datasets following
[datasets/README.md](./datasets/README.md#expected-dataset-structure-for-coco),
then run the following for single-node AMP training:
```bash
(node0)$ ./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --num-gpus 8 --amp
```
For multi-node AMP training:
```bash
(node0)$ ./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --machine-rank 0 --num-machines 2 --dist-url tcp://node_addr:29500 --num-gpus 8 --amp
(node1)$ ./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --machine-rank 1 --num-machines 2 --dist-url tcp://node_addr:29500 --num-gpus 8 --amp
```

The configs are designed for 16-GPU training.
Since we use the AdamW optimizer, there is no established rule for scaling the learning rate with batch size.
However, we provide automatic scaling of the learning rate and batch size by passing `--ref $REFERENCE_WORLD_SIZE`.
For example, if you set `$REFERENCE_WORLD_SIZE=16` and run on 8 GPUs, the batch size and learning rate will be halved (8/16 = 0.5).

```bash
(node0)$ ./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --num-gpus 8 --amp --ref 16
```

To evaluate a model's performance, run on a single node:
```bash
(node0)$ ./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --num-gpus 8 --eval-only --init-from /path/to/checkpoint
```
or for multi-node inference:
```bash
(node0)$ ./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --machine-rank 0 --num-machines 2 --dist-url tcp://node0_addr:29500 --num-gpus 8 --eval-only --init-from /path/to/checkpoint
(node1)$ ./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --machine-rank 1 --num-machines 2 --dist-url tcp://node0_addr:29500 --num-gpus 8 --eval-only --init-from /path/to/checkpoint
```

To use our `odise://` model zoo, pass `--config-file configs/Panoptic/odise_label_coco_50e.py --init-from odise://Panoptic/odise_label_coco_50e` or `--config-file configs/Panoptic/odise_caption_coco_50e.py --init-from odise://Panoptic/odise_caption_coco_50e` to `./tools/train_net.py`, respectively.
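
For example, to evaluate both released checkpoints directly from the model zoo on a single node (the `--eval-only` and `--num-gpus` flags below simply mirror the evaluation command above):

```bash
# ODISE (label)
./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --num-gpus 8 --eval-only --init-from odise://Panoptic/odise_label_coco_50e
# ODISE (caption)
./tools/train_net.py --config-file configs/Panoptic/odise_caption_coco_50e.py --num-gpus 8 --eval-only --init-from odise://Panoptic/odise_caption_coco_50e
```
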
84 changes: 84 additions & 0 deletions LICENSE
@@ -0,0 +1,84 @@
Copyright (c) 2022-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

NVIDIA Source Code License for ODISE: Open-Vocabulary Panoptic
Segmentation with Text-to-Image Diffusion Models

=======================================================================

1. Definitions

“Licensor” means any person or entity that distributes its Work.

“Work” means (a) the original work of authorship made available under
this license, which may include software, documentation, or other files,
and (b) any additions to or derivative works thereof that are made
available under this license.

The terms “reproduce,” “reproduction,” “derivative works,” and “distribution”
have the meaning as provided under U.S. copyright law; provided, however,
that for the purposes of this license, derivative works shall not include works
that remain separable from, or merely link (or bind by name) to the
interfaces of, the Work.

Works are “made available” under this license by including in or with the Work
either (a) a copyright notice referencing the applicability of
this license to the Work, or (b) a copy of this license.

2. License Grant

2.1 Copyright Grant. Subject to the terms and conditions of this license, each
Licensor grants to you a perpetual, worldwide, non-exclusive, royalty-free,
copyright license to use, reproduce, prepare derivative works of, publicly display,
publicly perform, sublicense and distribute its Work and any resulting derivative
works in any form.

3. Limitations

3.1 Redistribution. You may reproduce or distribute the Work only if (a) you do so under
this license, (b) you include a complete copy of this license with your distribution,
and (c) you retain without modification any copyright, patent, trademark, or
attribution notices that are present in the Work.

3.2 Derivative Works. You may specify that additional or different terms apply to the use,
reproduction, and distribution of your derivative works of the Work (“Your Terms”) only
if (a) Your Terms provide that the use limitation in Section 3.3 applies to your derivative
works, and (b) you identify the specific derivative works that are subject to Your Terms.
Notwithstanding Your Terms, this license (including the redistribution requirements in
Section 3.1) will continue to apply to the Work itself.

3.3 Use Limitation. The Work and any derivative works thereof only may be used or
intended for use non-commercially. Notwithstanding the foregoing, NVIDIA Corporation
and its affiliates may use the Work and any derivative works commercially.
As used herein, “non-commercially” means for research or evaluation purposes only.

3.4 Patent Claims. If you bring or threaten to bring a patent claim against any Licensor
(including any claim, cross-claim or counterclaim in a lawsuit) to enforce any patents that
you allege are infringed by any Work, then your rights under this license from
such Licensor (including the grant in Section 2.1) will terminate immediately.

3.5 Trademarks. This license does not grant any rights to use any Licensor’s or its
affiliates’ names, logos, or trademarks, except as necessary to reproduce
the notices described in this license.

3.6 Termination. If you violate any term of this license, then your rights under
this license (including the grant in Section 2.1) will terminate immediately.

4. Disclaimer of Warranty.

THE WORK IS PROVIDED “AS IS” WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR NON-INFRINGEMENT.
YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER THIS LICENSE.

5. Limitation of Liability.

EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY,
WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE SHALL ANY LICENSOR
BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL,
OR CONSEQUENTIAL DAMAGES ARISING OUT OF OR RELATED TO THIS LICENSE, THE USE OR
INABILITY TO USE THE WORK (INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS
INTERRUPTION, LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY
OTHER DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

=======================================================================
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -0,0 +1 @@
include odise/data/datasets/openseg_labels/*.txt
181 changes: 180 additions & 1 deletion README.md
@@ -1 +1,180 @@
# ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

**ODISE**: **O**pen-vocabulary **DI**ffusion-based panoptic **SE**gmentation exploits pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation.
It leverages the frozen representations of both of these models to perform panoptic segmentation of any category in the wild.

This repository is the official implementation of ODISE introduced in the paper:

[**Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models**](https://arxiv.org/abs/2303.04803)
[*Jiarui Xu*](https://jerryxu.net),
[*Sifei Liu**](https://research.nvidia.com/person/sifei-liu),
[*Arash Vahdat**](http://latentspace.cc/),
[*Wonmin Byeon*](https://wonmin-byeon.github.io/),
[*Xiaolong Wang*](https://xiaolonw.github.io/),
[*Shalini De Mello*](https://research.nvidia.com/person/shalini-de-mello)
CVPR 2023 Highlight. (*equal contribution)

![teaser](figs/github_arch.gif)

## Visual Results

<div align="center">
<img src="figs/github_vis_coco_0.gif" width="32%">
<img src="figs/github_vis_ade_0.gif" width="32%">
<img src="figs/github_vis_ego4d_0.gif" width="32%">
</div>
<div align="center">
<img src="figs/github_vis_coco_1.gif" width="32%">
<img src="figs/github_vis_ade_1.gif" width="32%">
<img src="figs/github_vis_ego4d_1.gif" width="32%">
</div>


## Links
* [Jiarui Xu's Project Page](https://jerryxu.net/ODISE/) (with additional visual results)
* [HuggingFace 🤗 Demo](https://huggingface.co/spaces/xvjiarui/ODISE)
* [arXiv Page](https://arxiv.org/abs/2303.04803)

## Citation

If you find our work useful in your research, please cite:

```BibTeX
@article{xu2022odise,
  author  = {Xu, Jiarui and Liu, Sifei and Vahdat, Arash and Byeon, Wonmin and Wang, Xiaolong and De Mello, Shalini},
  title   = {{ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models}},
  journal = {arXiv preprint arXiv:2303.04803},
  year    = {2023},
}
```

## Environment Setup

Install dependencies by running:

```bash
conda create -n odise python=3.9
conda activate odise
conda install pytorch=1.13.1 torchvision=0.14.1 pytorch-cuda=11.6 -c pytorch -c nvidia
conda install -c "nvidia/label/cuda-11.6.1" libcusolver-dev
git clone [email protected]:NVlabs/ODISE.git
cd ODISE
pip install -e .
```
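
As an optional sanity check (not part of the official instructions), you can verify that the key packages import cleanly and that CUDA is visible before moving on:

```bash
# Optional sanity check; assumes the conda environment created above is active.
python -c "import torch, detectron2, odise; print(torch.__version__, torch.cuda.is_available())"
```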

(Optional) Install [xformers](https://github.com/facebookresearch/xformers) for a more efficient transformer implementation.
You can either install the pre-built version

```bash
pip install xformers==0.0.16
```

or build it from the latest source:

```bash
# (Optional) Makes the build much faster
pip install ninja
# Set TORCH_CUDA_ARCH_LIST if running and building on different GPU types
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
# (this can take dozens of minutes)
```
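
If you build xformers on a machine whose GPU differs from the one you will run on, set `TORCH_CUDA_ARCH_LIST` first, as the comment above suggests; the architectures below are only examples (7.0 = V100, 8.0 = A100, 8.6 = RTX 30xx):

```bash
# Example only: list the compute capabilities of the GPUs you will deploy to.
export TORCH_CUDA_ARCH_LIST="7.0;8.0;8.6"
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
```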

## Model Zoo

We provide two pre-trained models for ODISE trained with label or caption
supervision on [COCO's](https://cocodataset.org/#home) entire training set.
ODISE's pre-trained models are subject to the [Creative Commons — Attribution-NonCommercial-ShareAlike 4.0 International — CC BY-NC-SA 4.0 License](https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode) terms.
Each model contains 28.1M trainable parameters.
The download links for these models are provided in the table below.
When you run the `demo/demo.py` script for the very first time, it will also automatically download ODISE's pre-trained model to your local folder `$HOME/.torch/iopath_cache/NVlabs/ODISE/releases/download/v1.0.0/`.

<table>
<thead>
<tr>
<th align="center"></th>
<th align="center" style="text-align:center" colspan="3">ADE20K(A-150)</th>
<th align="center" style="text-align:center" colspan="3">COCO</th>
<th align="center" style="text-align:center">ADE20K-Full <br> (A-847)</th>
<th align="center" style="text-align:center">Pascal Context 59 <br> (PC-59)</th>
<th align="center" style="text-align:center">Pascal Context 459 <br> (PC-459)</th>
<th align="center" style="text-align:center">Pascal VOC 21 <br> (PAS-21) </th>
<th align="center" style="text-align:center">download </th>
</tr>
</thead>
<tbody>
<tr>
<td align="center"></td>
<td align="center">PQ</td>
<td align="center">mAP</td>
<td align="center">mIoU</td>
<td align="center">PQ</td>
<td align="center">mAP</td>
<td align="center">mIoU</td>
<td align="center">mIoU</td>
<td align="center">mIoU</td>
<td align="center">mIoU</td>
<td align="center">mIoU</td>
</tr>
<tr>
<td align="center"><a href="configs/Panoptic/odise_label_coco_50e.py"> ODISE (label) </a></td>
<td align="center">22.6</td>
<td align="center">14.4</td>
<td align="center">29.9</td>
<td align="center">55.4</td>
<td align="center">46.0</td>
<td align="center">65.2</td>
<td align="center">11.1</td>
<td align="center">57.3</td>
<td align="center">14.5</td>
<td align="center">84.6</td>
<td align="center"><a href="https://github.com/NVlabs/ODISE/releases/download/v1.0.0/odise_label_coco_50e-b67d2efc.pth"> checkpoint </a></td>
</tr>
<tr>
<td align="center"><a href="configs/Panoptic/odise_caption_coco_50e.py"> ODISE (caption) </a></td>
<td align="center">23.4</td>
<td align="center">13.9</td>
<td align="center">28.7</td>
<td align="center">45.6</td>
<td align="center">38.4</td>
<td align="center">52.4</td>
<td align="center">11.0</td>
<td align="center">55.3</td>
<td align="center">13.8</td>
<td align="center">82.7</td>
<td align="center"><a href="https://github.com/NVlabs/ODISE/releases/download/v1.0.0/odise_caption_coco_50e-853cc971.pth"> checkpoint </a></td>
</tr>
</tbody>
</table>
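
As a quick sanity check, you can also download a checkpoint from the table above manually and point the demo at it; the input and output paths below are just examples (`demo/demo.py` can also fetch ODISE checkpoints automatically, as noted above):

```bash
# Manual download of the ODISE (label) checkpoint listed in the table above.
wget https://github.com/NVlabs/ODISE/releases/download/v1.0.0/odise_label_coco_50e-b67d2efc.pth
python demo/demo.py --config-file configs/Panoptic/odise_label_coco_50e.py \
  --input demo/examples/coco.jpg --output demo/coco_pred.jpg \
  --init-from odise_label_coco_50e-b67d2efc.pth
```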

## Get Started
See [Preparing Datasets for ODISE](datasets/README.md).

See [Getting Started with ODISE](GETTING_STARTED.md) for detailed instructions on training and inference with ODISE.

## Demo

* Integrated into [Huggingface Spaces 🤗](https://huggingface.co/spaces) using [Gradio](https://github.com/gradio-app/gradio). Try out the web demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/xvjiarui/ODISE)

* Run the demo on Google Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NVlabs/ODISE/blob/master/demo/demo.ipynb)


**Important Note**: ODISE links to the original pre-trained models for [Stable Diffusion v1.3](https://huggingface.co/CompVis/stable-diffusion-v-1-3-original/resolve/main/sd-v1-3.ckpt) and [CLIP](https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt). When you run the `demo/demo.py` script for the very first time, besides ODISE's pre-trained models, it will also automatically download the pre-trained models for Stable Diffusion v1.3 and CLIP from their original sources to your local directories `$HOME/.torch/` and `$HOME/.cache/clip`, respectively.
These pre-trained models are subject to their authors' original license terms at [Stable Diffusion](https://github.com/CompVis/stable-diffusion) and [CLIP](https://github.com/openai/CLIP), respectively.

* To run ODISE's demo from the command line:

```shell
python demo/demo.py --input demo/examples/coco.jpg --output demo/coco_pred.jpg --vocab "black pickup truck, pickup truck; blue sky, sky"
```
The output is saved to `demo/coco_pred.jpg`. For more detailed options for `demo/demo.py`, see [Getting Started with ODISE](GETTING_STARTED.md).


* To run the Gradio demo locally:
```shell
python demo/app.py
```

## Acknowledgement

Code is largely based on [Detectron2](https://github.com/facebookresearch/detectron2), [Stable Diffusion](https://github.com/CompVis/stable-diffusion), [Mask2Former](https://github.com/facebookresearch/Mask2Former), [OpenCLIP](https://github.com/mlfoundations/open_clip) and [GLIDE](https://github.com/openai/glide-text2im).

Thank you all for the great open-source projects!