## Getting Started with ODISE

This document provides a brief introduction to the usage of ODISE.

Please see [Getting Started with Detectron2](https://github.com/facebookresearch/detectron2/blob/master/GETTING_STARTED.md) for full usage.

The [Stable Diffusion v1.3 checkpoint](https://huggingface.co/CompVis/stable-diffusion-v-1-3-original/resolve/main/sd-v1-3.ckpt) and [CLIP checkpoint](https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt) will be automatically downloaded to `~/.torch/` and `~/.cache/clip`, respectively.
Users should follow the license of the official releases of Stable Diffusion and CLIP.
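
If you prefer to fetch these two checkpoints ahead of time (for example, on a machine that is offline at run time), a minimal sketch using the URLs above is shown below; it assumes the default cache locations mentioned above.

```bash
# Optional: pre-download the Stable Diffusion and CLIP checkpoints into the
# default cache locations noted above (the scripts download them automatically otherwise).
mkdir -p ~/.torch ~/.cache/clip
wget -P ~/.torch \
  https://huggingface.co/CompVis/stable-diffusion-v-1-3-original/resolve/main/sd-v1-3.ckpt
wget -P ~/.cache/clip \
  https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt
```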

### Inference Demo with Pre-trained Models

1. Pick a model and its config file from the
   [model zoo](README.md#model-zoo),
   for example, `configs/Panoptic/odise_label_coco_50e.py`.
2. We provide `demo/demo.py`, which can run a demo with the built-in configs. Run it with:
```bash
python demo/demo.py --config-file configs/Panoptic/odise_label_coco_50e.py \
  --input input1.jpg input2.jpg \
  --init-from /path/to/checkpoint_file \
  [--other-options]
```
This command runs inference and shows the visualizations in an OpenCV window.

For details on the command-line arguments, see `demo.py -h` or look at its source code
to understand its behavior. Some common arguments are:
* To run __on your webcam__, replace `--input files` with `--webcam`.
* To run __on a video__, replace `--input files` with `--video-input video.mp4`.
* To run __on CPU__, add `train.device=cpu` at the end.
* To save outputs to a directory (for images) or a file (for webcam or video), use `--output`; a combined example is shown below.
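
For example, a command that combines several of these options to run on a video and save the result (a sketch; `video.mp4` and the output path are placeholders for your own files):

```bash
# Run inference on a video and write the visualized result to a file.
# Paths are placeholders; adjust them to your own data and checkpoint.
python demo/demo.py --config-file configs/Panoptic/odise_label_coco_50e.py \
  --video-input video.mp4 \
  --output output.mp4 \
  --init-from /path/to/checkpoint_file
```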

### Training & Evaluation in Command Line

We provide the script `tools/train_net.py`, which can train all the configs provided with ODISE.

To train a model with `train_net.py`, first
set up the COCO datasets following
[datasets/README.md](./datasets/README.md#expected-dataset-structure-for-coco),
then run the following for single-node AMP training:
```bash
(node0)$ ./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --num-gpus 8 --amp
```
For multi-node AMP training:
```bash
(node0)$ ./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --machine-rank 0 --num-machines 2 --dist-url tcp://node_addr:29500 --num-gpus 8 --amp
(node1)$ ./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --machine-rank 1 --num-machines 2 --dist-url tcp://node_addr:29500 --num-gpus 8 --amp
```

The configs are made for 16-GPU training.
Since we use the AdamW optimizer, there is no established rule for scaling the learning rate with the batch size.
However, we provide automatic scaling of the learning rate and batch size by passing `--ref $REFERENCE_WORLD_SIZE`.
For example, if you set `$REFERENCE_WORLD_SIZE=16` and run on 8 GPUs, the batch size and learning rate will be halved (8/16 = 0.5).

```bash
(node0)$ ./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --num-gpus 8 --amp --ref 16
```

To evaluate a model's performance, run on a single node:
```bash
(node0)$ ./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --num-gpus 8 --eval-only --init-from /path/to/checkpoint
```
or, for multi-node inference:
```bash
(node0)$ ./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --machine-rank 0 --num-machines 2 --dist-url tcp://node0_addr:29500 --num-gpus 8 --eval-only --init-from /path/to/checkpoint
(node1)$ ./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --machine-rank 1 --num-machines 2 --dist-url tcp://node0_addr:29500 --num-gpus 8 --eval-only --init-from /path/to/checkpoint
```

To use our `odise://` model zoo, pass `--config-file configs/Panoptic/odise_label_coco_50e.py --init-from odise://Panoptic/odise_label_coco_50e` or `--config-file configs/Panoptic/odise_caption_coco_50e.py --init-from odise://Panoptic/odise_caption_coco_50e` to `./tools/train_net.py`, respectively.
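
For instance, a single-node evaluation that pulls the label-supervised checkpoint directly from the model zoo (a sketch combining the flags shown above):

```bash
# Evaluate the label-supervised model, fetching the checkpoint via the
# odise:// model zoo instead of pointing to a local checkpoint path.
(node0)$ ./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py \
  --num-gpus 8 --eval-only --init-from odise://Panoptic/odise_label_coco_50e
```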

Copyright (c) 2022-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

NVIDIA Source Code License for ODISE: Open-Vocabulary Panoptic
Segmentation with Text-to-Image Diffusion Models

=======================================================================

1. Definitions

“Licensor” means any person or entity that distributes its Work.

“Work” means (a) the original work of authorship made available under
this license, which may include software, documentation, or other files,
and (b) any additions to or derivative works thereof that are made
available under this license.

The terms “reproduce,” “reproduction,” “derivative works,” and “distribution”
have the meaning as provided under U.S. copyright law; provided, however,
that for the purposes of this license, derivative works shall not include works
that remain separable from, or merely link (or bind by name) to the
interfaces of, the Work.

Works are “made available” under this license by including in or with the Work
either (a) a copyright notice referencing the applicability of
this license to the Work, or (b) a copy of this license.

2. License Grant

2.1 Copyright Grant. Subject to the terms and conditions of this license, each
Licensor grants to you a perpetual, worldwide, non-exclusive, royalty-free,
copyright license to use, reproduce, prepare derivative works of, publicly display,
publicly perform, sublicense and distribute its Work and any resulting derivative
works in any form.

3. Limitations

3.1 Redistribution. You may reproduce or distribute the Work only if (a) you do so under
this license, (b) you include a complete copy of this license with your distribution,
and (c) you retain without modification any copyright, patent, trademark, or
attribution notices that are present in the Work.

3.2 Derivative Works. You may specify that additional or different terms apply to the use,
reproduction, and distribution of your derivative works of the Work (“Your Terms”) only
if (a) Your Terms provide that the use limitation in Section 3.3 applies to your derivative
works, and (b) you identify the specific derivative works that are subject to Your Terms.
Notwithstanding Your Terms, this license (including the redistribution requirements in
Section 3.1) will continue to apply to the Work itself.

3.3 Use Limitation. The Work and any derivative works thereof only may be used or
intended for use non-commercially. Notwithstanding the foregoing, NVIDIA Corporation
and its affiliates may use the Work and any derivative works commercially.
As used herein, “non-commercially” means for research or evaluation purposes only.

3.4 Patent Claims. If you bring or threaten to bring a patent claim against any Licensor
(including any claim, cross-claim or counterclaim in a lawsuit) to enforce any patents that
you allege are infringed by any Work, then your rights under this license from
such Licensor (including the grant in Section 2.1) will terminate immediately.

3.5 Trademarks. This license does not grant any rights to use any Licensor’s or its
affiliates’ names, logos, or trademarks, except as necessary to reproduce
the notices described in this license.

3.6 Termination. If you violate any term of this license, then your rights under
this license (including the grant in Section 2.1) will terminate immediately.

4. Disclaimer of Warranty.

THE WORK IS PROVIDED “AS IS” WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR NON-INFRINGEMENT.
YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER THIS LICENSE.

5. Limitation of Liability.

EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY,
WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE SHALL ANY LICENSOR
BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL,
OR CONSEQUENTIAL DAMAGES ARISING OUT OF OR RELATED TO THIS LICENSE, THE USE OR
INABILITY TO USE THE WORK (INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS
INTERRUPTION, LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY
OTHER DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

=======================================================================
include odise/data/datasets/openseg_labels/*.txt

# ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

**ODISE**: **O**pen-vocabulary **DI**ffusion-based panoptic **SE**gmentation exploits pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation.
It leverages the frozen representations of both these models to perform panoptic segmentation of any category in the wild.

This repository is the official implementation of ODISE introduced in the paper:

[**Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models**](https://arxiv.org/abs/2303.04803)
[*Jiarui Xu*](https://jerryxu.net),
[*Sifei Liu**](https://research.nvidia.com/person/sifei-liu),
[*Arash Vahdat**](http://latentspace.cc/),
[*Wonmin Byeon*](https://wonmin-byeon.github.io/),
[*Xiaolong Wang*](https://xiaolonw.github.io/),
[*Shalini De Mello*](https://research.nvidia.com/person/shalini-de-mello)
CVPR 2023 Highlight. (*equal contribution)

## Visual Results

<div align="center">
<img src="figs/github_vis_coco_0.gif" width="32%">
<img src="figs/github_vis_ade_0.gif" width="32%">
<img src="figs/github_vis_ego4d_0.gif" width="32%">
</div>
<div align="center">
<img src="figs/github_vis_coco_1.gif" width="32%">
<img src="figs/github_vis_ade_1.gif" width="32%">
<img src="figs/github_vis_ego4d_1.gif" width="32%">
</div>

## Links
* [Jiarui Xu's Project Page](https://jerryxu.net/ODISE/) (with additional visual results)
* [HuggingFace 🤗 Demo](https://huggingface.co/spaces/xvjiarui/ODISE)
* [arXiv Page](https://arxiv.org/abs/2303.04803)

## Citation

If you find our work useful in your research, please cite:

```BibTeX
@article{xu2022odise,
  author  = {Xu, Jiarui and Liu, Sifei and Vahdat, Arash and Byeon, Wonmin and Wang, Xiaolong and De Mello, Shalini},
  title   = {{ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models}},
  journal = {arXiv preprint arXiv:2303.04803},
  year    = {2023},
}
```

## Environment Setup

Install dependencies by running:

```bash
conda create -n odise python=3.9
conda activate odise
conda install pytorch=1.13.1 torchvision=0.14.1 pytorch-cuda=11.6 -c pytorch -c nvidia
conda install -c "nvidia/label/cuda-11.6.1" libcusolver-dev
git clone git@github.com:NVlabs/ODISE.git
cd ODISE
pip install -e .
```
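
You can then do a quick sanity check of the installation, for example (a minimal sketch; it only checks that the packages import and that a CUDA device is visible):

```bash
# Quick sanity check: verify PyTorch and the odise package import
# and that CUDA is available (assumes the conda env above is active).
python -c "import torch, odise; print(torch.__version__, torch.cuda.is_available())"
```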

(Optional) Install [xformers](https://github.com/facebookresearch/xformers) for an efficient transformer implementation.
You can either install the pre-built version:

```bash
pip install xformers==0.0.16
```

or build it from the latest source:

```bash
# (Optional) Makes the build much faster
pip install ninja
# Set TORCH_CUDA_ARCH_LIST if running and building on different GPU types
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
# (this can take dozens of minutes)
```
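
Either way, you can confirm which xformers build ended up in the environment (a sketch; it simply prints the installed version string):

```bash
# Print the installed xformers version to confirm the install or build succeeded.
python -c "import xformers; print(xformers.__version__)"
```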

## Model Zoo

We provide two pre-trained models for ODISE, trained with label or caption
supervision on [COCO's](https://cocodataset.org/#home) entire training set.
ODISE's pre-trained models are subject to the [Creative Commons — Attribution-NonCommercial-ShareAlike 4.0 International — CC BY-NC-SA 4.0 License](https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode) terms.
Each model contains 28.1M trainable parameters.
The download links for these models are provided in the table below.
When you run the `demo/demo.py` script for the very first time, it will also automatically download ODISE's pre-trained model to your local folder `$HOME/.torch/iopath_cache/NVlabs/ODISE/releases/download/v1.0.0/`.

<table>
<thead>
  <tr>
    <th align="center"></th>
    <th align="center" style="text-align:center" colspan="3">ADE20K (A-150)</th>
    <th align="center" style="text-align:center" colspan="3">COCO</th>
    <th align="center" style="text-align:center">ADE20K-Full <br> (A-847)</th>
    <th align="center" style="text-align:center">Pascal Context 59 <br> (PC-59)</th>
    <th align="center" style="text-align:center">Pascal Context 459 <br> (PC-459)</th>
    <th align="center" style="text-align:center">Pascal VOC 21 <br> (PAS-21)</th>
    <th align="center" style="text-align:center">download</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td align="center"></td>
    <td align="center">PQ</td>
    <td align="center">mAP</td>
    <td align="center">mIoU</td>
    <td align="center">PQ</td>
    <td align="center">mAP</td>
    <td align="center">mIoU</td>
    <td align="center">mIoU</td>
    <td align="center">mIoU</td>
    <td align="center">mIoU</td>
    <td align="center">mIoU</td>
    <td align="center"></td>
  </tr>
  <tr>
    <td align="center"><a href="configs/Panoptic/odise_label_coco_50e.py"> ODISE (label) </a></td>
    <td align="center">22.6</td>
    <td align="center">14.4</td>
    <td align="center">29.9</td>
    <td align="center">55.4</td>
    <td align="center">46.0</td>
    <td align="center">65.2</td>
    <td align="center">11.1</td>
    <td align="center">57.3</td>
    <td align="center">14.5</td>
    <td align="center">84.6</td>
    <td align="center"><a href="https://github.com/NVlabs/ODISE/releases/download/v1.0.0/odise_label_coco_50e-b67d2efc.pth"> checkpoint </a></td>
  </tr>
  <tr>
    <td align="center"><a href="configs/Panoptic/odise_caption_coco_50e.py"> ODISE (caption) </a></td>
    <td align="center">23.4</td>
    <td align="center">13.9</td>
    <td align="center">28.7</td>
    <td align="center">45.6</td>
    <td align="center">38.4</td>
    <td align="center">52.4</td>
    <td align="center">11.0</td>
    <td align="center">55.3</td>
    <td align="center">13.8</td>
    <td align="center">82.7</td>
    <td align="center"><a href="https://github.com/NVlabs/ODISE/releases/download/v1.0.0/odise_caption_coco_50e-853cc971.pth"> checkpoint </a></td>
  </tr>
</tbody>
</table>
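
If you prefer to download a checkpoint ahead of time rather than relying on the automatic download, the following sketch uses the ODISE (label) link from the table above and the cache folder mentioned earlier:

```bash
# Optional: place the ODISE (label) checkpoint in the cache folder that
# demo/demo.py would otherwise populate automatically (path per the note above).
mkdir -p $HOME/.torch/iopath_cache/NVlabs/ODISE/releases/download/v1.0.0/
wget -P $HOME/.torch/iopath_cache/NVlabs/ODISE/releases/download/v1.0.0/ \
  https://github.com/NVlabs/ODISE/releases/download/v1.0.0/odise_label_coco_50e-b67d2efc.pth
```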

## Get Started
See [Preparing Datasets for ODISE](datasets/README.md).

See [Getting Started with ODISE](GETTING_STARTED.md) for detailed instructions on training and inference with ODISE.

## Demo

* Integrated into [Huggingface Spaces 🤗](https://huggingface.co/spaces) using [Gradio](https://github.com/gradio-app/gradio). Try out the [web demo](https://huggingface.co/spaces/xvjiarui/ODISE).

* Run the demo on [Google Colab](https://colab.research.google.com/github/NVlabs/ODISE/blob/master/demo/demo.ipynb).

**Important Note**: ODISE links to the original pre-trained models for [Stable Diffusion v1.3](https://huggingface.co/CompVis/stable-diffusion-v-1-3-original/resolve/main/sd-v1-3.ckpt) and [CLIP](https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt). When you run the `demo/demo.py` script for the very first time, besides ODISE's pre-trained models, it will also automatically download the pre-trained models for Stable Diffusion v1.3 and CLIP, from their original sources, to your local directories `$HOME/.torch/` and `$HOME/.cache/clip`, respectively.
These pre-trained models are subject to their authors' original license terms at [Stable Diffusion](https://github.com/CompVis/stable-diffusion) and [CLIP](https://github.com/openai/CLIP), respectively.

* To run ODISE's demo from the command line:

  ```shell
  python demo/demo.py --input demo/examples/coco.jpg --output demo/coco_pred.jpg --vocab "black pickup truck, pickup truck; blue sky, sky"
  ```
  The output is saved in `demo/coco_pred.jpg`. For more detailed options for `demo/demo.py`, see [Getting Started with ODISE](GETTING_STARTED.md).

* To run the Gradio demo locally:

  ```shell
  python demo/app.py
  ```

## Acknowledgement

Code is largely based on [Detectron2](https://github.com/facebookresearch/detectron2), [Stable Diffusion](https://github.com/CompVis/stable-diffusion), [Mask2Former](https://github.com/facebookresearch/Mask2Former), [OpenCLIP](https://github.com/mlfoundations/open_clip) and [GLIDE](https://github.com/openai/glide-text2im).

Thank you all for the great open-source projects!