diff --git a/GETTING_STARTED.md b/GETTING_STARTED.md
index 4538592..e9c94cb 100644
--- a/GETTING_STARTED.md
+++ b/GETTING_STARTED.md
@@ -4,7 +4,7 @@ This document provides a brief introduction on how to infer with and train ODISE
 For further reading, please refer to [Getting Started with Detectron2](https://github.com/facebookresearch/detectron2/blob/master/GETTING_STARTED.md).
 
-**Important Note**: ODISE's `demo/demo.py` and `tools/train_net.py` scripts link to the original pre-trained models for [Stable Diffusion v1.3](https://huggingface.co/CompVis/stable-diffusion-v-1-3-original/resolve/main/sd-v1-3.ckpt) and [CLIP](https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt). When you run them for the very first time, these scripts will automaticlaly download the pre-trained models for Stable Diffuson and CLIP, from their original sources, to your local directories `$HOME/.torch/` and `$HOME/.cache/clip`, respectively. Their use is subject to the original licencse terms defined at [https://github.com/CompVis/stable-diffusion](https://github.com/CompVis/stable-diffusion) and [https://github.com/openai/CLIP](https://github.com/openai/CLIP), respectively.
+**Important Note**: ODISE's `demo/demo.py` and `tools/train_net.py` scripts link to the original pre-trained models for [Stable Diffusion v1.3](https://huggingface.co/CompVis/stable-diffusion-v-1-3-original/resolve/main/sd-v1-3.ckpt) and [CLIP](https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt). When you run them for the very first time, these scripts will automatically download the pre-trained models for Stable Diffusion and CLIP, from their original sources, to your local directories `$HOME/.torch/` and `$HOME/.cache/clip`, respectively. Their use is subject to the original license terms defined at [https://github.com/CompVis/stable-diffusion](https://github.com/CompVis/stable-diffusion) and [https://github.com/openai/CLIP](https://github.com/openai/CLIP), respectively.
 
 ### Inference Demo with Pre-trained ODISE Models
 
@@ -31,7 +31,7 @@ to understand its behavior. Some common arguments are:
 * To run __on the cpu__, add `train.device=cpu` at the end.
 * To save outputs to a directory (for images) or a file (for webcam or video), use the `--output` option.
 
-The default bevahior is to append the user-provided extra vocabulary to the labels from COCO, ADE20K and LVIS.
+The default behavior is to append the user-provided extra vocabulary to the labels from COCO, ADE20K and LVIS.
 To use **only** the user-provided vocabulary use `--label ""`.
 
 ```
@@ -61,8 +61,14 @@ For 4-node (32-GPUs) AMP-based training, run:
 (node3)$ ./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --machine-rank 3 --num-machines 4 --dist-url tcp://${MASTER_ADDR}:29500 --num-gpus 8 --amp
 ```
 
-Not that our default traning configurations are designed for 32 GPUs.
-Since we use the ADAMW optimizer, it is not clear as to how to scale the learning rate with batch size.
+Note that our default training configurations are designed for 32 GPUs.
+Since we use the AdamW optimizer, it is not clear how to scale the learning rate with batch size.
 However, we provide the ability to automatically scale the learning rate and the batch size for any number of GPUs used for training by passing in the `--ref $REFERENCE_WORLD_SIZE` argument.
 For example, if you set `$REFERENCE_WORLD_SIZE=32` while training on 8 GPUs, the batch size and learning rate will be set to 8/32 = 0.25 of the original ones.
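+
+For instance, a single-machine run on 8 GPUs that automatically rescales the batch size and learning rate from the 32-GPU reference configuration could look like the following (a minimal sketch that reuses the flags documented above):
+
+```
+./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --num-gpus 8 --amp --ref 32
+```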
diff --git a/README.md b/README.md
index 9afc947..fea6d56 100644
--- a/README.md
+++ b/README.md
@@ -42,11 +42,11 @@ For business inquiries, please visit our website and submit the form: [NVIDIA Re
 If you find our work useful in your research, please cite:
 
 ```BiBTeX
-@article{xu2022odise,
-  author = {Xu, Jiarui and Liu, Sifei and Vahdat, Arash and Byeon, Wonmin and Wang, Xiaolong and De Mello, Shalini},
-  title = {{ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models}},
-  journal = {arXiv preprint arXiv: 2303.04803},
-  year = {2023},
+@article{xu2023odise,
+  title={{Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models}},
+  author={Xu, Jiarui and Liu, Sifei and Vahdat, Arash and Byeon, Wonmin and Wang, Xiaolong and De Mello, Shalini},
+  journal={arXiv preprint arXiv:2303.04803},
+  year={2023}
 }
 ```
 
@@ -88,7 +88,7 @@ supervision on [COCO's](https://cocodataset.org/#home) entire training set.
 ODISE's pre-trained models are subject to the [Creative Commons — Attribution-NonCommercial-ShareAlike 4.0 International — CC BY-NC-SA 4.0 License](https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode) terms.
 Each model contains 28.1M trainable parameters. The download links for these models are provided in the table below.
-When you run the `demo/demo.py` script for the very first time, it will also automatically download ODISE's pre-trained model to your local folder `$HOME/.torch/iopath_cache/NVlabs/ODISE/releases/download/v1.0.0/`.
+When you run `demo/demo.py` or an inference script for the very first time, it will also automatically download ODISE's pre-trained model to your local folder `$HOME/.torch/iopath_cache/NVlabs/ODISE/releases/download/v1.0.0/`.
 
@@ -151,7 +151,7 @@ When you run the `demo/demo.py` script for the very first time, it will also aut
 ## Get Started
 See [Preparing Datasets for ODISE](datasets/README.md).
-See [Getting Started with ODISE](GETTING_STARTED.md) for detailed instuctions on training and inference with ODISE.
+See [Getting Started with ODISE](GETTING_STARTED.md) for detailed instructions on training and inference with ODISE.
 
 ## Demo
 * Integrated into [Huggingface Spaces 🤗](https://huggingface.co/spaces) using [Gradio](https://github.com/gradio-app/gradio). Try out the web demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/xvjiarui/ODISE)
 
@@ -160,7 +160,7 @@ See [Getting Started with ODISE](GETTING_STARTED.md) for detailed instuctions on
 
-**Important Note**: When you run the `demo/demo.py` script for the very first time, besides ODISE's pre-trained models, it will also automaticlaly download the pre-trained models for [Stable Diffusion v1.3](https://huggingface.co/CompVis/stable-diffusion-v-1-3-original/resolve/main/sd-v1-3.ckpt) and [CLIP](https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt), from their original sources, to your local directories `$HOME/.torch/` and `$HOME/.cache/clip`, respectively.
-The pre-trained models for Stable Diffusion and CLIP are subject to their original licencse terms from [Stable Diffusion](https://github.com/CompVis/stable-diffusion) and [CLIP](https://github.com/openai/CLIP), respectively.
+**Important Note**: When you run the `demo/demo.py` script for the very first time, besides ODISE's pre-trained models, it will also automatically download the pre-trained models for [Stable Diffusion v1.3](https://huggingface.co/CompVis/stable-diffusion-v-1-3-original/resolve/main/sd-v1-3.ckpt) and [CLIP](https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt), from their original sources, to your local directories `$HOME/.torch/` and `$HOME/.cache/clip`, respectively.
+The pre-trained models for Stable Diffusion and CLIP are subject to their original license terms from [Stable Diffusion](https://github.com/CompVis/stable-diffusion) and [CLIP](https://github.com/openai/CLIP), respectively.
 
 * To run ODISE's demo from the command line:
diff --git a/demo/demo.ipynb b/demo/demo.ipynb
index 07f463b..91c2f45 100644
--- a/demo/demo.ipynb
+++ b/demo/demo.ipynb
@@ -20,8 +20,10 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Install\n",
-    "!pip install git+https://@github.com/NVlabs/ODISE.git"
+    "# Uncomment the following if you are running this notebook on Google Colab\n",
+    "# !pip uninstall torchtext -y\n",
+    "# !pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116\n",
+    "# !pip install git+https://@github.com/NVlabs/ODISE.git"
    ]
   },
   {
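If you use the Colab setup above, a quick way to confirm that the pinned PyTorch build is active before installing ODISE is the following illustrative check (standard PyTorch calls only):

```
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```

On a GPU runtime this should report `1.13.1+cu116` and `True`.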