[CVPR2024] Open-Vocabulary Semantic Segmentation with Image Embedding Balancing

This repository contains the code and trained models for our CVPR2024 paper, "Open-Vocabulary Semantic Segmentation with Image Embedding Balancing".

Getting started

Environment setup

First, clone this repo:

git clone https://github.com/slonetime/EBSeg.git

Then, create a new conda env and install the required packages:

cd EBSeg
conda create --name ebseg python=3.9
conda activate ebseg
pip install -r requirements.txt
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

Finally, build and install the MultiScaleDeformableAttention op from Mask2former:

cd ebseg/model/mask2former/modeling/pixel_decoder/ops/
sh make.sh 
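
As an optional sanity check after the build, the sketch below tries to import the packages installed by the steps above. It assumes that `python` on your PATH is the interpreter of the `ebseg` env and that the compiled op is importable as `MultiScaleDeformableAttention` (the module name used by upstream Mask2Former; treat it as an assumption for this repo). `check_py_import` and `check_ebseg_env` are hypothetical helpers, not part of EBSeg.

```shell
# Hypothetical helper (not part of EBSeg): report whether an import succeeds.
# Set PYTHON to use a specific interpreter; defaults to `python` on PATH.
check_py_import() {
    if "${PYTHON:-python}" -c "import $1" >/dev/null 2>&1; then
        echo "ok: import $1"
    else
        echo "FAILED: import $1"
        return 1
    fi
}

# Check the main dependencies one by one and return non-zero if any is missing.
check_ebseg_env() {
    status=0
    check_py_import "torch" || status=1
    check_py_import "detectron2" || status=1
    # torch is imported first because the compiled op links against its libraries.
    # The module name below is assumed from upstream Mask2Former.
    check_py_import "torch, MultiScaleDeformableAttention" || status=1
    return "$status"
}
```

Run `check_ebseg_env` from the activated env; a `FAILED` line points at the step to redo.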

Data preparation

Dataset preparation is the same as in SAN, so please follow the instructions at https://github.com/MendelXu/SAN?tab=readme-ov-file#data-preparation.

Training

First, set the config_file, dataset_dir and output_dir paths in train.sh. Then, you can train an EBSeg model with the following command:

bash train.sh
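
Before launching a long run, it can help to confirm that the paths set in train.sh actually exist. The sketch below is one possible way to do that. `require_file`, `require_dir` and `preflight_train` are hypothetical helpers that are not part of the repo, and the arguments are whatever values you put in train.sh.

```shell
# Hypothetical helpers (not part of EBSeg): fail early on a bad path.
require_file() {
    [ -f "$1" ] || { echo "missing file: $1" >&2; return 1; }
}
require_dir() {
    [ -d "$1" ] || { echo "missing directory: $1" >&2; return 1; }
}

# Arguments: config file, dataset directory, output directory.
preflight_train() {
    require_file "$1" || return 1
    require_dir "$2" || return 1
    mkdir -p "$3" || return 1   # create the output directory if it does not exist
    echo "paths look fine"
}
```

For example, `preflight_train path/to/config.yaml path/to/datasets path/to/output && bash train.sh` starts training only when all three checks pass (the paths here are placeholders).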

Inference with our trained model

Download our trained models from the links in the following table (values are mIoU in %):

Model    A-847  PC-459  A-150  PC-59  VOC
EBSeg-B  11.1   17.3    30.0   56.7   94.6
EBSeg-L  13.7   21.0    32.8   60.2   96.4

As with training, set the config_file, dataset_dir, checkpoint and output_dir paths in test.sh. Then, evaluate an EBSeg model with:

bash test.sh

Acknowledgments

Our code is based on SAN, CLIP, CLIP Surgery, Mask2former and ODISE.

We thank the authors for their excellent work!