
Empowering 3D Visual Grounding with Reasoning Capabilities

ECCV 2024
Chenming Zhu, Tai Wang, Wenwei Zhang, Kai Chen, Xihui Liu*
The University of Hong Kong · Shanghai AI Laboratory

arXiv

📦 Benchmark and Model

Benchmark Overview

ScanReason is the first comprehensive and hierarchical 3D reasoning grounding benchmark. We define five types of questions according to the type of reasoning required: spatial reasoning and function reasoning demand a fundamental understanding of the 3D physical world, focusing on inter-object spatial relationships in a 3D scene and on the objects themselves, respectively, while logistic reasoning, emotional reasoning, and safety reasoning are high-level skills built upon these two fundamental abilities to address user-centric real-world applications.
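To make the benchmark structure concrete, here is a hypothetical entry in the style described above. The field names and values are illustrative assumptions, not the released annotation schema; consult the downloaded files for the actual format.

```python
# A hypothetical ScanReason-style entry. Field names/values are illustrative
# assumptions, not the released annotation schema.
example_entry = {
    "scene_id": "scene0000_00",        # assumed scene identifier
    "question": "I want to heat up my lunch; where should I go?",
    "reasoning_type": "function",      # one of: spatial, function, logistic, emotional, safety
    "target_object_ids": [12],         # object IDs, resolvable to 3D boxes via EmbodiedScan
}
```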

Model Overview

🔥 News

  • [2024-10-10] We release a preliminary version of the ScanReason validation benchmark. Download it here. The corresponding 3D bounding box annotations can be obtained via the object IDs from EmbodiedScan.
  • [2024-10-01] We release the training and inference code of ReGround3D.
  • [2024-07-02] We release the ScanReason paper.

Getting Started

1. Installation

  • We use at least 4 A100 GPUs for training and inference.

  • We test the code under the following environment (a quick sanity check is sketched after this list):

    • CUDA 11.8
    • Python 3.9
    • PyTorch 2.1.0
  • Clone our repository and create the conda environment:

    ```bash
    git clone https://github.com/ZCMax/ScanReason.git
    conda create -n scanreason python=3.9
    conda activate scanreason
    pip install -r requirements.txt
    ```
  • Follow the EmbodiedScan Installation Doc to install the EmbodiedScan series.

  • Compile Pointnet2:

    ```bash
    cd pointnet2
    python setup.py install --user
    ```
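Before moving on, it may help to confirm that the tested environment is actually active. This is a minimal sanity check using standard PyTorch calls, assuming you are on a GPU node:

```python
# Minimal environment sanity check for the tested setup
# (CUDA 11.8, Python 3.9, PyTorch 2.1.0, >= 4 A100 GPUs).
import torch

print(torch.__version__)           # expect 2.1.0
print(torch.version.cuda)          # expect 11.8
print(torch.cuda.is_available())   # expect True
print(torch.cuda.device_count())   # training assumes at least 4 GPUs
```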

2. Data Preparation

  1. Follow the EmbodiedScan Data Preparation Doc to download the raw scan (RGB-D) datasets, and set VIDEO_FOLDER in train_ds.sh to the raw data path.

  2. Download the text annotations from Google Drive, set JSON_FOLDER in train_ds.sh to the annotations path, and set INFO_FILE to the path of the info file included in the annotations (a path-check sketch follows).
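As a convenience, the sketch below verifies that the three paths configured in train_ds.sh exist before you launch a multi-GPU job. The variable names follow the script; the example paths are placeholders to replace with your own:

```python
# Hypothetical pre-flight check for the paths set in train_ds.sh.
# Replace the placeholder paths with your actual locations.
import os

paths = {
    "VIDEO_FOLDER": "/path/to/embodiedscan/raw_data",   # raw RGB-D scans
    "JSON_FOLDER": "/path/to/scanreason/annotations",   # text annotations
    "INFO_FILE": "/path/to/annotations/info_file.pkl",  # info file shipped with the annotations
}
for name, path in paths.items():
    print(f"{name}: {path} -> {'ok' if os.path.exists(path) else 'MISSING'}")
```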

3. Training ReGround3D

We provide a SLURM training script for 4 A100 GPUs:

```bash
./scripts/train_ds.sh
```

4. Evaluating ReGround3D

After training, run

```bash
./scripts/convert_zero_to_fp32.sh
```

to convert the DeepSpeed ZeRO checkpoints to a pytorch_model.bin file, then run

```bash
./scripts/merge_lora_weights.sh
```

to merge the LoRA weights and obtain the final checkpoint under ReGround3D-7B.

Finally, run

```bash
./scripts/eval_ds.sh
```

to obtain the grounding results.
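If you prefer to run the whole pipeline in one go, a minimal driver could chain the three scripts above. This is a sketch, assuming each script runs non-interactively from the repository root:

```python
# Sketch: chain the three evaluation steps from this README.
# Assumes each script runs non-interactively from the repo root.
import subprocess

for script in [
    "./scripts/convert_zero_to_fp32.sh",  # DeepSpeed ZeRO -> pytorch_model.bin
    "./scripts/merge_lora_weights.sh",    # merge LoRA weights -> ReGround3D-7B
    "./scripts/eval_ds.sh",               # run grounding evaluation
]:
    subprocess.run(["bash", script], check=True)
```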

📝 TODO List

  • First Release.
  • Release ReGround3D code.
  • Release ScanReason datasets and benchmark.

📄 License

Creative Commons License
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

👏 Acknowledgements

This repo benefits from LISA, EmbodiedScan, 3D-LLM, and LLaVA.