Shaoyuan Xie
Lingdong Kong
Wenwei Zhang
Jiawei Ren
Liang Pan
Kai Chen
Ziwei Liu
RoboBEV is the first robustness evaluation benchmark tailored for camera-based bird's eye view (BEV) detection under natural corruptions. It includes eight corruption types that are likely to appear in driving scenarios, spanning four categories: 1) sensor failure, 2) motion & data processing, 3) lighting conditions, and 4) weather conditions.
*Examples across the six camera views: FRONT_LEFT, FRONT, FRONT_RIGHT, BACK_LEFT, BACK, and BACK_RIGHT.*
Visit our project page to explore more examples. 🚙
- [2023.02] - The nuScenes-C dataset is pending release, subject to a careful check of potential IP issues.
- [2023.01] - Launch of the RoboBEV benchmark! In this initial version, we include 10 camera-only BEV detection algorithms (23 variants), evaluated with 8 corruption types across 3 severity levels.
- Installation
- Data Preparation
- Getting Started
- Model Zoo
- Robustness Benchmark
- BEV Model Calibration
- Create Corruption Set
- TODO List
- Citation
- License
- Acknowledgements
Kindly refer to INSTALL.md for the installation details.
Kindly refer to DATA_PREPARE.md for details on preparing the nuScenes and nuScenes-C datasets.
Kindly refer to GET_STARTED.md to learn more about using this codebase.
Camera-Only BEV Detection
- Fast-BEV, arXiv 2023.
[Code]
- SOLOFusion, ICLR 2023.
[Code]
- PolarFormer, AAAI 2023.
[Code]
- BEVStereo, AAAI 2023.
[Code]
- BEVDepth, AAAI 2023.
[Code]
- MatrixVT, arXiv 2022.
[Code]
- Sparse4D, arXiv 2022.
[Code]
- CrossDTR, arXiv 2022.
[Code]
- SRCN3D, arXiv 2022.
[Code]
- PolarDETR, arXiv 2022.
[Code]
- BEVerse, arXiv 2022.
[Code]
- M^2BEV, arXiv 2022.
[Code]
- ORA3D, BMVC 2022.
[Code]
- Graph-DETR3D, ACM MM 2022.
[Code]
- SpatialDETR, ECCV 2022.
[Code]
- PETR, ECCV 2022.
[Code]
- BEVFormer, ECCV 2022.
[Code]
- BEVDet, arXiv 2021.
[Code]
- DETR3D, CoRL 2021.
[Code]
LiDAR-Camera Fusion BEV Detection
📊 Metrics: The nuScenes Detection Score (NDS) is consistently used as the main indicator for evaluating model performance in our benchmark. The following two metrics are adopted to compare the robustness of different models:
- mCE (the lower the better): The average corruption error (in percentage) of a candidate model compared to the baseline model, which is calculated among all corruption types across three severity levels.
- mRR (the higher the better): The average resilience rate (in percentage) of a candidate model compared to its "clean" performance, which is calculated among all corruption types across three severity levels.
⚙️ Notation: Symbol ⭐ denotes the baseline model adopted in mCE calculation. For more detailed experimental results, please refer to docs/results.
Model | mCE (%) | mRR (%) | Clean | Cam Crash | Frame Lost | Color Quant | Motion Blur | Bright | Low Light | Fog | Snow |
---|---|---|---|---|---|---|---|---|---|---|---|
DETR3D⭐ | 100.00 | 70.77 | 0.4224 | 0.2859 | 0.2604 | 0.3177 | 0.2661 | 0.4002 | 0.2786 | 0.3912 | 0.1913 |
DETR3DCBGS | 99.21 | 70.02 | 0.4341 | 0.2991 | 0.2685 | 0.3235 | 0.2542 | 0.4154 | 0.2766 | 0.4020 | 0.1925 |
BEVFormerSmall | 101.23 | 59.07 | 0.4787 | 0.2771 | 0.2459 | 0.3275 | 0.2570 | 0.3741 | 0.2413 | 0.3583 | 0.1809 |
BEVFormerBase | 97.97 | 60.40 | 0.5174 | 0.3154 | 0.3017 | 0.3509 | 0.2695 | 0.4184 | 0.2515 | 0.4069 | 0.1857 |
PETRR50-p4 | 111.01 | 61.26 | 0.3665 | 0.2320 | 0.2166 | 0.2472 | 0.2299 | 0.2841 | 0.1571 | 0.2876 | 0.1417 |
PETRVoV-p4 | 100.69 | 65.03 | 0.4550 | 0.2924 | 0.2792 | 0.2968 | 0.2490 | 0.3858 | 0.2305 | 0.3703 | 0.2632 |
ORA3D | 99.17 | 68.63 | 0.4436 | 0.3055 | 0.2750 | 0.3360 | 0.2647 | 0.4075 | 0.2613 | 0.3959 | 0.1898 |
BEVDetR50 | 115.12 | 51.83 | 0.3770 | 0.2486 | 0.1924 | 0.2408 | 0.2061 | 0.2565 | 0.1102 | 0.2461 | 0.0625 |
BEVDetR101 | 113.68 | 53.12 | 0.3877 | 0.2622 | 0.2065 | 0.2546 | 0.2265 | 0.2554 | 0.1118 | 0.2495 | 0.0810 |
BEVDetR101-pt | 112.80 | 56.35 | 0.3780 | 0.2442 | 0.1962 | 0.3041 | 0.2590 | 0.2599 | 0.1398 | 0.2073 | 0.0939 |
BEVDetSwinT | 116.48 | 46.26 | 0.4037 | 0.2609 | 0.2115 | 0.2278 | 0.2128 | 0.2191 | 0.0490 | 0.2450 | 0.0680 |
BEVDepthR50 | 110.02 | 56.82 | 0.4058 | 0.2638 | 0.2141 | 0.2751 | 0.2513 | 0.2879 | 0.1757 | 0.2903 | 0.0863 |
BEVerseSwinT | 110.67 | 48.60 | 0.4665 | 0.3181 | 0.3037 | 0.2600 | 0.2647 | 0.2656 | 0.0593 | 0.2781 | 0.0644 |
BEVerseSwinS | 117.82 | 49.57 | 0.4951 | 0.3364 | 0.2485 | 0.2807 | 0.2632 | 0.3394 | 0.1118 | 0.2849 | 0.0985 |
PolarFormerR101 | 96.06 | 70.88 | 0.4602 | 0.3133 | 0.2808 | 0.3509 | 0.3221 | 0.4304 | 0.2554 | 0.4262 | 0.2304 |
PolarFormerVoV | 98.75 | 67.51 | 0.4558 | 0.3135 | 0.2811 | 0.3076 | 0.2344 | 0.4280 | 0.2441 | 0.4061 | 0.2468 |
SRCN3DR101 | 99.67 | 70.23 | 0.4286 | 0.2947 | 0.2681 | 0.3318 | 0.2609 | 0.4074 | 0.2590 | 0.3940 | 0.1920 |
SRCN3DVoV | 102.04 | 67.95 | 0.4205 | 0.2875 | 0.2579 | 0.2827 | 0.2143 | 0.3886 | 0.2274 | 0.3774 | 0.2499 |
Sparse4DR101 | 100.01 | 55.04 | 0.5438 | 0.2873 | 0.2611 | 0.3310 | 0.2514 | 0.3984 | 0.2510 | 0.3884 | 0.2259 |
FCOS3Dfinetune | 107.82 | 62.09 | 0.3949 | 0.2849 | 0.2479 | 0.2574 | 0.2570 | 0.3218 | 0.1468 | 0.3321 | 0.1136 |
BEVFusionCam | - | - | 0.4121 | - | - | - | - | - | - | - | - |
BEVFusionLiDAR | - | - | 0.6928 | - | - | - | - | - | - | - | - |
BEVFusionC+L | - | - | 0.7138 | - | - | - | - | - | - | - | - |
Model | Pretrain | Temporal | Depth | CBGS | Backbone | EncoderBEV | Input Size | mCE (%) | mRR (%) | NDS |
---|---|---|---|---|---|---|---|---|---|---|
DETR3D | ✓ | ✗ | ✗ | ✗ | ResNet | Attention | 1600×900 | 100.00 | 70.77 | 0.4224 |
DETR3DCBGS | ✓ | ✗ | ✗ | ✓ | ResNet | Attention | 1600×900 | 99.21 | 70.02 | 0.4341 |
BEVFormerSmall | ✓ | ✓ | ✗ | ✗ | ResNet | Attention | 1280×720 | 101.23 | 59.07 | 0.4787 |
BEVFormerBase | ✓ | ✓ | ✗ | ✗ | ResNet | Attention | 1600×900 | 97.97 | 60.40 | 0.5174 |
PETRR50-p4 | ✗ | ✗ | ✗ | ✗ | ResNet | Attention | 1408×512 | 111.01 | 61.26 | 0.3665 |
PETRVoV-p4 | ✓ | ✗ | ✗ | ✗ | VoVNetV2 | Attention | 1600×900 | 100.69 | 65.03 | 0.4550 |
ORA3D | ✓ | ✗ | ✗ | ✗ | ResNet | Attention | 1600×900 | 99.17 | 68.63 | 0.4436 |
PolarFormerR101 | ✓ | ✗ | ✗ | ✗ | ResNet | Attention | 1600×900 | 96.06 | 70.88 | 0.4602 |
PolarFormerVoV | ✓ | ✗ | ✗ | ✗ | VoVNetV2 | Attention | 1600×900 | 98.75 | 67.51 | 0.4558 |
SRCN3DR101 | ✓ | ✗ | ✗ | ✗ | ResNet | CNN+Attn. | 1600×900 | 99.67 | 70.23 | 0.4286 |
SRCN3DVoV | ✓ | ✗ | ✗ | ✗ | VoVNetV2 | CNN+Attn. | 1600×900 | 102.04 | 67.95 | 0.4205 |
Sparse4DR101 | ✓ | ✓ | ✗ | ✗ | ResNet | CNN+Attn. | 1600×900 | 100.01 | 55.04 | 0.5438 |
BEVDetR50 | ✗ | ✗ | ✓ | ✓ | ResNet | CNN | 704×256 | 115.12 | 51.83 | 0.3770 |
BEVDetR101 | ✗ | ✗ | ✓ | ✓ | ResNet | CNN | 704×256 | 113.68 | 53.12 | 0.3877 |
BEVDetR101-pt | ✓ | ✗ | ✓ | ✓ | ResNet | CNN | 704×256 | 112.80 | 56.35 | 0.3780 |
BEVDetSwinT | ✗ | ✗ | ✓ | ✓ | Swin | CNN | 704×256 | 116.48 | 46.26 | 0.4037 |
BEVDepthR50 | ✗ | ✗ | ✓ | ✓ | ResNet | CNN | 704×256 | 110.02 | 56.82 | 0.4058 |
BEVerseSwinT | ✗ | ✗ | ✓ | ✓ | Swin | CNN | 704×256 | 137.25 | 28.24 | 0.1603 |
BEVerseSwinT | ✗ | ✓ | ✓ | ✓ | Swin | CNN | 704×256 | 110.67 | 48.60 | 0.4665 |
BEVerseSwinS | ✗ | ✗ | ✓ | ✓ | Swin | CNN | 1408×512 | 132.13 | 29.54 | 0.2682 |
BEVerseSwinS | ✗ | ✓ | ✓ | ✓ | Swin | CNN | 1408×512 | 117.82 | 49.57 | 0.4951 |
Note: Pretrain denotes models initialized from the FCOS3D checkpoint. Temporal indicates whether temporal information is used. Depth denotes models with an explicit depth estimation branch. CBGS denotes models that use the class-balanced group-sampling strategy.
You can create your own "RoboBEV" corruption sets! Follow the instructions listed in CREATE.md.
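As a rough illustration of what a corruption transform looks like, the sketch below darkens an image to mimic a "Low Light"-style corruption at three severity levels. The scaling factors are assumptions for illustration only; the actual transforms and severity definitions are those in CREATE.md and the nuScenes-C generation scripts.

```python
import numpy as np

def low_light(image, severity=1):
    """Darken an RGB uint8 image; severity in {1, 2, 3}.
    Scaling factors are illustrative, not the benchmark's values."""
    factor = [0.60, 0.40, 0.25][severity - 1]
    img = image.astype(np.float32) / 255.0
    img = img * factor  # uniformly reduce brightness
    return (np.clip(img, 0.0, 1.0) * 255).astype(np.uint8)

# A dummy 4x4 "camera frame" with constant intensity 200.
dummy = np.full((4, 4, 3), 200, dtype=np.uint8)
print(low_light(dummy, severity=3).max())  # 200 * 0.25 -> 50
```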
- Initial release. 🚀
- Add scripts for creating common corruptions.
- Add download link of nuScenes-C.
- Add evaluation scripts on corruption sets.
- ...
If you find this work helpful, please kindly consider citing the following:
@article{xie2023robobev,
title = {RoboBEV: Robust Bird's Eye View Detection under Corruptions},
author = {Xie, Shaoyuan and Kong, Lingdong and Zhang, Wenwei and Ren, Jiawei and Pan, Liang and Chen, Kai and Liu, Ziwei},
journal = {arXiv preprint arXiv:23xx.xxxxx},
year = {2023}
}
@misc{xie2023robobev_codebase,
title = {RoboBEV: Towards Robust Bird's Eye View Detection under Corruptions},
author = {Xie, Shaoyuan and Kong, Lingdong and Zhang, Wenwei and Ren, Jiawei and Pan, Liang and Chen, Kai and Liu, Ziwei},
howpublished = {\url{https://github.com/Daniel-xsy/RoboBEV}},
year = {2023}
}
This work is under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, while some specific operations in this codebase may be under other licenses. Kindly refer to LICENSE.md for a careful check if you are using our code for commercial purposes.
To be updated.