[paper] [project page]
This repository contains download scripts and tooling for working with the UnCommon Objects in 3D (uCO3D) dataset.
uCO3D contains ~170,000 turn-table videos capturing objects from the LVIS taxonomy of object categories.
The dataset is described in our paper "UnCommon Objects in 3D".
- 170,000 videos scanning diverse objects from all directions.
- Objects come from the LVIS taxonomy of ~1000 categories, grouped into 50 super-categories.
- Unlike CO3Dv2, uCO3D releases full original videos instead of frames.
- Each video is annotated with object segmentation, camera poses, and 3 types of point clouds.
- The dataset newly contains a 3D Gaussian Splat reconstruction for each video.
- Each scene comes with a long and a short caption obtained with a large video-language model.
- Significantly improved annotation quality and size w.r.t. CO3Dv2.
The full dataset (processed version) takes ~19.3 TB of space. We distribute it in chunks of up to 20 GB. We provide an automated way of downloading and decompressing the data.
First, clone the repository and install the package, which also takes care of its dependencies:
```bash
git clone git@github.com:facebookresearch/uco3d.git
cd uco3d
pip install -e .
```
Then run the download script (make sure to change `<DESTINATION_FOLDER>`):
```bash
python dataset_download/download_dataset.py --download_folder <DESTINATION_FOLDER> --checksum_check
```
As detailed below, the download script can fetch only specific subsets of the dataset (e.g. only the Gaussian Splats and RGB videos of specific object categories). This can greatly reduce the required disk space.
Setting `--download_modalities` to a comma-separated list of modality names downloads only that subset of the available modalities. For instance,
```bash
python dataset_download/download_dataset.py --download_folder <DESTINATION_FOLDER> --download_modalities "rgb_videos,point_clouds"
```
will only download the RGB videos and point clouds.
Execute `python dataset_download/download_dataset.py -h` for the list of all downloadable modalities.
The following table lists the total size of each modality:
| Modality | Size (TB) |
| --- | ---: |
| rgb_videos | 7.59 |
| mask_videos | 0.16 |
| depth_maps | 9.69 |
| gaussian_splats | 1.18 |
| point_clouds | 0.57 |
| segmented_point_clouds | 0.04 |
| sparse_point_clouds | 0.04 |
| **Total** | **19.27** |
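Since the modalities are downloaded independently, the space needed for a given `--download_modalities` selection can be estimated by summing the corresponding rows of the table. The short sketch below does exactly that; the dictionary simply copies the table values (full dataset, all super-categories), and the helper name `estimate_size_tb` is hypothetical, not part of the uCO3D tooling:

```python
# Per-modality sizes in TB for the full dataset, copied from the table above.
MODALITY_SIZE_TB = {
    "rgb_videos": 7.59,
    "mask_videos": 0.16,
    "depth_maps": 9.69,
    "gaussian_splats": 1.18,
    "point_clouds": 0.57,
    "segmented_point_clouds": 0.04,
    "sparse_point_clouds": 0.04,
}


def estimate_size_tb(modalities: str) -> float:
    """Estimate the download size (TB) of a comma-separated modality list,
    written the same way as the --download_modalities argument."""
    names = [m.strip() for m in modalities.split(",") if m.strip()]
    unknown = [m for m in names if m not in MODALITY_SIZE_TB]
    if unknown:
        raise ValueError(f"unknown modalities: {unknown}")
    # Round to the two decimals used in the table to hide float noise.
    return round(sum(MODALITY_SIZE_TB[m] for m in names), 2)


print(estimate_size_tb("rgb_videos,point_clouds"))  # 8.16
```

Restricting `--download_super_categories`, as described next, reduces these numbers further.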
Setting `--download_super_categories` instructs the script to download only a subset of the available super-categories. For instance,
```bash
python dataset_download/download_dataset.py --download_folder <DESTINATION_FOLDER> --download_super_categories "vegetables_and_legumes,stationery"
```
will download only the `vegetables_and_legumes` and `stationery` super-categories.
Note that `--download_modalities` can be combined with `--download_super_categories` to select any subset of the dataset.
Run `python dataset_download/download_dataset.py -h` for the full list of options.
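When scripting downloads, it can help to assemble the command from lists of names. A minimal sketch, assuming the script is run from the repository root; `build_download_command` is a hypothetical helper, and it only uses the flags documented above (`--download_folder`, `--download_modalities`, `--download_super_categories`, `--checksum_check`):

```python
import shlex
import subprocess  # used only in the commented-out launch line below
from typing import List, Optional, Sequence


def build_download_command(
    download_folder: str,
    modalities: Optional[Sequence[str]] = None,
    super_categories: Optional[Sequence[str]] = None,
    checksum_check: bool = True,
) -> List[str]:
    """Build the argument list for dataset_download/download_dataset.py."""
    cmd = ["python", "dataset_download/download_dataset.py", "--download_folder", download_folder]
    if modalities:
        # Both flags expect a single comma-separated string.
        cmd += ["--download_modalities", ",".join(modalities)]
    if super_categories:
        cmd += ["--download_super_categories", ",".join(super_categories)]
    if checksum_check:
        cmd.append("--checksum_check")
    return cmd


# "/data/uco3d" is a placeholder destination folder.
cmd = build_download_command(
    "/data/uco3d",
    modalities=["rgb_videos", "gaussian_splats"],
    super_categories=["stationery"],
)
print(shlex.join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually start the download
```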
- Set up the `UCO3D_DATASET_ROOT` environment variable pointing to the root folder of the uCO3D dataset:
```bash
export UCO3D_DATASET_ROOT=<DESTINATION_FOLDER>
```
- Create the dataset object and fetch its data:
```python
import os

from uco3d import UCO3DDataset, UCO3DFrameDataBuilder
from uco3d import render_splats

# Get the dataset root folder from the environment and check
# that the main metadata file exists.
dataset_root = os.environ["UCO3D_DATASET_ROOT"]
assert os.path.isfile(os.path.join(dataset_root, "metadata.sqlite"))

# Get the "small" subset list containing a small subset
# of the uCO3D categories. For loading the whole dataset
# use "set_lists_all-categories.sqlite".
subset_lists_file = os.path.join(
    dataset_root,
    "set_lists",
    "set_lists_3categories-debug.sqlite",
)
dataset = UCO3DDataset(
    subset_lists_file=subset_lists_file,
    subsets=["train"],
    frame_data_builder=UCO3DFrameDataBuilder(
        apply_alignment=True,
        load_images=True,
        load_depths=False,
        load_masks=True,
        load_depth_masks=True,
        load_gaussian_splats=True,
        gaussian_splats_truncate_background=True,
        load_point_clouds=True,
        load_segmented_point_clouds=True,
        load_sparse_point_clouds=True,
        box_crop=True,
        box_crop_context=0.4,
        load_frames_from_videos=True,
        image_height=800,
        image_width=800,
        undistort_loaded_blobs=True,
    ),
)

# query the dataset object to obtain a single video frame of a sequence
frame_data = dataset[100]

# obtain the RGB image of the frame
image_rgb = frame_data.image_rgb

# obtain the 3D gaussian splats reconstructing the whole scene
gaussian_splats = frame_data.sequence_gaussian_splats

# render the scene gaussian splats into the camera of the loaded frame
# NOTE: This requires the 'gsplat' library. You can install it with:
# > pip install git+https://github.com/nerfstudio-project/gsplat.git@v1.3.0
render_colors, render_alphas, render_info = render_splats(
    cameras=frame_data.camera,
    splats=gaussian_splats,
    render_size=[512, 512],
)
```
The `examples` folder contains Python scripts with examples of using the dataset.
The `tests` folder contains various tests checking the correctness of the implementation, some of which also visualize loadable modalities such as point clouds or 3D Gaussian Splats.
To run the tests, execute the following:
```bash
cd tests
python run.py
```
The 3D Gaussian Splat and point cloud tests contain many scripts for loading and visualizing all the point cloud and 3D Gaussian Splat data in the dataset. Make sure to explore them to familiarize yourself with the dataset interface.
The dataset is organized in the filesystem as follows:
```
├── metadata.sqlite
├── set_lists
│   ├── set_lists_3categories-debug.sqlite
│   ├── set_lists_all-categories.sqlite
│   ├── set_lists_<subset_lists_name_2>.sqlite
│   ├── ...
├── <super_category_1>
│   ├── <category_1>
│   │   ├── <sequence_name_1>
│   │   │   ├── depth_maps.h5
│   │   │   ├── gaussian_splats
│   │   │   ├── mask_video.mkv
│   │   │   ├── rgb_video.mp4
│   │   │   ├── point_cloud.ply
│   │   │   ├── segmented_point_cloud.ply
│   │   │   └── sparse_point_cloud.ply
│   │   ├── <sequence_name_2>
│   │   │   ├── depth_maps.h5
│   │   │   ├── gaussian_splats
│   │   │   ├── mask_video.mkv
│   │   │   ├── rgb_video.mp4
│   │   │   ├── point_cloud.ply
│   │   │   ├── segmented_point_cloud.ply
│   │   │   └── sparse_point_cloud.ply
│   │   ├── ...
│   │   ├── <sequence_name_S>
│   ├── ...
│   ├── <category_C>
├── ...
├── <super_category_S>
```
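When only some modalities were downloaded, it can be useful to check which per-sequence entries are present on disk. The sketch below derives the expected paths from the layout above; the helper names `sequence_paths` and `missing_entries` are hypothetical, and the entry names are copied from the tree:

```python
import os
from typing import Dict, List

# Per-sequence entries, as listed in the folder tree above.
# Note that "gaussian_splats" is a folder rather than a single file.
SEQUENCE_ENTRIES = (
    "depth_maps.h5",
    "gaussian_splats",
    "mask_video.mkv",
    "rgb_video.mp4",
    "point_cloud.ply",
    "segmented_point_cloud.ply",
    "sparse_point_cloud.ply",
)


def sequence_paths(dataset_root: str, super_category: str, category: str, sequence_name: str) -> Dict[str, str]:
    """Map each per-sequence entry name to its expected path."""
    seq_dir = os.path.join(dataset_root, super_category, category, sequence_name)
    return {name: os.path.join(seq_dir, name) for name in SEQUENCE_ENTRIES}


def missing_entries(dataset_root: str, super_category: str, category: str, sequence_name: str) -> List[str]:
    """List the entries that do not exist on disk, e.g. after a partial download."""
    paths = sequence_paths(dataset_root, super_category, category, sequence_name)
    return [name for name, path in paths.items() if not os.path.exists(path)]
```

For example, `missing_entries(os.environ["UCO3D_DATASET_ROOT"], super_category, category, sequence_name)` returns the names of the entries of that sequence which were not downloaded.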
Note that, unlike in CO3Dv2, frame-level data such as images or depth maps are released solely in the form of videos or HDF5 files to save space. The provided `UCO3DFrameDataBuilder` then extracts the RGB/depth/mask frames from the loaded videos on the fly.
Each sequence-specific folder `<super_category>/<category>/<sequence_name>` contains the following files:
- `rgb_video.mp4`: The original crowd-sourced video capturing the object from the visual category `<category>` and super-category `<super_category>`.
- `mask_video.mkv`: Segmentation video of the same length as `rgb_video.mp4` containing the video segmentation of the foreground object. The latter was obtained using LangSAM in combination with a video segmentation refiner based on XMem.
- `depth_maps.h5`: An `hdf5` file containing a depth map for each of the 200 frames sampled equidistantly from the input video. We first run DepthAnythingV2 and then align the scale of the resulting depth map with the scene's sparse point cloud from `sparse_point_cloud.ply`. Hence, the depth maps have a consistent scale within each scene, although they are not strictly pixel-wise consistent across multiple views. We are working on improving this and should provide more consistent depth maps in the future.
- `gaussian_splats`: 3D Gaussian Splat reconstruction of the scene obtained with the `gsplat` library (v1.3.0). The splats are compressed using the standard `gsplat` compression method, which sorts the Gaussians using Self-Organizing Gaussian Grids followed by `png` compression.
- `point_cloud.ply`: A dense colored 3D point cloud reconstructing the scene, obtained using VGGSfM.
- `segmented_point_cloud.ply`: Same as `point_cloud.ply`, but restricted to the points covering the foreground object.
- `sparse_point_cloud.ply`: Sparse, geometrically accurate scene point cloud used to reconstruct the scene cameras, obtained using VGGSfM.
The `$UCO3D_DATASET_ROOT/metadata.sqlite` file contains a database of all frame-level and video-level metadata, such as the paths to individual RGB/mask videos or the camera poses of each frame. We opted for an SQL database because it provides fast access times without the need to store all metadata in memory (loading all metadata into memory usually takes minutes to hours for the whole dataset), and because it is widely supported.
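Because the metadata is a regular SQLite file, it can also be inspected directly with Python's built-in `sqlite3` module, without loading it into memory. The sketch below lists the tables and counts the frames of the first few sequences; it assumes only the `frame_annots` table and its `sequence_name` column, both of which appear in the custom set-list example further down, and the helper name `frames_per_sequence` is hypothetical:

```python
import os
import sqlite3
from contextlib import closing
from typing import List, Tuple


def frames_per_sequence(metadata_file: str, limit: int = 5) -> Tuple[List[str], List[Tuple[str, int]]]:
    """Return the table names and per-sequence frame counts of a metadata database."""
    if not os.path.isfile(metadata_file):
        # sqlite3.connect would otherwise silently create an empty database.
        raise FileNotFoundError(metadata_file)
    with closing(sqlite3.connect(metadata_file)) as con:
        tables = [
            row[0]
            for row in con.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
        ]
        # The GROUP BY still scans the table, but rows are streamed by SQLite
        # instead of being materialized in Python.
        counts = con.execute(
            "SELECT sequence_name, COUNT(*) FROM frame_annots "
            "GROUP BY sequence_name ORDER BY sequence_name LIMIT ?",
            (limit,),
        ).fetchall()
    return tables, counts


if "UCO3D_DATASET_ROOT" in os.environ:
    tables, counts = frames_per_sequence(os.path.join(os.environ["UCO3D_DATASET_ROOT"], "metadata.sqlite"))
    print(tables)
    print(counts)
```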
The provided camera annotations follow the PyTorch3D convention and are represented in the PyTorch3D NDC space. Note that PyTorch3D is only an optional dependency which enables extra functionality and tests within the codebase.

If PyTorch3D is installed, the `Cameras` objects loaded using the `UCO3DDataset` object can be converted to the corresponding PyTorch3D `PerspectiveCameras` object using the `Cameras.to_pytorch3d_cameras` function.
We also provide a conversion to the OpenCV (`cv2`) camera format:
```python
from uco3d import UCO3DDataset, UCO3DFrameDataBuilder

# import the camera conversion function:
from uco3d import opencv_cameras_projection_from_uco3d

# instantiate the dataset
dataset = UCO3DDataset(
    ...
)

# query the dataset object to obtain a single video frame of a sequence
frame_data = dataset[100]

R, tvec, camera_matrix = opencv_cameras_projection_from_uco3d(
    frame_data.camera,
    image_size=frame_data.image_size_hw[None],
)  # R, tvec, camera_matrix follow OpenCV's camera definition
```
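For intuition about the intrinsics part of this conversion, the sketch below maps an NDC focal length and principal point to a pixel-space OpenCV matrix `K`, following the NDC convention documented by PyTorch3D (the shorter image side spans [-1, 1], +X points left, +Y points up). `ndc_to_pixel_intrinsics` is a hypothetical helper, not part of the uCO3D API, and it deliberately ignores the rotation/translation axis changes that `opencv_cameras_projection_from_uco3d` also has to handle, so prefer that function in practice:

```python
from typing import List, Sequence


def ndc_to_pixel_intrinsics(
    focal_ndc: Sequence[float],
    principal_point_ndc: Sequence[float],
    image_size_hw: Sequence[int],
) -> List[List[float]]:
    """Convert PyTorch3D-style NDC intrinsics to an OpenCV-style 3x3 matrix K."""
    h, w = image_size_hw
    # In PyTorch3D NDC space the shorter image side spans [-1, 1],
    # so one NDC unit corresponds to half of the shorter side in pixels.
    half_s = min(h, w) / 2.0
    fx = focal_ndc[0] * half_s
    fy = focal_ndc[1] * half_s
    # NDC +X points left and +Y points up, while pixel x grows to the right
    # and pixel y grows downwards, hence the minus signs.
    cx = w / 2.0 - principal_point_ndc[0] * half_s
    cy = h / 2.0 - principal_point_ndc[1] * half_s
    return [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]


K = ndc_to_pixel_intrinsics((2.0, 2.0), (0.0, 0.0), (100, 200))
print(K)  # [[100.0, 0.0, 100.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]]
```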
uCO3D also contains a 3D Gaussian Splat (3DGS) reconstruction in each sequence folder. These reconstructions were obtained using `gsplat` (v1.3.0). `gsplat` is an optional dependency that enables fast rendering of the provided 3DGS reconstructions.
The easiest way to install the supported version of `gsplat` is to use `pip` + `git`:
```bash
pip install git+https://github.com/nerfstudio-project/gsplat.git@v1.3.0
```
Note that we also provide functions for rendering the loaded splats:
```python
from uco3d import UCO3DDataset, UCO3DFrameDataBuilder
from uco3d import render_splats

# instantiate the dataset
dataset = UCO3DDataset(
    ...
)

# query the dataset object to obtain a single video frame of a sequence
frame_data = dataset[100]

# render the scene gaussian splats into the camera of the loaded frame
render_colors, render_alphas, render_info = render_splats(
    cameras=frame_data.camera,
    splats=frame_data.sequence_gaussian_splats,
    render_size=[512, 512],
)
```
The subset list files `$UCO3D_DATASET_ROOT/set_lists/set_lists_<SETLIST_NAME>.sqlite` define the dataset splits. Specifically, each file contains a list of frames (identified by their `sequence_name` and `frame_number`) in the "train" and "val" subsets of the dataset.
To select a specific subset of the dataset, pass the corresponding subset list path and the subset name to the constructor of `UCO3DDataset`.
For instance,
```python
dataset = UCO3DDataset(
    subset_lists_file="<UCO3D_DATASET_ROOT>/set_lists/set_lists_all-categories.sqlite",
    subsets=["train"],
    frame_data_builder=...,
)
```
will load the "train" subset of the `set_lists_all-categories.sqlite` subset list, which contains the whole uCO3D dataset.
The folder `<UCO3D_DATASET_ROOT>/set_lists/` provides the following subset lists:
- `set_lists_3categories-debug.sqlite` - A small split for debugging purposes. It is very fast to load, allowing quick iteration on code that uses the dataset.
- `set_lists_all-categories.sqlite` - Contains the whole dataset.
- `set_lists_static-categories.sqlite` - Contains the videos of all rigid categories of uCO3D.
- `set_lists_static-categories-accurate-reconstruction.sqlite` - Contains the videos of all rigid categories of uCO3D with high-quality reconstructions.
- `set_lists_dynamic-categories.sqlite` - Contains the videos of all flexible categories (e.g. animals) of uCO3D.
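Since the subset lists are plain SQLite files, their content can be summarized with the standard `sqlite3` module. A minimal sketch, assuming the shipped files use the same `set_lists` table with `frame_number`, `sequence_name` and `subset` columns that the custom set-list example below writes; `subset_sizes` is a hypothetical helper:

```python
import os
import sqlite3
from contextlib import closing
from typing import Dict


def subset_sizes(set_list_file: str) -> Dict[str, Dict[str, int]]:
    """Count frames and distinct sequences per subset ("train", "val", ...)."""
    if not os.path.isfile(set_list_file):
        # Avoid sqlite3.connect silently creating an empty file.
        raise FileNotFoundError(set_list_file)
    with closing(sqlite3.connect(set_list_file)) as con:
        rows = con.execute(
            "SELECT subset, COUNT(*), COUNT(DISTINCT sequence_name) "
            "FROM set_lists GROUP BY subset ORDER BY subset"
        ).fetchall()
    return {subset: {"frames": n_frames, "sequences": n_seqs} for subset, n_frames, n_seqs in rows}
```

For example, calling `subset_sizes` on `set_lists_3categories-debug.sqlite` gives a quick overview of the debugging split before loading it with `UCO3DDataset`.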
Subset lists are stored as `sqlite` tables. The easiest way to create a custom subset list is to use `pandas` to select a subset of the main `<UCO3D_DATASET_ROOT>/metadata.sqlite` table. The following example makes a subset list with 100 training and 30 validation sequences, taken from the first 130 sequences in the metadata:
```python
import os
import sqlite3

import pandas as pd

UCO3D_DATASET_ROOT = os.environ["UCO3D_DATASET_ROOT"]

# read the main metadata table (takes a long time; requires SQLAlchemy)
metadata_file = os.path.join(UCO3D_DATASET_ROOT, "metadata.sqlite")
frame_annots = pd.read_sql_table("frame_annots", f"sqlite:///{metadata_file}")
# pick dataset sequence names by taking unique sequence names from frame annotations
seqs = frame_annots["sequence_name"].unique()
# choose the list of train/val sequences
train_seqs, val_seqs = seqs[:100], seqs[100:130]
# training setlist
is_train = frame_annots["sequence_name"].isin(train_seqs)
setlists_train = frame_annots.loc[is_train, ["frame_number", "sequence_name"]].copy()
setlists_train["subset"] = "train"
# validation setlist
is_val = frame_annots["sequence_name"].isin(val_seqs)
setlists_val = frame_annots.loc[is_val, ["frame_number", "sequence_name"]].copy()
setlists_val["subset"] = "val"
# concatenate train and val
setlists = pd.concat([setlists_train, setlists_val])
# the new setlists will be stored as the set_lists_130.sqlite file in the original root
setlist_file = os.path.join(UCO3D_DATASET_ROOT, "set_lists", "set_lists_130.sqlite")
# store the new table
with sqlite3.connect(setlist_file) as con:
    setlists.to_sql("set_lists", con, if_exists="replace", index=False)
```
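Loading the whole `frame_annots` table with pandas can take a long time. As an alternative, the sketch below builds a comparable set list with the standard `sqlite3` module, reading only the `sequence_name` and `frame_number` columns. `make_set_list` is a hypothetical helper: it orders sequences by name (so it may pick different sequences than the `unique()` order used above) and assumes the same `set_lists` schema as the pandas example:

```python
import sqlite3
from contextlib import closing


def make_set_list(metadata_file: str, setlist_file: str, n_train: int = 100, n_val: int = 30) -> int:
    """Write a train/val set list built from the first n_train + n_val sequences
    (ordered by name). Returns the number of frames written."""
    with closing(sqlite3.connect(metadata_file)) as src:
        seqs = [
            row[0]
            for row in src.execute(
                "SELECT DISTINCT sequence_name FROM frame_annots ORDER BY sequence_name LIMIT ?",
                (n_train + n_val,),
            )
        ]
        subset_of = {seq: ("train" if i < n_train else "val") for i, seq in enumerate(seqs)}
        frames = []
        if seqs:
            # One "?" per sequence; fine for a few hundred sequences, but very
            # large selections may exceed SQLite's bound-parameter limit.
            placeholders = ",".join("?" * len(seqs))
            frames = src.execute(
                f"SELECT frame_number, sequence_name FROM frame_annots "
                f"WHERE sequence_name IN ({placeholders})",
                seqs,
            ).fetchall()
    rows = [(frame_number, seq, subset_of[seq]) for frame_number, seq in frames]
    with closing(sqlite3.connect(setlist_file)) as dst:
        # Same columns as written by the pandas example above.
        dst.execute("DROP TABLE IF EXISTS set_lists")
        dst.execute("CREATE TABLE set_lists (frame_number INTEGER, sequence_name TEXT, subset TEXT)")
        dst.executemany("INSERT INTO set_lists VALUES (?, ?, ?)", rows)
        dst.commit()
    return len(rows)
```

For example, `make_set_list(metadata_file, setlist_file)` with the paths defined in the pandas example writes a `set_lists_130.sqlite` file with 100 training and 30 validation sequences.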
The data are released under the CC BY 4.0 license.
This project uses code from other sources, which are licensed under their respective licenses:
If you use our dataset, please use the following citation:
@inproceedings{liu24uco3d,
Author = {Liu, Xingchen and Tayal, Piyush and Wang, Jianyuan and Zarzar, Jesus and Monnier, Tom and Tertikas, Konstantinos and Duan, Jiali and Toisoul, Antoine and Zhang, Jason Y. and Neverova, Natalia and Vedaldi, Andrea and Shapovalov, Roman and Novotny, David},
Booktitle = {arXiv},
Title = {UnCommon Objects in 3D},
Year = {2024},
}