[3DIdentBox part 1] [3DIdentBox part 2] [Paper]
Official code base for generating the 3DIdentBox datasets presented in the paper Identifiability Results for Multimodal Contrastive Learning, submitted to the CLeaR 2023 Dataset Track. This repository extends the 3DIdent dataset and builds on its generation code to provide a versatile toolbox for identifiability benchmarking. The 3DIdentBox datasets comprise image/image pairs generated from controlled ground-truth factors using the Blender software. Ground-truth factors are either sampled independently (part 1) or linked by non-trivial causal dependencies (part 2).
The training/validation/test partitions of each dataset contain 250,000/10,000/10,000 image pairs, respectively. Images depict a colored teapot in front of a colored background, illuminated by a colored spotlight. View 1 displays the teapot with a metallic texture; View 2 displays the teapot with a rubber texture.
Each image is controlled by 9 factors of variation partitioned into 3 information blocks (Rotations, Positions, Hues) detailed below:
Information Block | Description | Raw Support | Blender Support | Visual Range | Details |
---|---|---|---|---|---|
Rotation | Object rotation angle | | | | |
Rotation | Object rotation angle | | | | |
Rotation | Spotlight rotation angle | | | | |
Position | Object x-position | | | | |
Position | Object y-position | | | | |
Position | Object z-position | | | | |
Hue | Object color | | | | Object HSV color, H parameter |
Hue | Spotlight color | | | | Spotlight HSV color, H parameter |
Hue | Background color | | | | Background HSV color, H parameter |
The generation of ground-truth factor pairs follows specific distributions according to three sets of rules (i.e., block types). Factors following the content block type are shared across views. Factors following the style block type are stochastically shared between views. Factors following the view-specific block type are either kept constant or specific to one view. For each dataset, each information block is associated with a unique block type. The distributions linked to each block type are detailed below.
All combinations of Information Block : Block Type assignments are available for download.
[3DIdentBox part 1] features 6 datasets, one for each Information Block : Block Type combination. Factors are generated according to the rules below; as a consequence, there are no causal dependencies between an image's generative factors in this configuration.
Block Type | Symbol | View 1 Distribution | View 2 Distribution | Description |
---|---|---|---|---|
Content | | | | Shared between views |
Style | | | | Stochastically shared between views |
View-Specific | | | | Specific to one view or kept constant |
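To make these rules concrete, here is a minimal numpy sketch of the part 1 pairing scheme for a toy factor vector with three content, three style and three view-specific dimensions. It follows the default settings described in the generation section below (uniform marginals for view 1, normal conditionals for style); the dimension sizes and the noise standard deviation are illustrative choices, and this is not the repository's generation code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pair(n_content=3, n_style=3, n_view_specific=3, style_noise_std=0.1):
    # View 1: all factors drawn from a uniform marginal (illustrative range [-1, 1]).
    c = rng.uniform(-1, 1, n_content)
    s = rng.uniform(-1, 1, n_style)
    m = rng.uniform(-1, 1, n_view_specific)

    # View 2:
    c_tilde = c.copy()                                       # content: shared exactly across views
    s_tilde = s + rng.normal(0.0, style_noise_std, n_style)  # style: stochastically shared (normal conditional)
    m_tilde = rng.uniform(-1, 1, n_view_specific)            # view-specific: sampled independently per view
    return np.concatenate([c, s, m]), np.concatenate([c_tilde, s_tilde, m_tilde])

z, z_tilde = sample_pair()  # one ground-truth factor pair (z, \tilde{z})
```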
[3DIdentBox part 2] features datasets in which causal dependencies exist between generative factors. Factors are generated according to the following rules.

Block Type | Symbol | View 1 Distribution | View 2 Distribution | Description |
---|---|---|---|---|
Content | | | | Shared between views, causal dependencies between factors |
Style | | | | Stochastically shared between views, causal dependencies between factors |
View-Specific | | | | Specific to one view or kept constant |
The sample pairs and their associated ground-truth factors can be downloaded here:
The folder structure is organized as follows:
```
hues_positions_rotations          # example folder name
├── samples                       # sample pairs x = {x, \tilde{x}}
│   ├── m1                        # x, first elements of each sample pair (e.g., "000000.png")
│   │   └── *.png
│   └── m2                        # \tilde{x}, second elements of each sample pair (e.g., "000000.png")
│       └── *.png
└── factors                       # ground-truth factor pairs z = {z, \tilde{z}} = {[c,s,m], [\tilde{c},\tilde{s},\tilde{m}]}
    ├── m1                        # z, first elements of each ground-truth factor pair
    │   ├── latents.npy           # z, distributed across the Blender support
    │   └── raw_latents.npy       # z, distributed across the raw support
    └── m2                        # \tilde{z}, second elements of each ground-truth factor pair
        ├── latents.npy
        └── raw_latents.npy
```
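For reference, the snippet below shows one way a downloaded dataset following this layout could be loaded. It is a minimal sketch: the folder name, the use of PIL, and the array shapes are assumptions based on the description above rather than guaranteed properties.

```python
import numpy as np
from PIL import Image  # assumption: any image library can be used here

root = "hues_positions_rotations"  # placeholder: path to a downloaded dataset folder

# Ground-truth factor pairs (z, \tilde{z}).
z1 = np.load(f"{root}/factors/m1/latents.npy")          # view-1 factors, Blender support
z2 = np.load(f"{root}/factors/m2/latents.npy")          # view-2 factors, Blender support
raw_z1 = np.load(f"{root}/factors/m1/raw_latents.npy")  # view-1 factors, raw support

# Image pair with index 0; file names are zero-padded (e.g., "000000.png").
idx = 0
x1 = Image.open(f"{root}/samples/m1/{idx:06d}.png")  # x, first element of the pair
x2 = Image.open(f"{root}/samples/m2/{idx:06d}.png")  # \tilde{x}, second element of the pair

print(z1.shape, z2.shape)  # expected: (n_samples, 9), one row of 9 factors per image
```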
Each folder contains a full dataset with a specific block type assigned to each information block. The example folder name `hues_positions_rotations` follows the template `content_style_view_specific`: in this example, the Hues block carries content information, the Positions block carries style information, and the Rotations block carries view-specific information.
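For instance, a folder name can be mapped back to its block assignment with a small helper such as the one below (a hypothetical convenience function, not part of this code base):

```python
def parse_folder_name(name: str) -> dict:
    """Map a folder name following the content_style_view_specific template
    to its information-block assignment (illustrative helper only)."""
    content, style, view_specific = name.split("_", maxsplit=2)
    return {"content": content, "style": style, "view-specific": view_specific}

print(parse_folder_name("hues_positions_rotations"))
# {'content': 'hues', 'style': 'positions', 'view-specific': 'rotations'}
```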
The following command generates latents for 10 pairs of images depicting a scene with two objects. In particular, image pairs display variable object types, object positions, object rotations, spotlight rotations and scene hues. In this setting, `--object-content` sets the object type as content, `--position-style` sets the object position as style, `--rotation-style` sets the object and spotlight rotation as style, and `--hue-ms` sets the scene hues as modality-specific information.

By default, view 1 and view 2 content and style information follow uniform and normal distributions, respectively. These defaults can be changed by setting the `--continuous-marginal` and `--continuous-conditional` parameters to `normal` or `uniform`. In addition, the standard deviations of the normal marginal and conditional distributions can be adjusted with `--normal-marginal-std`, `--normal-conditional-std` (for style variables) and `--normal-conditional-noise` (for content variables). Similarly, the lower and upper bounds of the uniform distributions can be customized with `--uniform-marginal-a` (lower bound), `--uniform-marginal-b` (upper bound), `--uniform-conditional-a` (lower bound, for content variables), `--uniform-conditional-b` (upper bound, for content variables), `--uniform-conditional-noise-a` (lower bound, for style variables) and `--uniform-conditional-noise-b` (upper bound, for style variables).
```bash
############# ADAPT THE FOLLOWING PARAMETERS ###############
OUTPUT_FOLDER="example"  # output folder for latent and image storage
SAMPLE_PAIRS=10          # number of sample pairs
NB_OBJECTS=2             # number of objects per image
############################################################

python generate_clevr_dataset_latents.py --output-folder ${OUTPUT_FOLDER} --n-pairs ${SAMPLE_PAIRS} --object --position --rotation --hue --object-content --position-style --rotation-style --hue-ms --n-object ${NB_OBJECTS}
```
The following command renders images based on the previously generated latents stored in `${OUTPUT_FOLDER}/m{1-2}/latents.npy`. Rendered images are stored in `${OUTPUT_FOLDER}/images`.
```bash
############# ADAPT THE FOLLOWING PARAMETERS ###############
BLENDER_DIR=/example/blender       # path to the Blender executable
OUTPUT_FOLDER="example"            # output folder for latent and image storage
LATENT_FOLDER="example_subfolder"  # subfolder where latents are stored
MATERIAL="MyMetal"                 # adjust: MyMetal, MyCristal, MyRubber
############################################################
N_BATCHES=10

# Render the batches one by one (batch indices 0 .. N_BATCHES-1).
for (( i=0; i<$N_BATCHES; i++ ))
do
    ${BLENDER_DIR} -noaudio --background --python generate_clevr_dataset_images.py --use-gpu --output-folder ${OUTPUT_FOLDER}/${LATENT_FOLDER} --n-batches ${N_BATCHES} --batch-index ${i} --material-names ${MATERIAL} --no_range_change
done
```
If you find our datasets useful, please cite our paper:
```bibtex
@inproceedings{daunhawer2023multimodal,
  author    = {Daunhawer, Imant and Bizeul, Alice and Palumbo, Emanuele and Marx, Alexander and Vogt, Julia E.},
  title     = {Identifiability Results for Multimodal Contrastive Learning},
  booktitle = {International Conference on Learning Representations},
  year      = {2023}
}
```
This project builds on the following resources. Please cite them appropriately.