# 3DIdentBox: Identifiability Benchmark Datasets

[3DIdentBox part 1] [3DIdentBox part 2] [Paper]

Official code base for generating the 3DIdentBox datasets presented in the paper Identifiability Results for Multimodal Contrastive Learning and submitted to the CLeaR Dataset Track 2023. This GitHub repository extends the 3DIdent dataset and builds on top of its generation code to provide a versatile toolbox for identifiability benchmarking. The 3DIdentBox datasets comprise image/image pairs generated from controlled ground-truth factors using the Blender software. Ground-truth factors are either sampled independently (part 1) or linked by non-trivial causal dependencies (part 2).

3DIdentBox dataset example images

## Dataset Description


The training/validation/test partitions of each dataset contain 250,000/10,000/10,000 image pairs, respectively. Images depict a colored teapot in front of a colored background, illuminated by a colored spotlight. View 1 displays the teapot with a metallic texture; View 2 displays the teapot with a rubber texture.

Each image is controlled by 9 factors of variation partitioned into 3 information blocks (Rotations, Positions, Hues) detailed below:

| Information Block | Description | Raw Support | Blender Support | Visual Range | Details |
|---|---|---|---|---|---|
| Rotation | $\alpha$-angle | $[-1;1]$ | $[-\pi;\pi]$ | $[0^{\circ},360^{\circ}]$ | Object $\alpha$ rotation angle |
| Rotation | $\beta$-angle | $[-1;1]$ | $[-\pi;\pi]$ | $[0^{\circ},360^{\circ}]$ | Object $\beta$ rotation angle |
| Rotation | $\gamma$-angle | $[-1;1]$ | $[-\pi;\pi]$ | $[0^{\circ},360^{\circ}]$ | Spotlight rotation angle |
| Position | $x$-coordinate | $[-1;1]$ | $[-2;2]$ | - | Object $x$-coordinate |
| Position | $y$-coordinate | $[-1;1]$ | $[-2;2]$ | - | Object $y$-coordinate |
| Position | $z$-coordinate | $[-1;1]$ | $[-2;2]$ | - | Object $z$-coordinate |
| Hue | Object color | $[-1;1]$ | $[-\pi;\pi]$ | $[0^{\circ},360^{\circ}]$ | Object HSV color, H parameter |
| Hue | Spotlight color | $[-1;1]$ | $[-\pi;\pi]$ | $[0^{\circ},360^{\circ}]$ | Spotlight HSV color, H parameter |
| Hue | Background color | $[-1;1]$ | $[-\pi;\pi]$ | $[0^{\circ},360^{\circ}]$ | Background HSV color, H parameter |
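The mapping from the raw support $[-1;1]$ to the Blender support in the table above is a simple rescaling. The helper below is a hypothetical sketch (not part of the repository's code); the column ordering of the nine factors follows the table order and is an assumption.

```python
import numpy as np

def raw_to_blender(raw_latents: np.ndarray) -> np.ndarray:
    """Rescale raw factors in [-1, 1] to the Blender supports listed above.

    Assumed column layout (matching the table order):
    0-2: rotation angles  -> [-pi, pi]
    3-5: x/y/z positions  -> [-2, 2]
    6-8: hue parameters   -> [-pi, pi]
    """
    scale = np.array([np.pi] * 3 + [2.0] * 3 + [np.pi] * 3)
    return raw_latents * scale

# Example: one sample with all 9 factors at the upper end of the raw support.
print(raw_to_blender(np.ones((1, 9))))
```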

Ground-truth factor pairs are generated according to three sets of rules (block types). Factors in a content block are shared across views. Factors in a style block are stochastically shared between views. Factors in a view-specific block are either kept constant or specific to one view. For each dataset, every information block is associated with a unique block type. The distributions linked to each block type are detailed below.

All Information Block : Block Type combinations are available for download.

### Part 1: Without inter- & intra-block causal dependencies

[3DIdentBox part 1] features 6 datasets, one for each Information Block : Block Type combination. Factors are generated according to the following rules; as a consequence, no causal dependencies exist between an image's generative factors in this configuration.

| Block Type | Symbol | View 1 Distribution | View 2 Distribution | Description |
|---|---|---|---|---|
| Content | $c=[c_1,c_2,c_3]$ | $c \sim [\mathcal{U}([-1,1]),\mathcal{U}([-1,1]),\mathcal{U}([-1,1])]$ | $\tilde{c} \sim [\delta(c_1),\delta(c_2),\delta(c_3)]$ | Shared between views |
| Style | $s=[s_1,s_2,s_3]$ | $s \sim [\mathcal{U}([-1,1]),\mathcal{U}([-1,1]),\mathcal{U}([-1,1])]$ | $\tilde{s} \sim [\mathcal{N}_t(s_1,1),\mathcal{N}_t(s_2,1),\mathcal{N}_t(s_3,1)]$ | Stochastically shared between views |
| View-Specific | $m=[m_1,m_2,m_3]$ | $m \sim [\mathcal{U}([-1,1]),\delta(0),\delta(0)]$ | $\tilde{m} \sim [\delta(0),\delta(0),\mathcal{U}([-1,1])]$ | $m_1$ is specific to View 1, $m_2$ is constant, $m_3$ is specific to View 2 |

### Part 2: With inter- & intra-block causal dependencies

| Block Type | Symbol | View 1 Distribution | View 2 Distribution | Description |
|---|---|---|---|---|
| Content | $c=[c_1,c_2,c_3]$ | $c \sim [\mathcal{N}_t(c_2,1),\mathcal{U}([-1,1]),\mathcal{U}([-1,1])]$ | $\tilde{c} \sim [\delta(c_1),\delta(c_2),\delta(c_3)]$ | Shared between views, causal dependency $c_2 \rightarrow c_1$ |
| Style | $s=[s_1,s_2,s_3]$ | $s \sim [\mathcal{N}_t(s_2,1),\mathcal{U}([-1,1]),\mathcal{N}_t(c_3,1)]$ | $\tilde{s} \sim [\mathcal{N}_t(s_1,1),\mathcal{N}_t(s_2,1),\mathcal{N}_t(s_3,1)]$ | Stochastically shared between views, causal dependencies $c_3 \rightarrow s_3$ and $s_2 \rightarrow s_1$ |
| View-Specific | $m=[m_1,m_2,m_3]$ | $m \sim [\mathcal{U}([-1,1]),\delta(0),\delta(0)]$ | $\tilde{m} \sim [\delta(0),\delta(0),\mathcal{U}([-1,1])]$ | $m_1$ is specific to View 1, $m_2$ is constant, $m_3$ is specific to View 2 |

$\mathcal{N}_t$ refers to a normal distribution truncated to the $[-1,1]$ interval.
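As an illustration of the Part 1 sampling rules, the sketch below (our own illustration, not the repository's generation code) draws one pair of content, style, and view-specific blocks; SciPy's `truncnorm` stands in for $\mathcal{N}_t$.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)

def truncated_normal(mean, std=1.0):
    """Sample from a normal distribution with the given mean, truncated to [-1, 1]."""
    a, b = (-1 - mean) / std, (1 - mean) / std
    return truncnorm.rvs(a, b, loc=mean, scale=std, random_state=rng)

# View 1 factors
c = rng.uniform(-1, 1, size=3)                      # content ~ U([-1, 1])
s = rng.uniform(-1, 1, size=3)                      # style   ~ U([-1, 1])
m = np.array([rng.uniform(-1, 1), 0.0, 0.0])        # m1 view-specific, m2 and m3 fixed to 0

# View 2 factors
c_tilde = c.copy()                                      # content is shared exactly (delta)
s_tilde = np.array([truncated_normal(si) for si in s])  # style perturbed via N_t(s_i, 1)
m_tilde = np.array([0.0, 0.0, rng.uniform(-1, 1)])      # m3 view-specific, m1 and m2 fixed to 0
```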

## Download


The sample pairs and their associated ground-truth factors can be downloaded from the links at the top of this README.

The folder structure is organized as follows:

```
hues_positions_rotations               # example folder name
├── samples                            # sample pairs x = {x, \tilde{x}}
│   ├── m1                             # x, first element of each sample pair (e.g., "000000.png")
│   │   └── *.png
│   └── m2                             # \tilde{x}, second element of each sample pair (e.g., "000000.png")
│       └── *.png
└── factors                            # ground-truth factor pairs z = {z, \tilde{z}} = {[c,s,m], [\tilde{c},\tilde{s},\tilde{m}]}
    ├── m1                             # z, first element of each ground-truth factor pair
    │   ├── latents.npy                # z, distributed on the Blender support
    │   └── raw_latents.npy            # z, distributed on the raw support
    └── m2                             # \tilde{z}, second element of each ground-truth factor pair
        ├── latents.npy
        └── raw_latents.npy
```

Each folder contains a full dataset with a specific block type assigned to each information block. The example folder name `hues_positions_rotations` follows the template `content_style_view_specific`: here the Hues block carries content information, Positions carries style information, and Rotations carries view-specific information.
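To make the layout concrete, a minimal loading sketch is given below. The folder and file names follow the structure above; the use of PIL and the zero-padded index format (`000000.png`) are assumptions based on the example file names.

```python
from pathlib import Path

import numpy as np
from PIL import Image

root = Path("hues_positions_rotations")  # one downloaded dataset folder

# Ground-truth factor pairs (z, z~), one row per sample, on the Blender support.
z_view1 = np.load(root / "factors" / "m1" / "latents.npy")
z_view2 = np.load(root / "factors" / "m2" / "latents.npy")

# Corresponding image pair for sample index 0.
idx = 0
x_view1 = Image.open(root / "samples" / "m1" / f"{idx:06d}.png")
x_view2 = Image.open(root / "samples" / "m2" / f"{idx:06d}.png")

print(z_view1.shape, z_view2.shape, x_view1.size)
```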

## Custom Generation


The following command generates latents for 10 pairs of images depicting a scene with two objects. In this example, image pairs vary in object type, object position, object rotation, spotlight rotation, and scene hues. Here, `--object-content` sets the object type as content, `--position-style` sets the object position as style, `--rotation-style` sets the object and spotlight rotation as style, and `--hue-ms` sets the scene hues as modality-specific information. By default, View 1 and View 2 content and style information follow uniform and normal distributions, respectively; these defaults can be changed by setting the `--continuous-marginal` and `--continuous-conditional` parameters to a normal or uniform distribution. The standard deviations of the normal marginal and conditional distributions can be adjusted with `--normal-marginal-std`, `--normal-conditional-std` (for style variables), and `--normal-conditional-noise` (for content variables), respectively. Similarly, the lower and upper bounds of the uniform distributions can be customized with `--uniform-marginal-a` (lower bound), `--uniform-marginal-b` (upper bound), `--uniform-conditional-a` (lower bound, content variables), `--uniform-conditional-b` (upper bound, content variables), `--uniform-conditional-noise-a` (lower bound, style variables), and `--uniform-conditional-noise-b` (upper bound, style variables).

```bash
############# ADAPT THE FOLLOWING PARAMETERS ###############
OUTPUT_FOLDER="example"    # output folder for latent and image storage
SAMPLE_PAIRS=10            # number of sample pairs
NB_OBJECTS=2               # number of objects per image
############################################################

python generate_clevr_dataset_latents.py --output-folder ${OUTPUT_FOLDER} --n-pairs ${SAMPLE_PAIRS} --object --position --rotation --hue --object-content --position-style --rotation-style --hue-ms --n-object ${NB_OBJECTS}
```

The following command renders images from previously generated latents stored in `${OUTPUT_FOLDER}/m{1-2}/latents.npy` and stores the rendered images in `${OUTPUT_FOLDER}/images`.

```bash
############# ADAPT THE FOLLOWING PARAMETERS ###############
BLENDER_DIR=/example/blender                        # path to the Blender executable
OUTPUT_FOLDER="example"                             # output folder for latent and image storage
LATENT_FOLDER="example_subfolder"                   # subfolder where latents are stored
MATERIAL="MyMetal"                                  # adjust: MyMetal, MyCristal, MyRubber
############################################################

N_BATCHES=10

# Render the dataset in N_BATCHES batches (batch indices 0 .. N_BATCHES-1).
for (( i=0; i<$N_BATCHES; i++ ))
do
    ${BLENDER_DIR} -noaudio --background --python generate_clevr_dataset_images.py --use-gpu --output-folder ${OUTPUT_FOLDER}/${LATENT_FOLDER} --n-batches ${N_BATCHES} --batch-index ${i} --material-names ${MATERIAL} --no_range_change
done
```

## BibTeX


If you find our datasets useful, please cite our paper:

```bibtex
@inproceedings{daunhawer2023multimodal,
  author = {
    Daunhawer, Imant and
    Bizeul, Alice and
    Palumbo, Emanuele and
    Marx, Alexander and
    Vogt, Julia E.
  },
  title = {
    Identifiability Results for Multimodal Contrastive Learning
  },
  booktitle = {International Conference on Learning Representations},
  year = {2023}
}
```

## Acknowledgements


This project builds on the following resources. Please cite them appropriately.