Run `python main.py +experiment=dit` for experiments using DiT as the diffusion backbone. Here are the arguments you can adapt for different datasets and hyper-parameters:
- Classifier backbone: set `model.class_arch` to `convnext_large`/`convnext_tiny`/`resnet18`/`vit_b_32`/`vit_b_16`/`vit_l_14` for ImageNet-trained classifiers
- Larger DiT: use `+experiment=dit` and set `input.sd_img_res=512`
- Optimizer: set `tta.gradient_descent.optimizer` to `adam`/`sgd`
- Learning rate: set `tta.gradient_descent.base_learning_rate` to any numerical value
- Dataset: set `input.dataset_name` to `ImageNetDataset`/`ImageNetv2Dataset`/`ImageNetRDataset`/`ImageNetCDataset`/`ImageNetADataset`/`ImageNetStyleDataset`
- Total batch size: set `input.batch_size` and `tta.gradient_descent.accum_iter`; the total batch size is the product of these two parameters (see the sketch below)
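For intuition, here is a minimal sketch (placeholder model, loss, and data, not this repository's code) of how the two settings combine through gradient accumulation: each optimizer step aggregates gradients from `tta.gradient_descent.accum_iter` mini-batches of `input.batch_size` samples.

```python
import torch

batch_size = 15   # input.batch_size
accum_iter = 12   # tta.gradient_descent.accum_iter
total_batch_size = batch_size * accum_iter  # 15 * 12 = 180 samples per update

model = torch.nn.Linear(8, 8)                               # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)   # or torch.optim.SGD

optimizer.zero_grad()
for _ in range(accum_iter):
    x = torch.randn(batch_size, 8)     # placeholder mini-batch
    loss = model(x).pow(2).mean()      # placeholder TTA loss
    (loss / accum_iter).backward()     # accumulate scaled gradients
optimizer.step()                       # one update over the full total batch
```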
Empirically, we found that a larger total batch size results in more stable classification improvement; however, TTA takes longer with a larger batch size. We also found that some backbones work better with the `sgd` optimizer than with `adam`.
Single Sample TTA on FGVC-Aircraft and other datasets
```
python main.py +experiment=sd model.class_arch=clipb32 input.dataset_name=FGVCAircraftSubset
```
Online TTA on ImageNet-C
```
python main.py +experiment=dit model.class_arch=convnext_large input.batch_size=15 tta.gradient_descent.accum_iter=12 input.dataset_name=ImageNetCDataset tta.gradient_descent.base_learning_rate=1e-5 tta.gradient_descent.optimizer=adam tta.online=True input.subsample=1 log_freq=1
```
ConvNext-Tiny works better with the `adam` optimizer
Single-sample TTA on ImageNet-R
```
python main.py +experiment=dit model.class_arch=convnext_tiny input.batch_size=12 tta.gradient_descent.accum_iter=15 input.dataset_name=ImageNetRDataset tta.gradient_descent.base_learning_rate=1e-5 tta.gradient_descent.optimizer=adam
```
Single-sample TTA on ImageNet-C
```
python main.py +experiment=dit model.class_arch=convnext_tiny input.batch_size=20 tta.gradient_descent.accum_iter=9 input.dataset_name=ImageNetCDataset tta.gradient_descent.base_learning_rate=1e-5 tta.gradient_descent.optimizer=adam input.subsample=null
```
Single-sample TTA on ImageNet-A
```
python main.py +experiment=dit model.class_arch=convnext_tiny input.batch_size=15 tta.gradient_descent.accum_iter=12 input.dataset_name=ImageNetADataset tta.gradient_descent.base_learning_rate=1e-5 tta.gradient_descent.optimizer=adam input.subsample=null
```
Single-sample TTA on ImageNet-v2
```
python main.py +experiment=dit model.class_arch=convnext_tiny input.batch_size=15 tta.gradient_descent.accum_iter=12 input.dataset_name=ImageNetv2Dataset tta.gradient_descent.base_learning_rate=1e-5 tta.gradient_descent.optimizer=adam
```
Single-sample TTA on ImageNet
```
python main.py +experiment=dit model.class_arch=convnext_tiny input.batch_size=15 tta.gradient_descent.accum_iter=12 input.dataset_name=ImageNetDataset tta.gradient_descent.base_learning_rate=1e-5 tta.gradient_descent.optimizer=adam
```
ResNet-18 works better with the `adam` optimizer
Single-sample TTA on ImageNet-R
```
python main.py +experiment=dit model.class_arch=resnet18 input.batch_size=12 tta.gradient_descent.accum_iter=15 input.dataset_name=ImageNetRDataset tta.gradient_descent.base_learning_rate=1e-5 tta.gradient_descent.optimizer=adam
```
Single-sample TTA on ImageNet-C
```
python main.py +experiment=dit model.class_arch=resnet18 input.batch_size=20 tta.gradient_descent.accum_iter=9 input.dataset_name=ImageNetCDataset tta.gradient_descent.base_learning_rate=5e-3 tta.gradient_descent.optimizer=sgd
```
Single-sample TTA on ImageNet-A
```
python main.py +experiment=dit model.class_arch=resnet18 input.batch_size=15 tta.gradient_descent.accum_iter=12 input.dataset_name=ImageNetADataset tta.gradient_descent.base_learning_rate=1e-5 tta.gradient_descent.optimizer=adam input.subsample=null
```
Single-sample TTA on ImageNet-v2
```
python main.py +experiment=dit model.class_arch=resnet18 input.batch_size=15 tta.gradient_descent.accum_iter=12 input.dataset_name=ImageNetv2Dataset tta.gradient_descent.base_learning_rate=1e-5 tta.gradient_descent.optimizer=adam
```
Single-sample TTA on ImageNet
```
python main.py +experiment=dit model.class_arch=resnet18 input.batch_size=15 tta.gradient_descent.accum_iter=12 input.dataset_name=ImageNetDataset tta.gradient_descent.base_learning_rate=1e-5 tta.gradient_descent.optimizer=adam
```
ViT-B-32 works better with the `sgd` optimizer
Single-sample TTA on ImageNet-R
```
python main.py +experiment=dit model.class_arch=vit_b_32 input.batch_size=20 tta.gradient_descent.accum_iter=9 input.dataset_name=ImageNetRDataset tta.gradient_descent.base_learning_rate=5e-3 tta.gradient_descent.optimizer=sgd
```
Single-sample TTA on ImageNet-C
```
python main.py +experiment=dit model.class_arch=vit_b_32 input.batch_size=20 tta.gradient_descent.accum_iter=9 input.dataset_name=ImageNetCDataset tta.gradient_descent.base_learning_rate=5e-3 tta.gradient_descent.optimizer=sgd
```

Single-sample TTA on ImageNet-A
```
python main.py +experiment=dit model.class_arch=vit_b_32 input.batch_size=20 tta.gradient_descent.accum_iter=9 input.dataset_name=ImageNetADataset tta.gradient_descent.base_learning_rate=5e-3 tta.gradient_descent.optimizer=sgd input.subsample=null
```
Single-sample TTA on ImageNet-v2
```
python main.py +experiment=dit model.class_arch=vit_b_32 input.batch_size=15 tta.gradient_descent.accum_iter=12 input.dataset_name=ImageNetv2Dataset tta.gradient_descent.base_learning_rate=5e-3 tta.gradient_descent.optimizer=sgd
```
Single-sample TTA on ImageNet
```
python main.py +experiment=dit model.class_arch=vit_b_32 input.batch_size=20 tta.gradient_descent.accum_iter=9 input.dataset_name=ImageNetDataset tta.gradient_descent.base_learning_rate=5e-3 tta.gradient_descent.optimizer=sgd
```