Serpent is a novel architecture for efficient image restoration that leverages state space models capable of modeling intricate long-range dependencies in high-resolution images with a favorable linear scaling in input dimension.
To install the requirements, run:

```shell
pip install -r requirements.txt
```

If you have difficulties installing `mamba_ssm`, please follow the instructions in its GitHub repository.

You should also install the selective scan kernels from the VMamba GitHub repository. To do so, simply clone the repository and run:

```shell
cd Vmamba/kernels/selective_scan
pip install .
```
Also, please check the data config file and set the correct paths for the datasets. For FFHQ, the final folder of the path should be `images256x256`; the code will automatically replace the two 256s with the desired resolution, so you should not change it.
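As a concrete illustration of this path convention, an FFHQ entry in the data config might look like the sketch below (the key names are hypothetical, chosen for illustration rather than taken from the repository):

```yaml
# Illustrative only -- key names are hypothetical.
FFHQ:
  path: /datasets/FFHQ/images256x256  # final folder must be images256x256;
                                      # the code substitutes the target resolution
```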
To train a model, run:

```shell
python3 scripts/training.py --config_path CONFIG_PATH
```
Here, `CONFIG_PATH` is the path to a config file. You can find our training config files in the training config folder. You can also use the following arguments to control the training:

- `devices`: number of GPUs.
- `save_checkpoints`: boolean indicating whether to save checkpoints.
- `logger_type`: TensorBoard (`tb`) or Weights & Biases (`wandb`).
- `batch_size`: training batch size.
- `max_epochs`: maximum number of epochs (default: 60).
- `noise_power`: the power of the noise if you are applying a noisy degradation to the images.
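To make `noise_power` concrete, below is a minimal sketch of a noisy degradation, assuming `noise_power` is the standard deviation of additive white Gaussian noise on images in [0, 1]. This is an illustration of the concept, not the repository's implementation:

```python
import numpy as np

def add_gaussian_noise(image: np.ndarray, noise_power: float, seed: int = 0) -> np.ndarray:
    """Add white Gaussian noise with standard deviation `noise_power`.

    Illustrative sketch only -- assumes `noise_power` is the noise std
    and that pixel values live in [0, 1].
    """
    rng = np.random.default_rng(seed)
    noisy = image + noise_power * rng.standard_normal(image.shape)
    return np.clip(noisy, 0.0, 1.0)

clean = np.full((4, 4), 0.5)  # a flat gray test image
noisy = add_gaussian_noise(clean, noise_power=0.05)
```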
In the training config file, you should specify the following:
- `model`: name of the model, based on the architectures available here.
- `exp_name`: the name of the experiment (arbitrary and optional).
- `hyperparameters`: training hyperparameters such as learning rate, weight decay, etc.
- `data`: specifications of the dataset, including `data` (name of the dataset), `degradation_config`, and `image_size`. For training, the available datasets are FFHQ (x256, x512, x1024), CelebA, and GOPRO. For the GOPRO dataset, `degradation_config` is ignored, as the degraded images are already provided.
- `data_more`: other configs, such as using random crops, the probability of flipping images, and the lists of epochs, image sizes, and batch sizes for each stage of training (for progressive learning). Note that progressive learning is incompatible with SwinIR models.
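Putting these fields together, a training config could look roughly like the sketch below; the exact key nesting, names, and values are assumptions for illustration, not copied from the repository's config files:

```yaml
# Illustrative sketch -- structure and values are assumptions.
model: serpent_l
exp_name: ffhq256_gaussian_deblur
hyperparameters:
  lr: 1.0e-4
  weight_decay: 0.01
data:
  data: FFHQ
  degradation_config: gaussian_blur
  image_size: 256
data_more:
  random_crop: true
  flip_probability: 0.5
```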
To evaluate a model, run this command:

```shell
python scripts/evaluation.py --config_path CONFIG_PATH
```
Here, `CONFIG_PATH` is the path to a config file. You can find our evaluation config files in the evaluation config folder. You can also use the following arguments to control the evaluation:

- `devices`: number of GPUs.
- `logger_type`: TensorBoard (`tb`) or Weights & Biases (`wandb`).
- `batch_size`: batch size used during evaluation.
In the evaluation config file, you should specify the following:
- `model`: name of the model, based on the architectures available here.
- `exp_name`: the name of the experiment (arbitrary and optional).
- `ckpt`: path to the weights of the model.
- `data`: specifications of the dataset, including `data` (name of the dataset), `degradation_config`, and `image_size`. For evaluation, the available datasets are FFHQ (x256, x512, x1024), CelebA, GOPRO, HIDE, RealBlur-J, and RealBlur-R. For the GOPRO dataset, `degradation_config` is ignored, as the degraded images are already provided.
- `noise_power`: the power of the noise if you are applying a noisy degradation to the images.
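For instance, an evaluation config could look roughly like this sketch, with `ckpt` pointing at trained weights (the key nesting and values are illustrative assumptions, not copied from the repository):

```yaml
# Illustrative sketch -- structure and values are assumptions.
model: serpent_l
exp_name: gopro_eval
ckpt: /path/to/checkpoint.ckpt
data:
  data: GOPRO
  image_size: 256
```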
You can download the checkpoints of our trained models from here. To use these models, run the following command:

```shell
python scripts/run_model.py --config_path CONFIG_PATH
```

Here, `CONFIG_PATH` is the path to a config file; you can find an example config file here. In the config file, you should specify your data folder, the model, and `exp_name`. The code will save the inputs and predictions in `./Results/exp_name`. By default, the code assumes that the input images are already degraded; if you want to apply a specific degradation instead, add it to the config file and use the following command:

```shell
python scripts/run_model.py --config_path CONFIG_PATH --apply_transform
```
Results for x512 Gaussian deblurring:
| Method | PSNR (↑) | SSIM (↑) | LPIPS (↓) | GFLOPS | GPU mem. (GB) |
|---|---|---|---|---|---|
| Restormer | 28.51 | 0.7797 | 0.4136 | 161.9 | 28.5 |
| SwinIR-B | 28.37 | 0.7756 | 0.4214 | 889.5 | 51.8 |
| U-Net (32) | 28.23 | 0.7710 | 0.4162 | 48.6 | 3.7 |
| U-Net (128) | 28.35 | 0.7751 | 0.4138 | 771.1 | 7.8 |
| Serpent-B (ours) | 28.35 | 0.7755 | 0.4195 | 5.6 | 4.1 |
| Serpent-L (ours) | 28.48 | 0.7790 | 0.4136 | 22.2 | 6.4 |
| Serpent-H (ours) | 28.51 | 0.7800 | 0.4127 | 88.8 | 15.9 |
If you use Serpent in a research paper, please cite our paper:

```bibtex
@article{sepehri2024serpent,
  title={Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models},
  author={Sepehri, Mohammad Shahab and Fabian, Zalan and Soltanolkotabi, Mahdi},
  journal={NGSM Workshop at ICML},
  year={2024}
}
```
If you have any questions, feel free to open a Discussion and ask your question or send an email to [email protected] (Mohammad Shahab Sepehri).