This is a companion repository to traiNNer, designed to make it easier to produce results with models trained with it.
Currently, the supported model architectures are for: Super-Resolution, Restoration (denoising, deblurring) and image-to-image translation. Support for the remaining architectures (SRFlow, Video, etc.) is planned.
Below is a (non-comprehensive) list of features currently available in the project. More are planned (see below).
- Support for all super-resolution and restoration models available in traiNNer, like: `ESRGAN` (`RRDB`, both original and modified architectures), `SRGAN` (`SRResNet`), `PPON`, and `PAN` (`SRFlow` is pending).
- "Chop forward" option, to automatically divide large images into smaller crops to prevent `CUDA` errors due to exhausted `VRAM`.
- Automatic inference of model scale, either from the model name (for example: `4x_PPON_CGP4.pth` will be interpreted as scale `4`) or from the network configuration (currently only for `ESRGAN` and `SRGAN`, others planned). If the scale cannot be inferred, you can use the scale flag with the scale factor, like: `-scale 4` (see the combined example after this list).
- Automatic inference of model architecture (for super-resolution and restoration at the moment, others planned), meaning that, for example, `ESRGAN`, `PPON` and `PAN` models can be chained with no additional requirements. The exact architecture can also be provided, especially if not using a default network configuration.
- Model chaining, to pass the input images through a sequence of models.
- Direct support for Stochastic Weight Averaging (SWA) models; they will automatically be converted to regular models.
- Support for image-to-image translation models: `pix2pix` (`UNet`), `CycleGAN` (`ResNet`) and `wbc` (`WBCUnet`).
- Support for `TorchScript` models. Use flag `-arch ts` (can't be chained yet).
- Automatic color correction, for models that modify the color hues when applied. Use flag `-cf`.
- Use of `fp16` format to reduce memory requirements. Technically, this operates with less accuracy than the default `fp32`, but in all tests so far the errors were imperceptible. If you suspect any issue, fp16 can be disabled with the `-no_fp16` flag.
- Runs on NVIDIA GPUs by default if available, otherwise on CPU. Can also be forced to run on CPU with the `-cpu` flag.
- Partial model name support. You don't need to use a model's full name, only the part of the name that distinguishes it from the others.
- Option to do a side-by-side comparison of the input images against the output results with the `-comp` flag.
- On-the-fly model interpolation, with automatic model compatibility detection (except for TorchScript models).
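As an illustration of how several of these flags combine (the model name here is just a placeholder; every flag is described in the list above):

python run.py -m mymodel -scale 4 -cf -no_fp16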
Planned features include:
- Option to use the Consistency Enforcement Module (CEM).
- Additional color correction alternatives.
- A photograph restoration pipeline.
- A Colab notebook.
You need to provide a directory where the input images to be processed are located and an output directory where the results will be saved. By default, these directories are `./input/` and `./output/` respectively, but you can modify them with the `-input` and `-output` flags.
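For instance, to read from and write to custom locations (the directory names and model name here are illustrative):

python run.py -input ./photos/ -output ./results/ -m mymodel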
If you obtain a trained model, either the original from a paper or one from the model database, you can place it in the `./models/` directory.
As an example, if you want to use the `Fatality` model from the database, download the model (`4x_Fatality_01_265000_G.pth`) and move it to `./models/`.
Once there, and with the input images ready, you can obtain the results simply by running:
python run.py -m fatal
The results will be saved in `./output/`.
To chain multiple models, you need to provide a sequence of model names to the `-m` flag. For example, to first remove JPEG artifacts and then upscale images, you can fetch one of the JPEG denoising models from the database (example: `1x_JPEG_60-80.pth`) and an upscaling model (example: `4x_Fatality_01_265000_G.pth`) and use a plus sign (`+`) between their names.
python run.py -m jpeg+fatal
Note that there's technically no limit to how many models can be chained, but if the models are for upscaling, image sizes can quickly become impossible to manage in memory; this is mostly a hardware limitation. You can also apply the same model to the images multiple times by chaining it with itself, which can produce interesting results in some cases.
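For instance, to run the same upscaling model twice in a row (each pass multiplies the image dimensions by the model's scale):

python run.py -m fatal+fatal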
For image-to-image translation models, for now you'll need to provide the network architecture that was used to train the model. For example, for the trained models available for `pix2pix` and `CycleGAN`, that corresponds to `unet_256` (or `p2p_256`) for the `pix2pix` case and `resnet_9blocks` (or `cg_9`) for the `CycleGAN` case.
For example, to try out the `label2facade` model (`facades_label2photo.pth`), you need to run:
python run.py -m facade -a p2p_256
This will produce a single result:
For a side-by-side comparison between input and output, add the `-comp` flag:
python run.py -m facade -a p2p_256 -comp
Similarly, to test the `ukiyoe` CycleGAN model (either `photo2ukiyoe.pth` or `style_ukiyoe.pth`) with a comparison, run:
python run.py -m ukiyoe -a cg_9 -comp
For WBC (White-box Cartoonization), a special case is available: the original TensorFlow model, converted to PyTorch and available among the pretrained options, can be used and produces the same results shown in the original repo, converting photos to an anime cartoon style.
Models trained with PyTorch can also be used (here using `wbc.pth`).
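Assuming the model file has been placed in `./models/`, a run with a side-by-side comparison could look like this (since the file name starts with `wbc`, the architecture is selected automatically, as noted below):

python run.py -m wbc -comp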
If different models are trained with different representation scales, the resulting models can be interpolated to obtain intermediate results between two of them. For now this can be done with a simple script, but later it will be possible to do it on the fly with iNNfer (TBD). More information about interpolating models can be found here.
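As a reference, below is a minimal sketch of what such an interpolation script typically does: a linear blend of two checkpoints with matching keys. The file names and `alpha` value are illustrative, and this is not necessarily the exact script used by the project:

```python
import torch

def interpolate_models(path_a, path_b, alpha=0.5, out_path="interp.pth"):
    """Linearly blend two checkpoints: (1 - alpha) * A + alpha * B."""
    state_a = torch.load(path_a, map_location="cpu")
    state_b = torch.load(path_b, map_location="cpu")
    # Both checkpoints must come from the same architecture,
    # with identical parameter names and tensor shapes.
    interp = {k: (1 - alpha) * state_a[k] + alpha * state_b[k] for k in state_a}
    torch.save(interp, out_path)

# Hypothetical file names for two WBC models trained with different scales:
interpolate_models("wbc_style_a.pth", "wbc_style_b.pth", alpha=0.5)
```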
You can also tweak the Guided Filter component in `run.py` (search for the note): if the radius `r` is increased, the details in the final output can be reduced, depending on the expected results. More details about the guided filter are available in the original paper.
If the models are named `wbc*`, the `wbcunet` architecture and configuration will be selected automatically; otherwise, add the `-arch wbcunet` flag when running.
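For example, with a WBC model whose file name doesn't start with `wbc` (the model name here is hypothetical):

python run.py -m cartoonizer -arch wbcunet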
TorchScript models are directly supported; just be aware that these models need to be run in the same fashion they were traced. For example, if the option for using GPUs was enabled when they were created, they will only be able to run on NVIDIA GPUs with CUDA support. Here you will find a number of models from the model database that were already converted to TorchScript (using GPU; CPU versions can be made available if needed) and are ready to use.
For example, to use the `4xRealSR_DF2K_JPEG.pt` model, just execute:
python run.py -m realsr
One advantage of these TorchScript models is that they no longer require explicit support for the network architecture, so you can use any model of any architecture that has been converted to TorchScript with this code, even if the architecture is not otherwise supported. This is useful if you need to use other features, like the color correction.
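For instance, building on the `realsr` example above, the same TorchScript model can be combined with the color correction flag:

python run.py -m realsr -cf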
Some models introduce color changes that may not be desired. For that reason, there are options that can be used to correct those changes.
Using an example image from the Manga109 set, with a model that intentionally introduces heavy color changes, run:
python run.py -m shin -comp
And this produces this result:
To try to fix the colors, simply add the color fix flag `-cf`, like:
python run.py -m shin -comp -cf
And you will obtain a version of the upscale that more closely matches the colors of the original image:
This flag works even if multiple models are chained.
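For example, reusing the JPEG denoising and upscaling chain from above, with color correction and a side-by-side comparison:

python run.py -m jpeg+fatal -cf -comp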
There are multiple ways to help this project. The first one is by using it and trying to produce results with your models. You can open an issue if you find any bugs or if you have ideas or questions.
If you would like to contribute by adding or fixing code, you can do so by cloning this repo and creating a PR.
You can also join the Discord servers and share results and questions with other users.
Lastly, since it has been suggested many times before, there are now options to donate to show your support for the project and help steer it in directions that will make it even more useful. Below are the options that were suggested.
Bitcoin Address: 1JyWsAu7aVz5ZeQHsWCBmRuScjNhCEJuVL
Ethereum Address: 0xa26AAb3367D34457401Af3A5A0304d6CbE6529A2
If you have any questions, we have a couple of Discord servers (game upscale and animation upscale) where you can ask them, and a Wiki with more information.