Style-Rank

Style-Rank is a unified benchmarking framework for generative styling models in PyTorch. This repository wraps the implementations of several papers in the field of generative image stylization and implements metrics to evaluate the quality of the generated images. We also provide Style-Rank, an evaluation dataset for comparing the models.

This work was developed by Eyal Benaroche, Clément Chadebec, Onur Tasar, and Benjamin Aubin from Jasper Research and Ecole Polytechnique.


Models

| Model | Arxiv | Code | Project Page | Notes |
|---|---|---|---|---|
| StyleAligned | Arxiv | Code | Project Page | |
| VisualStyle | Arxiv | Code | Project Page | |
| IP-Adapter | Arxiv | Code | Project Page | Using the implementation from Diffusers |
| InstantStyle | Arxiv | Code | Project Page | Using the implementation from Diffusers |
| CSGO | Arxiv | Code | Project Page | |
| Style-Shot | Arxiv | Code | Project Page | |

Metrics

We implemented several common metrics to evaluate the quality of the generated images:

  • CLIP-Text metric: cosine similarity between a caption (embedded using CLIPTextModel) and the generated image (embedded using CLIPVisionModel) - using the implementation from Transformers
  • CLIP-Image metric: cosine similarity between two images (embedded using CLIPVisionModel) - using the implementation from Transformers
  • Dino: cosine similarity between two images (embedded using Dinov2Model) - using the implementation from Transformers
  • ImageReward: score from the ImageReward model
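
To make the image-to-image metrics concrete, here is a minimal sketch of a CLIP-based image similarity using the Transformers classes mentioned above; the checkpoint name is an assumption and the repository's own metric classes should be preferred in practice:

import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

# Hypothetical checkpoint; the repository may use a different CLIP variant.
model = CLIPVisionModelWithProjection.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")

def clip_image_score(image_a: Image.Image, image_b: Image.Image) -> float:
    # Embed both images with the CLIP vision tower and compare the projections.
    inputs = processor(images=[image_a, image_b], return_tensors="pt")
    with torch.no_grad():
        embeds = model(**inputs).image_embeds
    return torch.nn.functional.cosine_similarity(embeds[0:1], embeds[1:2]).item()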

Dataset

The dataset is an aggregation of images from multiple styling papers.

Setup

To get up and running, first create a virtual environment with at least Python 3.10 and activate it.

With venv

python3.10 -m venv envs/style_rank
source envs/style_rank/bin/activate

With conda

conda create -n style_rank python=3.10
conda activate style_rank 

Install the dependencies

Then install the required dependencies (if on GPU) and the repository in editable mode:

pip install --upgrade pip
pip install -r requirements.txt
pip install -e .

Usage

Using the provided code, you can generate stylized images on the provided dataset (or your own, given the right format) and evaluate them using the provided metrics. Results can fluctuate, as the generation is not seeded and the default prompts are sampled from a list of prompts.

Dataset

The dataset is formatted to be used with WebDataset.

You can download it locally:

wget -O data/stylerank_papers.tar "https://huggingface.co/datasets/jasperai/style-rank/resolve/main/stylerank_papers.tar"

Or you can stream it from Hugging Face with webdataset:

import webdataset as wds

url = f"pipe:curl -s -L https://huggingface.co/datasets/jasperai/style-rank/resolve/main/stylerank_papers.tar"
dataset = wds.WebDataset(url).decode('pil')
sample = next(iter(dataset))
sample["jpg"].show()

The dataset contains license, source, url, caption_blip, caption_cogvlm, style_caption and style_captionner metadata located as follows:

sample = {
    '__key__': image_key,
    'jpg': image_data,
    'json': {
        'license': image_license,
        'source': image_source,
        'url': original_dataset_url,
        'caption_blip': blip2_caption,
        'caption_cogvlm': cogvlm_caption,
        'style_caption': style_caption,
        'style_captionner': style_captioner,
    }
}
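
For example, this metadata can be read directly from the streamed dataset; below is a small sketch assuming the json entry decodes to a Python dict as shown above:

import webdataset as wds

url = "pipe:curl -s -L https://huggingface.co/datasets/jasperai/style-rank/resolve/main/stylerank_papers.tar"
dataset = wds.WebDataset(url).decode("pil")

# Print the key, license and style caption of the first few reference images.
for i, sample in enumerate(dataset):
    meta = sample["json"]
    print(sample["__key__"], meta["license"], meta["style_caption"])
    if i >= 2:
        break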

Inference

To generate images using one of the provided models, you can use the scripts provided in the examples/inference folder. For example, to generate images using the StyleAligned model, you can use the following command:

python examples/inference/stylealigned.py [--input-path /path/to/dataset] [--output-path /path/to/output]

Default output path is output/inference/ and the default input path is data/stylerank_papers.tar.

Additionally, you can provide the --json_path argument to use a different json file for the prompts, or use the --prompts argument to provide a list of prompts to use for the generation.

The script iterates through the provided .tar file and generates 4 random images per reference, based on the prompts provided in the prompts.json file, following an evaluation process similar to the one described in the VisualStyle paper.
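
As a rough illustration of that prompt-selection step, the snippet below samples 4 prompts from a prompt list; the exact structure of data/prompts.json is an assumption here, so check the file before relying on it:

import json
import random

# Assumed (hypothetical) structure: a flat list of prompt strings.
with open("data/prompts.json") as f:
    prompts = json.load(f)

selected = random.sample(prompts, k=4)  # 4 prompts, one generated image each
print(selected)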

Folder structure

The folder structure should be as follows:

.
├── README.md
├── data
│   ├── stylerank_papers.tar
│   └── prompts.json
├── examples
│   ├── inference
│   └── report
├── output
│   ├── inference
│   └── metrics
├── requirements.txt
├── setup.py
├── src
│   └── stylerank
└── tests
    ├── reference_images
    ├── test_metrics
    └── test_model

When running an inference script, the model will by default create a folder with its name to store the generated samples, with a sub-folder for each reference image (named after its key) containing the reference and the images generated from the prompts. The folder structure inside the ./output/ folder should look like this:

.
├── inference
│   ├── instant_style
│   │   ├── 0000
│   │   │   ├── prompt_1.png
│   │   │   ├── prompt_2.png
│   │   │   ├── prompt_3.png
│   │   │   ├── prompt_4.png
│   │   │   └── reference.png
│   │   ├── 0001
.   .   .   ....
│   │   └── 0111
│   ├── ip_adapter
│   │   ├── 0000
│   │   ├── 0001
.   .   .   ....
│   │   └── 0111
│   ├── stylealigned
.   .   └── ....
│   └── visualstyle
│       └── ....
└── metrics
    ├── interrupted.csv
    ├── report.csv
    └── metrics.csv

Reports

Given the generated images, you can evaluate the results using the provided metrics. For example, to evaluate the generated images using the CLIP-Text metric, you can use the following command:

python examples/report/metrics.py --metrics ClipText [--input-path /path/to/dataset] [--output-path /path/to/output]

You can run multiple metrics at once by providing a list of metrics to the --metrics argument, e.g.:

python examples/report/metrics.py --metrics "[ClipText, ClipImage, Dinov2, ImageReward]" [--input-path /path/to/dataset] [--output-path /path/to/output]

It will output the results in the /path/to/output/metrics.csv file and the mean for each metric in the /path/to/output/report.csv file.
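
The per-metric means in report.csv can also be recomputed by hand from metrics.csv; this is a minimal sketch assuming one numeric column per metric (check the generated files for the actual column names and paths):

import pandas as pd

# Hypothetical default path; adjust to your --output-path.
df = pd.read_csv("output/metrics/metrics.csv")
print(df.mean(numeric_only=True))  # one mean per metric column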

If you cancel the process, it will automatically save the results in the /path/to/output/interrupted.csv file.

Results

Running the evaluation on the provided stylerank_papers.tar dataset, we get the following results:

| Model | ImageReward ↑ | Clip-Text ↑ | Clip-Image ↑ | Dinov2 ↑ |
|---|---|---|---|---|
| StyleAligned | -1.26 | 19.26 | 68.72 | 36.29 |
| VisualStyle | -0.72 | 22.12 | 66.68 | 20.80 |
| IP-Adapter | -2.03 | 15.01 | 83.66 | 40.50 |
| Style-Shot | -0.38 | 21.34 | 65.04 | 23.04 |
| CSGO | -0.29 | 22.16 | 61.73 | 16.85 |
| InstantStyle | -0.13 | 22.78 | 66.43 | 18.48 |
| Inversion-InstantStyle | -1.30 | 18.90 | 76.60 | 49.42 |

(Figure: Clip-T vs Clip-I results for the evaluated models)

Tests

To make sure the models and metrics are working as expected, install pytest and run the tests with the following command:

pip install pytest
pytest tests/

License

This code is released under the Creative Commons BY-NC 4.0 license.

Citation

If you find this work useful or use it in your research, please consider citing us:

@misc{benaroche2024stylerank,
  title={Style-Rank: Benchmarking stylization for diffusion models},
  author={Eyal Benaroche and Clement Chadebec and Onur Tasar and Benjamin Aubin},
  year={2024},
}
