Pixar-SD-Portraits

This repository is dedicated to transforming user-submitted portrait images into personalized cartoon characters that mimic the Pixar animation style using Stable Diffusion 1.5 or XL models.

Objective

The primary objective of this project is to develop a minimal, workable code solution capable of:

- Taking user-submitted portrait images
- Transforming them into personalized cartoon characters in the Pixar animation style, preserving key facial attributes and a recognizable degree of the individual's identity

Repository Contents

The repository features the following main components:

- notebooks/: Notebooks containing the same code as the serve folder, decomposed into an easy-to-run format.
- src/: Intended to hold optimized abstractions around the pipeline; currently contains only a utils file.
- assets/: Sample assets.
- serve/: Folder with two files:
  - 00_sdxl_pixar_lora_demo.py: Runs the SDXL demo with the face IP-Adapter and a Pixar LoRA that I took from civitAI. You can find my fork here: Pixar-SDXL-LoRA
  - 01_sd1.5_pixar_demo.py: Same as 00, but uses an SD1.5 checkpoint fine-tuned on Pixar images. It looks much better than the SDXL variant, so I recommend running this demo.

Environment Setup

To begin experimenting with Pixar-style portrait generation, set up your environment by following these steps:

Create a new conda environment:
```
conda create --name pixarsd python=3.10
```
Activate the environment:
```
conda activate pixarsd
```
Install Python dependencies:
```
pip install -r requirements.txt
```
Run the gradio demo:
```
python serve/01_sd1.5_pixar_demo.py
```

Alternatively, you can run the Jupyter notebook in the notebooks folder:

jupyter lab --no-browser --ip 0.0.0.0 --port 8888 --allow-root --notebook-dir=.

Approach

The approach is a fairly standard one at this stage: I combined an IP-Adapter, a Stable Diffusion model, and a Pixar LoRA (Low-Rank Adaptation).
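
As a rough illustration of how these three components can be wired together, here is a minimal sketch using the diffusers library. It is not the code from the serve folder: the function names `build_pipeline` and `stylize`, the default prompt, the adapter scale of 0.7, the step count, and the choice of the `h94/IP-Adapter` face weights are all assumptions made for the example. The base checkpoint and LoRA path are left as parameters because the exact models are not named here.

```python
def build_pipeline(base_model, lora_path=None, ip_adapter_scale=0.7, device="cuda"):
    """Assemble SD1.5 + face IP-Adapter (+ optional LoRA). Hypothetical helper."""
    # Heavy dependencies are imported lazily so the module imports without a GPU stack.
    import torch
    from diffusers import StableDiffusionPipeline

    # base_model: a Hugging Face model id or local path (assumption: an SD1.5 checkpoint).
    pipe = StableDiffusionPipeline.from_pretrained(base_model, torch_dtype=torch.float16)

    # Assumption: the public face IP-Adapter weights for SD1.5.
    pipe.load_ip_adapter(
        "h94/IP-Adapter",
        subfolder="models",
        weight_name="ip-adapter-plus-face_sd15.bin",
    )
    # Higher scale follows the face more closely, lower scale follows the prompt.
    pipe.set_ip_adapter_scale(ip_adapter_scale)

    if lora_path is not None:
        # Optional style LoRA, e.g. a Pixar LoRA downloaded separately.
        pipe.load_lora_weights(lora_path)

    return pipe.to(device)


def stylize(pipe, face_image, prompt="pixar style 3d character portrait", num_images=4):
    """Generate stylized portraits conditioned on a cropped face image."""
    result = pipe(
        prompt=prompt,
        ip_adapter_image=face_image,  # the face crop acts as the identity condition
        num_inference_steps=30,
        num_images_per_prompt=num_images,
    )
    return result.images
```

Keeping pipeline construction separate from generation means the models are loaded once and reused for every uploaded portrait.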

In both pipelines, I first take the user's picture and detect the face region using the insightface buffalo_l model. I then crop the face and pass it as the condition for the IP-Adapter face SD model, in an image-to-image style.
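
The detect-and-crop step described above could be sketched as follows. This is an illustration under assumptions, not the repository's code: the helper names `expand_bbox` and `crop_largest_face`, the 30% margin, the 640x640 detection size, and the choice of the largest face when several are found are all made up for the example.

```python
def expand_bbox(bbox, img_w, img_h, margin=0.3):
    """Grow an (x1, y1, x2, y2) box by `margin` of its size on each side,
    clamped to the image bounds. Returns integer pixel coordinates."""
    x1, y1, x2, y2 = bbox
    dx = (x2 - x1) * margin
    dy = (y2 - y1) * margin
    left = max(0, int(round(x1 - dx)))
    top = max(0, int(round(y1 - dy)))
    right = min(img_w, int(round(x2 + dx)))
    bottom = min(img_h, int(round(y2 + dy)))
    return left, top, right, bottom


def crop_largest_face(image_bgr, margin=0.3):
    """Detect faces with insightface buffalo_l and crop the largest one.

    image_bgr: HxWx3 numpy array in BGR order (as returned by cv2.imread).
    """
    # Imported lazily: insightface downloads model files on first use.
    from insightface.app import FaceAnalysis

    app = FaceAnalysis(name="buffalo_l")
    app.prepare(ctx_id=0, det_size=(640, 640))  # ctx_id=0 selects the first GPU

    faces = app.get(image_bgr)
    if not faces:
        raise ValueError("no face detected in the input image")

    # Assumption: when several faces are present, keep the one with the largest box.
    face = max(faces, key=lambda f: (f.bbox[2] - f.bbox[0]) * (f.bbox[3] - f.bbox[1]))

    height, width = image_bgr.shape[:2]
    left, top, right, bottom = expand_bbox(face.bbox, width, height, margin)
    return image_bgr[top:bottom, left:right]
```

The margin is there because detector boxes tend to hug the face tightly, and a slightly wider crop keeps hair and jawline in the conditioning image.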

Next Steps for Improvements

This is currently a proof of concept, so there are several directions for improvement:

- Generation speed: I tested both pipelines on an NVIDIA A100 GPU. Generating 4 images at 1024x1024 resolution took around 10 seconds. Compiling the torch models, for example with Stable-Fast or NVIDIA TensorRT, could speed up the diffusion models by up to 2x. Activation caching could also help.
- Model fine-tuning: To improve results, it makes sense to collect a dataset and fine-tune the model in an image-to-image setting. For example, we could generate more pairs (a real photo and its Pixar-styled counterpart) and train a narrower, smaller, quantized model to generate such images.
- Other approaches: IP-Adapter is only one approach among many. There are also ConsistentID, InstantID, Photomaker, FastComposer, FaceAdapter, and others. It makes sense to experiment with each, as they could produce more consistent images with more stable identity preservation, since they were trained on larger datasets for the same use case as ours (avatar creation).
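
To judge whether any of the speed optimizations above actually pays off, it helps to measure per-image latency the same way before and after the change. The helper below is a minimal sketch for that purpose. The name `seconds_per_image` and the warmup and run counts are assumptions, and `generate` stands for any zero-argument callable that produces one batch.

```python
import time


def seconds_per_image(generate, batch_size, warmup=1, runs=3):
    """Average wall-clock seconds per generated image.

    generate: zero-argument callable producing one batch of `batch_size` images.
    warmup:   untimed calls, so one-off costs (compilation, caching) are excluded.
    """
    for _ in range(warmup):
        generate()

    start = time.perf_counter()
    for _ in range(runs):
        generate()
    elapsed = time.perf_counter() - start

    return elapsed / (runs * batch_size)
```

For reference, the figures quoted above (about 10 seconds for a batch of 4 images) correspond to roughly 2.5 seconds per image, so a 2x speedup would bring that to about 1.25 seconds.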

nerlfield/pixar-sd-portraits
