In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing
Official PyTorch implementation of the CVPR 2024 paper
In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing
Yiran Xu, Zhixin Shu, Cameron Smith, Seoung Wug Oh, and Jia-Bin Huang
https://in-n-out-3d.github.io/
Abstract: 3D-aware GANs offer new capabilities for view synthesis while preserving the editing functionalities of their 2D counterparts. GAN inversion is a crucial step that seeks the latent code to reconstruct input images or videos, subsequently enabling diverse editing tasks through manipulation of this latent code. However, a model pre-trained on a particular dataset (e.g., FFHQ) often has difficulty reconstructing images with out-of-distribution (OOD) objects such as faces with heavy make-up or occluding objects. We address this issue by explicitly modeling OOD objects from the input in 3D-aware GANs. Our core idea is to represent the image using two individual neural radiance fields: one for the in-distribution content and the other for the out-of-distribution object. The final reconstruction is achieved by optimizing the composition of these two radiance fields with carefully designed regularization. We demonstrate that our explicit decomposition alleviates the inherent trade-off between reconstruction fidelity and editability. We evaluate reconstruction accuracy and editability of our method on challenging real face images and videos and showcase favorable results against other baselines.
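For intuition, the composition step can be sketched in a few lines. The snippet below is a hypothetical PyTorch illustration of one standard way to blend two radiance fields at shared ray samples (densities add; colors are density-weighted). It is not the paper's actual implementation — that, with its regularization, lives in eg3d/outdomain/* — and the function and variable names are ours:
import torch

def compose_fields(sigma_in, rgb_in, sigma_ood, rgb_ood, eps=1e-8):
    # sigma_*: (N,) non-negative densities; rgb_*: (N, 3) colors at the
    # same sample points. Densities of the two fields add, and each
    # field's color is weighted by its share of the total density.
    sigma = sigma_in + sigma_ood
    w_in = sigma_in / (sigma + eps)
    rgb = w_in.unsqueeze(-1) * rgb_in + (1.0 - w_in).unsqueeze(-1) * rgb_ood
    return sigma, rgb  # then volume-render along each ray as usual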
- We recommend Linux for performance and compatibility reasons.
- The code is built upon NVIDIA's eg3d repo.
- 64-bit Python 3.8 and PyTorch 1.11.0 (or later). See https://pytorch.org for PyTorch install instructions. We tested our code on Python 3.9 and PyTorch 1.12.1.
- Python libraries: see requirements.txt for library dependencies.
- Set up environment with conda:
conda create -n in-n-out python=3.9
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
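You can sanity-check the environment with a quick, optional one-liner (prints the PyTorch version and whether CUDA is visible):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"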
Please download a pre-trained EG3D checkpoint and put it at ./eg3d/pretrained_models.
mkdir -p eg3d/pretrained_models
wget --content-disposition 'https://api.ngc.nvidia.com/v2/models/org/nvidia/team/research/eg3d/1/files?redirect=true&path=ffhqrebalanced512-128.pkl' -O ./eg3d/pretrained_models/ffhqrebalanced512-128.pkl
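Optionally, verify that the checkpoint loads. Below is a minimal sketch, run from ./eg3d so that dnnlib and legacy are importable; like other StyleGAN-family snapshots, EG3D pickles store the inference generator under the G_ema key:
cd eg3d
python - <<'EOF'
import dnnlib, legacy
with dnnlib.util.open_url('pretrained_models/ffhqrebalanced512-128.pkl') as f:
    G = legacy.load_network_pkl(f)['G_ema']  # EMA generator used for inference
print(type(G).__name__)
EOF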
To test our code, we provide a pre-trained checkpoint here. Please download the checkpoint and place it at eg3d/ckpts.
Please download the data and unzip it at eg3d/data/wildvideos.
We also provide all StyleCLIP checkpoints here. Please download them and unzip them at eg3d/CLIPStyle/mapper_results (e.g., unzip mapper_results.zip -d ./eg3d/CLIPStyle).
For example, to edit a video, run:
cd eg3d
bash scripts/run_test_styleclip.sh rednose2 eyeglasses ckpts/rednose2
Don't worry about the Missing key(s) error, as the eyeglasses mapper has no fine mapper.
The results will be saved at eg3d/results/rednose2.
- Processed data. We provide preprocessed data; please download it and put it at eg3d/data/wildvideos.
- Your own data. This includes human face alignment and uses part of the code from the official EG3D repo.
First, follow EG3D's instructions on setting up Deep3DFaceRecon_pytorch.
cd data_preprocessing/ffhq/
git clone https://github.com/sicxu/Deep3DFaceRecon_pytorch.git
Install Deep3DFaceRecon_pytorch following their instructions. Also make sure you have their checkpoint file epoch_20.pth and place it at data_preprocessing/ffhq/Deep3DFaceRecon_pytorch/checkpoints/pretrained/epoch_20.pth.
We provide a script batch_preprocess_in_the_wild.sh to preprocess your own data of human faces.
The script expects the following folder tree (either a video or an image); see the ffmpeg sketch after the tree for one way to create it:
InputRoot
├── VideoName1
│ ├── frame1
│ ├── frame2
...
│ ├── frameN
└── ImageName1
└── image1
...
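If you are starting from a raw video file, one way to produce this layout is a short ffmpeg call (a sketch; myvideo.mp4 and the frame pattern are placeholders, and the repo's eg3d/vid2frames.py serves the same purpose):
mkdir -p ${InputRoot}/${VideoName}
ffmpeg -i myvideo.mp4 ${InputRoot}/${VideoName}/%05d.png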
Then run one of the following (for a video or an image, respectively):
bash batch_preprocess_in_the_wild.sh ${InputRoot} ${OutputRoot} ${VideoName}
bash batch_preprocess_in_the_wild.sh ${InputRoot} ${OutputRoot} ${ImageName}
To train our model on a video (using rednose2 as an example), run:
cd eg3d
bash scripts/run_train.sh rednose2 train
The results will be saved at ckpts/rednose2/train.
To train on your own data, run:
bash scripts/run_train.sh ${videoname} ${expname}
# change to eg3d
cd eg3d
# Here we use the pre-trained checkpoint, assumed to be at ./ckpts/rednose2/.
# Remove the OOD object.
python outdomain/test_outdomain.py --remove_ood=true --smooth_out=true --network=pretrained_models/ffhqrebalanced512-128.pkl --ckpt_path=./ckpts/rednose2/triplanes.pt --target_path Path-to-rednose2 --latents_path ./ckpts/rednose2/triplanes.pt --outdir ./results/rednose2/eval/ood_removal_smoothed
# Please replace `Path-to-rednose2` with your own path.
# Save it as a video.
python frames2vid.py --frames_path ./results/rednose2/eval/ood_removal_smoothed/frames/projected_sr --output_dir ./results/rednose2/eval/ood_removal_smoothed/frames/projected_sr.mp4
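If you prefer ffmpeg over frames2vid.py, an equivalent sketch (assumes zero-padded .png frames and a 30 fps target; adjust both to your data):
ffmpeg -framerate 30 -pattern_type glob -i './results/rednose2/eval/ood_removal_smoothed/frames/projected_sr/*.png' -c:v libx264 -pix_fmt yuv420p ./results/rednose2/eval/ood_removal_smoothed/projected_sr.mp4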
- EG3D, Chan et al. 2022
- Dynamic NeRF, Gao et al. 2021
@inproceedings{Xu2024inNout,
author = {Xu, Yiran and Shu, Zhixin and Smith, Cameron and Oh, Seoung Wug and Huang, Jia-Bin},
title = {In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing},
booktitle = {CVPR},
year = {2024}
}
This is a research reference implementation and is treated as a one-time code drop. As such, we do not accept outside code contributions in the form of pull requests.
data_preprocessing/ffhq/3dface2idr_mat.py, data_preprocessing/ffhq/batch_preprocess_in_the_wild.sh, data_preprocessing/ffhq/draw_images_in_the_wild.py, data_preprocessing/ffhq/smooth_video_lms.py, data_preprocessing/ffhq/landmark68_5.py, eg3d/outdomain/*, eg3d/inversion/*, eg3d/frames2vid.py, eg3d/gen_3d_rgb.py, eg3d/vid2frames.py, eg3d/scripts/*, w_avg.pt, and other materials including the model checkpoints and shell scripts are licensed under CC BY-NC.
Files at eg3d/CLIPStyle/* are from StyleCLIP. Files at eg3d/configs/* and eg3d/criteria are from PTI.
Other files at dataset_preprocessing, eg3d/dnnlib, eg3d/gui_utils, eg3d/torch_utils, eg3d/training, eg3d/camera_utils.py, eg3d/cammat2json.py, eg3d/gen_3d_rgb.py, eg3d/gen_samples.py, eg3d/gen_videos.py, and eg3d/legacy.py are licensed under the NVIDIA license.
Some images were from Unsplash under the standard Unsplash license.