[CVPR 2024] VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment

Overview

This is the official implementation of VOODOO 3D, a high-fidelity, 3D-aware, one-shot head reenactment technique. Our method transfers the expression of a driver to a source and produces view-consistent renderings for holographic displays.

For more details on the method and the experimental results, please check out our paper, YouTube video, or the project page.

Installation

First, clone the project:

git clone https://github.com/MBZUAI-Metaverse/VOODOO3D-official

The implementation only requires standard libraries. You can install all the dependencies using conda and pip:

conda create -n voodoo3d python=3.10 pytorch=2.3.0 torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

conda activate voodoo3d

pip install -r requirements.txt
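After installation, a quick sanity check of the environment can save time before downloading the weights. The snippet below is a minimal sketch and is not part of the repository; the helper name `describe_environment` is hypothetical. It only reports the Python version, whether `torch` can be imported, and whether PyTorch sees a CUDA device.

```python
import importlib.util
import sys


def describe_environment():
    """Collect a few facts about the active Python environment."""
    info = {"python": "{}.{}".format(*sys.version_info[:2])}
    # Import torch only if it is installed, so the check never crashes.
    if importlib.util.find_spec("torch") is None:
        info["torch"] = None
        info["cuda_available"] = False
        return info
    import torch

    info["torch"] = torch.__version__
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `torch` is reported as `None`, the `voodoo3d` environment is probably not the active one.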

Next, prepare the pretrained weights and put them into ./pretrained_models:

Foreground Extractor: Download weights provided by MODNet using this link
Pose estimation: Download weights provided by Deep3DFaceRecon_pytorch using this link
Our pretrained weights
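To confirm that the weights landed in the right place, something like the following sketch may help. It is not part of the repository, and `missing_weights` is a hypothetical helper. Only `voodoo3d.pth` is taken from the commands in this README; the file names of the MODNet and Deep3DFaceRecon_pytorch weights are not stated here, so add whatever names you actually downloaded.

```python
from pathlib import Path


def missing_weights(root, expected_names):
    """Return the names in expected_names that are not files inside root."""
    root = Path(root)
    return [name for name in expected_names if not (root / name).is_file()]


if __name__ == "__main__":
    # voodoo3d.pth is the checkpoint name used by the commands in this README.
    # Append the MODNet and Deep3DFaceRecon_pytorch file names you downloaded.
    expected = ["voodoo3d.pth"]
    missing = missing_weights("pretrained_models", expected)
    print("missing:", missing if missing else "none")
```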

Inference

3D Head Reenactment

Use the following command to test the model:

python test_voodoo3d.py --source_root <IMAGE_FOLDERS / IMAGE_PATH> \
                    --driver_root <IMAGE_FOLDERS / IMAGE_PATH> \
                    --config_path configs/voodoo3d.yml \
                    --model_path pretrained_models/voodoo3d.pth \
                    --save_root <SAVE_ROOT>

Here, source_root and driver_root are each either an image folder or a single image path, for the sources and drivers respectively, and save_root is the folder where the results will be saved. The script generates pairwise reenactment results for the sources and drivers found in the input folders or paths. For example, to test with our provided images:

python test_voodoo3d.py --source_root resources/images/sources \
                    --driver_root resources/images/drivers \
                    --config_path configs/voodoo3d.yml \
                    --model_path pretrained_models/voodoo3d.pth \
                    --save_root results/voodoo3d_test
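To make "pairwise" concrete: with S source images and D driver images, one would expect S × D results. The sketch below illustrates that enumeration only; it is not the code in `test_voodoo3d.py`. The helper names and the set of accepted image extensions are assumptions.

```python
import itertools
from pathlib import Path

# Assumed set of image extensions; the real script may accept others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def collect_images(path):
    """Return sorted image files for a folder, or [path] for a single file."""
    path = Path(path)
    if path.is_dir():
        return sorted(p for p in path.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
    return [path]


def pairwise_jobs(source_root, driver_root):
    """List every (source, driver) combination."""
    sources = collect_images(source_root)
    drivers = collect_images(driver_root)
    return list(itertools.product(sources, drivers))
```

For instance, `len(pairwise_jobs("resources/images/sources", "resources/images/drivers"))` gives the number of reenactment results to expect under these assumptions, which is useful for estimating run time and disk usage before launching a large batch.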

Fine-tuned Lp3D for 3D Reconstruction

Lp3D is the state-of-the-art 3D portrait reconstruction model. As mentioned in the VOODOO 3D paper, we reimplemented this model and fine-tuned it on in-the-wild data. To evaluate this model, use the following script:

python test_lp3d.py --source_root <IMAGE_FOLDERS / IMAGE_PATH> \
                    --config_path configs/lp3d.yml \
                    --model_path pretrained_models/voodoo3d.pth \
                    --save_root <SAVE_ROOT> \
                    --cam_batch_size <BATCH_SIZE>

Here, source_root is either an image folder or a single image path containing the images to reconstruct in 3D, SAVE_ROOT is the destination folder for the results, and BATCH_SIZE is the testing batch size (the higher, the faster). For each input image, the model generates a rendered video of the corresponding 3D head along a fixed camera trajectory. Here is an example using our provided images:

python test_lp3d.py --source_root resources/images/sources \
                    --config_path configs/lp3d.yml \
                    --model_path pretrained_models/voodoo3d.pth \
                    --save_root results/lp3d_test \
                    --cam_batch_size 2
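The flag name suggests that `--cam_batch_size` sets how many camera poses of the trajectory are rendered per forward pass. That reading is an assumption, not something this README states. Under it, the effect of the batch size can be sketched as follows; the helper names are hypothetical and integers stand in for camera poses.

```python
import math


def split_into_batches(items, batch_size):
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def num_render_passes(num_cameras, batch_size):
    """Number of forward passes needed to render num_cameras views."""
    return math.ceil(num_cameras / batch_size)


if __name__ == "__main__":
    cameras = list(range(5))  # stand-ins for 5 camera poses
    print(list(split_into_batches(cameras, 2)))  # [[0, 1], [2, 3], [4]]
    print(num_render_passes(len(cameras), 2))    # 3
```

A larger batch size means fewer passes per video. In practice the usable value is limited by GPU memory, so it may be worth starting small and increasing it.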

License

Our implementation uses modified versions of other projects that have different licenses. Specifically:

GFPGAN and MODNet are distributed under the Apache License, Version 2.0.
EG3D and SegFormer are distributed under the NVIDIA Source Code License.

All other code, unless stated otherwise, is licensed under the MIT License. See the LICENSES file for details.

Acknowledgements

This work would not be possible without the following projects:

eg3d: We used portions of the data preprocessing and the generative model code to synthesize the data during training.
Deep3DFaceRecon_pytorch: We used portions of this code to predict the camera pose and process the data.
segmentation_models.pytorch: We used portions of the DeepLabV3 implementation from this project.
MODNet: We used portions of the foreground extraction code from this project.
SegFormer: We used portions of the transformer blocks from this project.
GFPGAN: We used portions of GFPGAN as our super-resolution module.

If you see your code used in this implementation but it has not been properly acknowledged, please contact me via [email protected].

BibTeX

If our code is useful for your research or application, please cite our paper:

@inproceedings{tran2023voodoo,
	title = {VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment},
	author = {Tran, Phong and Zakharov, Egor and Ho, Long-Nhat and Tran, Anh Tuan and Hu, Liwen and Li, Hao},
	year = 2024,
	booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}
}

Contact

For any questions or issues, please open an issue or contact [email protected].