This is the official implementation of VOODOO 3D, a high-fidelity, 3D-aware one-shot head reenactment technique. Our method transfers the expression of a driver to a source and produces view-consistent renderings for holographic displays.
For more details on the method and the experimental results, please check out our paper, YouTube video, or the project page.
First, clone the project:
git clone https://github.com/MBZUAI-Metaverse/VOODOO3D-official
The implementation only requires standard libraries. You can install all the dependencies using conda and pip:
conda create -n voodoo3d python=3.10 pytorch=2.3.0 torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda activate voodoo3d
pip install -r requirements.txt
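Before downloading the weights, it can help to confirm that the core packages are importable from the new environment. The snippet below is a minimal sketch and is not part of the repository: the helper name `missing_packages` is hypothetical, and it only checks the packages named in the conda command above (requirements.txt may list more).

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Only the packages named in the conda command are checked here.
    missing = missing_packages(["torch", "torchvision", "torchaudio"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        import torch

        # CUDA availability depends on the local driver, not only on the install.
        print("PyTorch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```

If the script reports missing packages, the most likely cause is that the commands were run outside the `voodoo3d` environment.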
Next, prepare the pretrained weights and put them into ./pretrained_models:
- Foreground Extractor: Download weights provided by MODNet using this link
- Pose estimation: Download weights provided by Deep3DFaceRecon_pytorch using this link
- Our pretrained weights
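As a quick check that the weights landed in the right place, the folder can be inspected with a few lines of Python. This is a sketch under stated assumptions: the helper `missing_weights` is hypothetical, and the only file name it checks is `voodoo3d.pth`, which is the name used by the test commands in this README. The file names of the MODNet and Deep3DFaceRecon_pytorch weights depend on the downloads and are not assumed here.

```python
from pathlib import Path


def missing_weights(folder, required=("voodoo3d.pth",)):
    """Return the required file names that are not present in `folder`."""
    root = Path(folder)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_weights("pretrained_models")
    if missing:
        print("Missing from ./pretrained_models:", ", ".join(missing))
    else:
        print("Found voodoo3d.pth in ./pretrained_models")
```

Additional file names can be passed through `required` once the downloaded MODNet and pose estimation weights are known.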
Use the following command to test the model:
python test_voodoo3d.py --source_root <IMAGE_FOLDERS / IMAGE_PATH> \
--driver_root <IMAGE_FOLDERS / IMAGE_PATH> \
--config_path configs/voodoo3d.yml \
--model_path pretrained_models/voodoo3d.pth \
--save_root <SAVE_ROOT>
Here, source_root and driver_root are either image folders or image paths of the sources and drivers, respectively, and save_root is the folder where the results will be saved. The script generates pairwise reenactment results for the sources and drivers in the input folders or paths. For example, to test with our provided images:
python test_voodoo3d.py --source_root resources/images/sources \
--driver_root resources/images/drivers \
--config_path configs/voodoo3d.yml \
--model_path pretrained_models/voodoo3d.pth \
--save_root results/voodoo3d_test
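To illustrate what "pairwise" means for the inputs, the following sketch enumerates every source/driver combination, so N sources and M drivers yield N x M results. It is not the logic of test_voodoo3d.py itself: the helper names and the accepted image extensions are assumptions made for illustration.

```python
from itertools import product
from pathlib import Path

# Assumption: the real script may accept other image formats.
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}


def collect_images(root):
    """Return [root] for a single file, the sorted images in a folder, or [] if the path is missing."""
    root = Path(root)
    if root.is_file():
        return [root]
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.suffix.lower() in IMAGE_EXTS)


def reenactment_pairs(sources, drivers):
    """Pair every source with every driver."""
    return list(product(sources, drivers))


if __name__ == "__main__":
    sources = collect_images("resources/images/sources")
    drivers = collect_images("resources/images/drivers")
    pairs = reenactment_pairs(sources, drivers)
    print(len(sources), "sources x", len(drivers), "drivers =", len(pairs), "pairs")
```

Because the number of results grows multiplicatively, pointing both arguments at large folders can take considerably longer than running a single source against a single driver.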
Lp3D is a state-of-the-art 3D portrait reconstruction model. As mentioned in the VOODOO 3D paper, we reimplemented this model and fine-tuned it on in-the-wild data. To evaluate this model, use the following script:
python test_lp3d.py --source_root <IMAGE_FOLDERS / IMAGE_PATH> \
--config_path configs/lp3d.yml \
--model_path pretrained_models/voodoo3d.pth \
--save_root <SAVE_ROOT> \
--cam_batch_size <BATCH_SIZE>
Here, source_root is either an image folder or an image path of the images to be reconstructed in 3D, SAVE_ROOT is the destination of the results, and BATCH_SIZE is the testing batch size (the higher, the faster). For each input image, the model generates a rendered video of the corresponding 3D head using a fixed camera trajectory. Here is an example using our provided images:
python test_lp3d.py --source_root resources/images/sources \
--config_path configs/lp3d.yml \
--model_path pretrained_models/voodoo3d.pth \
--save_root results/lp3d_test \
--cam_batch_size 2
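The effect of the batch size can be pictured as splitting the frames of the camera trajectory into chunks that are rendered together, so a larger value means fewer rendering passes per video, presumably at the cost of more GPU memory. The sketch below only illustrates that chunking: the helper `camera_batches` and the trajectory length of 120 frames are hypothetical and are not taken from test_lp3d.py.

```python
def camera_batches(num_frames, cam_batch_size):
    """Split frame indices 0..num_frames-1 into consecutive chunks of at most cam_batch_size."""
    if cam_batch_size < 1:
        raise ValueError("cam_batch_size must be at least 1")
    return [
        list(range(start, min(start + cam_batch_size, num_frames)))
        for start in range(0, num_frames, cam_batch_size)
    ]


if __name__ == "__main__":
    num_frames = 120  # hypothetical trajectory length, for illustration only
    for size in (1, 2, 8):
        passes = len(camera_batches(num_frames, size))
        print("cam_batch_size", size, "->", passes, "rendering passes")
```

If a run fails with an out-of-memory error, lowering --cam_batch_size is a reasonable first thing to try.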
Our implementation uses modified versions of other projects that have different licenses. Specifically:
- GFPGAN and MODNet are distributed under the Apache License, Version 2.0.
- EG3D and SegFormer are distributed under the NVIDIA Source Code License.
All other code, unless stated otherwise, is licensed under the MIT License. See the LICENSES file for details.
This work would not be possible without the following projects:
- eg3d: We used portions of the data preprocessing and the generative model code to synthesize the data during training.
- Deep3DFaceRecon_pytorch: We used portions of this code to predict the camera pose and process the data.
- segmentation_models.pytorch: We used portions of DeepLabV3 implementation from this project.
- MODNet: We used portions of the foreground extraction code from this project.
- SegFormer: We used portions of the transformer blocks from this project.
- GFPGAN: We used portions of GFPGAN as our super-resolution module.
If you see your code used in this implementation but not properly acknowledged, please contact me via [email protected].
If our code is useful for your research or application, please cite our paper:
@inproceedings{tran2023voodoo,
title = {VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment},
author = {Tran, Phong and Zakharov, Egor and Ho, Long-Nhat and Tran, Anh Tuan and Hu, Liwen and Li, Hao},
year = 2024,
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}
}
For any questions or issues, please open an issue or contact [email protected].