This is the official repository of the ECCV 2024 paper "ScanTalk: 3D Talking Heads from Unregistered Scans" by Federico Nocentini, Thomas Besnier, Claudio Ferrari, Sylvain Arguillere, Stefano Berretti, Mohamed Daoudi.
🔥🔥 [2024/09/10] Our code is now publicly available! Feel free to explore, use, and contribute!
🔥🔥 [2024/10/25] An extension of ScanTalk, "Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads", is available on arXiv.
Speech-driven 3D talking heads generation has emerged as a significant area of interest among researchers, presenting numerous challenges. Existing methods are constrained by animating faces with fixed topologies, wherein point-wise correspondence is established, and the number and order of points remain consistent across all identities the model can animate. In this work, we present ScanTalk, a novel framework capable of animating 3D faces in arbitrary topologies including scanned data. Our approach relies on the DiffusionNet architecture to overcome the fixed topology constraint, offering promising avenues for more flexible and realistic 3D animations. By leveraging the power of DiffusionNet, ScanTalk not only adapts to diverse facial structures but also maintains fidelity when dealing with scanned data, thereby enhancing the authenticity and versatility of generated 3D talking heads. Through comprehensive comparisons with state-of-the-art methods, we validate the efficacy of our approach, demonstrating its capacity to generate realistic talking heads comparable to existing techniques. While our primary objective is to develop a generic method free from topological constraints, all state-of-the-art methodologies are bound by such limitations.
We present ScanTalk, a deep learning architecture that animates any 3D face mesh driven by speech. ScanTalk is robust enough to learn from multiple unrelated datasets with a single model, while still allowing inference on unregistered face meshes.
ScanTalk is a novel Encoder-Decoder framework designed to dynamically animate any 3D face based on a spoken sentence from an audio file. The Encoder integrates the 3D neutral face
@inproceedings{nocentini2024scantalk3dtalkingheads,
title = {ScanTalk: 3D Talking Heads from Unregistered Scans},
author = {Nocentini, F. and Besnier, T. and Ferrari, C. and Arguillere, S. and Berretti, S. and Daoudi, M.},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
}
ScanTalk Installation Guide
This guide provides step-by-step instructions on how to set up the ScanTalk environment and install all necessary dependencies. The codebase has been tested on Ubuntu 20.04.2 LTS with Python 3.8.
1. Setting Up Conda Environment
It is recommended to use a Conda environment for this setup.
- Create a Conda Environment:
    conda create -n scantalk python=3.8.18
- Activate the Environment:
    conda activate scantalk
2. Install Mesh Processing Libraries
- Clone the MPI-IS Repository:
    git clone https://github.com/MPI-IS/mesh.git
    cd mesh
- Replace line 7 of the Makefile with the following line to avoid a build error:
    @pip install --no-deps --config-settings="--boost-location=$$BOOST_INCLUDE_DIRS" --verbose --no-cache-dir .
- Run the Makefile:
    make all
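Once the build finishes, a quick check that the library is visible from the scantalk environment can save time later. The snippet below is a minimal sketch, assuming the MPI-IS mesh package installs under the module name psbody.mesh; the helper name module_available is hypothetical and not part of this repository.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located, without importing the module itself."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "psbody") is itself missing.
        return False


if __name__ == "__main__":
    # Assumption: the MPI-IS mesh library is exposed as "psbody.mesh".
    status = "found" if module_available("psbody.mesh") else "missing"
    print(f"psbody.mesh: {status}")
```

If the module is reported as missing, the most likely cause is that make all was run outside the activated Conda environment.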
3. Installing PyTorch and Requirements
Ensure you have the correct version of PyTorch and torchvision. If you need a different CUDA version, please refer to the official PyTorch website.
- Install PyTorch, torchvision, and torchaudio:
    conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia
- Install Requirements:
    pip install -r requirements.txt
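To confirm that the environment ended up with the versions pinned above, the installed package metadata can be compared against them. This is a minimal sketch using only the standard library; the helper check_versions is hypothetical, and it assumes the packages register their metadata under the names torch, torchvision, and torchaudio.

```python
from importlib import metadata

# Versions pinned in the install command above.
PINNED = {"torch": "2.1.0", "torchvision": "0.16.0", "torchaudio": "2.1.0"}


def check_versions(pinned, lookup=metadata.version):
    """Map each package name to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        # Drop local build tags such as "+cu121" before comparing.
        report[name] = (installed, installed.split("+")[0] == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions(PINNED).items():
        print(f"{name}: installed={installed} matches_pin={ok}")
```

A mismatch is not necessarily a problem if you deliberately installed a build for a different CUDA version, but it is worth knowing before training.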
For training and testing ScanTalk, we used three open-source datasets for 3D talking heads: vocaset, BIWI, and Multiface. The preprocessed and aligned datasets, all standardized to the vocaset format and used for both training and testing, can be found here. After downloading, place the Dataset folder in the main directory.
We are releasing two versions of ScanTalk: scantalk_mse.pth.tar, trained with a mean squared error loss, and scantalk_mse_masked_velocity.pth.tar, trained with a combination of multiple loss functions. Both models are available for download here. After downloading, place the results folder within the src directory.
The files scantalk_train.py and scantalk_test.py are used for training and testing, respectively. scantalk_test.py generates a directory containing all the ScanTalk predictions for each test set in the datasets. After obtaining the predictions, compute_metrics.py is used to calculate evaluation metrics by comparing the ground truth with the model's predictions.
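To illustrate what comparing ground truth with predictions can look like, here is a minimal sketch of one simple measure: the mean Euclidean distance between corresponding vertices over all frames. It is an illustration only and is not taken from compute_metrics.py; the function name mean_vertex_error is hypothetical, and the sketch assumes both sequences share the same vertex count and ordering.

```python
import math


def mean_vertex_error(pred, gt):
    """Mean Euclidean distance between corresponding vertices over all frames.

    `pred` and `gt` are sequences of frames; each frame is a sequence of
    (x, y, z) vertices in the same order (an assumption of this sketch).
    """
    if len(pred) != len(gt):
        raise ValueError("sequences differ in number of frames")
    total, count = 0.0, 0
    for frame_p, frame_g in zip(pred, gt):
        if len(frame_p) != len(frame_g):
            raise ValueError("frames differ in number of vertices")
        for vp, vg in zip(frame_p, frame_g):
            total += math.dist(vp, vg)
            count += 1
    if count == 0:
        raise ValueError("no vertices to compare")
    return total / count


if __name__ == "__main__":
    gt = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
    pred = [[(0.0, 0.0, 0.0), (1.0, 3.0, 4.0)]]
    print(mean_vertex_error(pred, gt))
```

A measure of this kind only makes sense when prediction and ground truth are in point-wise correspondence, which is why it is shown here purely as an example.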
You can use demo.py to run a demo of ScanTalk, animating any 3D face that has been aligned with the training set. Both the audio and the 3D face for the demo are in the src/examples folder.
- Federico Nocentini*
- Thomas Besnier*
- Claudio Ferrari
- Sylvain Arguillere
- Stefano Berretti
- Mohamed Daoudi
* Equal contribution.
This work is supported by the ANR project Human4D (ANR-19-CE23-0020) and by the IRP CNRS project GeoGen3DHuman. It was also partially supported by "Partenariato FAIR (Future Artificial Intelligence Research) - PE00000013, CUP J33C22002830006", funded by NextGenerationEU through the Italian MUR within the NRRP, project DL-MIG. Additionally, this work was partially funded by the ministerial decree n.352 of the 9th April 2022, NextGenerationEU through the Italian MUR within NRRP, and partially supported by Fédération de Recherche Mathématique des Hauts-de-France (FMHF, FR2037 du CNRS).
All material is made available under Creative Commons BY-NC 4.0. You can use, redistribute, and adapt the material for non-commercial purposes, as long as you give appropriate credit by citing our paper and indicate any changes that you've made.