This is the script to process face (upper-body) video data for DELTA.
To process full-body videos, please check here.
For face videos, DELTA needs the input image, subject mask, hair mask, and an initial SMPL-X estimation for training. Specifically, we use
- face alignment to detect face keypoints
- MediaPipe to detect iris
- MODNet to segment subject
- face-parsing to segment hair
We also use model data from DECA and PIXIE.
When using the processing script, it is necessary to agree to the terms of their licenses and properly cite them in your work.
- Clone submodule repositories:
git submodule update --init --recursive
- Download their needed data:
bash fetch_asset_data.sh
If the script fails, please check their websites and download the models manually.
List your data in ./lists/subject_list.txt; each entry can be a video path or an image folder.
The following steps use the subject "7ka4tohxYD8_8" as an example:
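As a quick sanity check before processing, the entries of the list file can be inspected with a small helper. This is only a sketch under assumptions the scripts do not confirm: it assumes one entry per line, and `read_subject_list`, `classify_entry`, and `VIDEO_EXTENSIONS` are hypothetical names that are not part of this repository.

```python
from pathlib import Path

# Hypothetical set of video extensions; adjust it to your own data.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv"}


def read_subject_list(list_path):
    """Return the non-empty lines of the list file (assumes one entry per line)."""
    entries = []
    for line in Path(list_path).read_text().splitlines():
        line = line.strip()
        if line:
            entries.append(line)
    return entries


def classify_entry(entry):
    """Label an entry as 'video', 'image_folder', or 'unknown'."""
    path = Path(entry)
    if path.suffix.lower() in VIDEO_EXTENSIONS:
        return "video"
    if path.is_dir():
        return "image_folder"
    return "unknown"


if __name__ == "__main__":
    list_file = Path("lists/subject_list.txt")
    if list_file.exists():
        for entry in read_subject_list(list_file):
            print(classify_entry(entry), entry)
```

Entries reported as `unknown` are worth checking by hand, for example an image folder that does not exist yet or a video with an extension missing from the set above.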
First, process the video and generate labels:
cd process_data
python 0_process_video.py --videopath ../dataset/7ka4tohxYD8_8/7ka4tohxYD8_8.mp4 --savepath ../dataset --crop --ignore_existing
Then fit each image by optimizing the SMPL-X parameters. The processing time depends on the number of frames, so it is better to process the images in parallel if you have a cluster :)
cd ..
python process_data/1_smplx_fit_single.py --datapath dataset --subject 7ka4tohxYD8_8 --data_cfg dataset/7ka4tohxYD8_8/7ka4tohxYD8_8.yml --train_only
Then fit the video by optimizing over all frames:
python process_data/2_smplx_fit_all.py --datapath dataset --subject 7ka4tohxYD8_8 --data_cfg dataset/7ka4tohxYD8_8/7ka4tohxYD8_8.yml
You can check the fitting video results at dataset/7ka4tohxYD8_8/smplx_all/epoch_000499.mp4
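To repeat these three steps for another subject, the commands can be assembled from the subject name. The sketch below assumes every subject follows the same layout as the example (`dataset/<subject>/<subject>.mp4` and `dataset/<subject>/<subject>.yml`) and that it is run from the repository root; `build_commands`, `result_video`, and `run_pipeline` are hypothetical helpers, not part of the repository. The epoch number in the result file name is copied from the example and may differ with other configurations.

```python
import shlex
import subprocess
from pathlib import Path


def build_commands(subject, dataset_root="dataset"):
    """Return (working_directory, argv) pairs mirroring the three steps."""
    subject_dir = Path(dataset_root) / subject
    config = (subject_dir / f"{subject}.yml").as_posix()
    # Step 0 runs inside process_data, so its paths are relative to that folder.
    video = (Path("..") / subject_dir / f"{subject}.mp4").as_posix()
    save_path = (Path("..") / dataset_root).as_posix()
    return [
        ("process_data", ["python", "0_process_video.py",
                          "--videopath", video, "--savepath", save_path,
                          "--crop", "--ignore_existing"]),
        (".", ["python", "process_data/1_smplx_fit_single.py",
               "--datapath", dataset_root, "--subject", subject,
               "--data_cfg", config, "--train_only"]),
        (".", ["python", "process_data/2_smplx_fit_all.py",
               "--datapath", dataset_root, "--subject", subject,
               "--data_cfg", config]),
    ]


def result_video(subject, dataset_root="dataset", epoch=499):
    """Path of the fitting video, assuming the naming shown in the example."""
    return Path(dataset_root) / subject / "smplx_all" / f"epoch_{epoch:06d}.mp4"


def run_pipeline(subject, dataset_root="dataset"):
    """Run the steps in order and stop at the first failure."""
    for cwd, argv in build_commands(subject, dataset_root):
        subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for cwd, argv in build_commands("7ka4tohxYD8_8"):
        print(f"(cd {cwd} && {shlex.join(argv)})")
    print("expected result:", result_video("7ka4tohxYD8_8").as_posix())
```

Calling `run_pipeline("7ka4tohxYD8_8")` would execute the steps sequentially; because `check=True` raises on a non-zero exit code, a failed step prevents the later ones from running on incomplete data.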
The final results should look like: