how to prepare the train data? #1

Open
Bagfish opened this issue May 2, 2024 · 10 comments

Bagfish commented May 2, 2024

Dear sir
Thank you for your implementation based on PyTorch! I want to train the model, but I can't understand how to prepare the training data. In the paper, the speech and face images are paired, but in the readme I only see vox1, vox2 and HQvox - which dataset is used to generate the face vectors?

Kacper-Pietkun (Owner) commented

Hello @Bagfish,
When it comes to training the VoiceEncoder, you need to prepare a directory with a dataset, as described here - datasets (under the S2fDataset entry). In short, you must create a separate directory for each person, and inside each directory there must be two additional directories - one for calculated spectrograms (audios directory) and one for calculated face embeddings (images directory). Such a directory can be used as a training set. If you want to prepare a validation or a test set, just follow the same steps.
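For illustration only, a hypothetical layout (all directory and file names below are examples, not taken from the repository) and a small helper that checks it could look like this:

from pathlib import Path

# Hypothetical S2fDataset-style layout (names are examples only):
#
#   dataset_root/
#       person_001/
#           audios/    <- precomputed spectrograms
#           images/    <- precomputed face embeddings from the FaceEncoder
#       person_002/
#           audios/
#           images/

def check_s2f_layout(root: str) -> None:
    """Assert that every person directory contains 'audios' and 'images' subdirectories."""
    for person_dir in Path(root).iterdir():
        if not person_dir.is_dir():
            continue
        assert (person_dir / "audios").is_dir(), f"missing audios/ in {person_dir}"
        assert (person_dir / "images").is_dir(), f"missing images/ in {person_dir}"

check_s2f_layout("dataset_root")  # hypothetical path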

Note: To calculate spectrograms from audio files, you can use the scripts located here: audio_spectrograms.py (for preprocessing like in the Speech2Face: Learning the Face Behind a Voice paper) and ast_audio_preprocess.py (if you want to use the AST voice encoder). On the other hand, the face embeddings, which must be located in the images directories, must be calculated using the FaceEncoder model - here is the script: image_face_embeddings.py
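As a rough, generic illustration of what spectrogram preprocessing can look like (the repository's audio_spectrograms.py and ast_audio_preprocess.py may use different transforms and parameters, so treat this only as a sketch):

import torchaudio

# Load a waveform (path is hypothetical) and compute a log-scaled mel spectrogram
waveform, sample_rate = torchaudio.load("person_001/raw_audio/clip_0001.wav")
mel = torchaudio.transforms.MelSpectrogram(sample_rate=sample_rate, n_fft=1024, n_mels=128)(waveform)
log_mel = torchaudio.transforms.AmplitudeToDB()(mel)
print(log_mel.shape)  # (channels, n_mels, time_frames)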

Bagfish commented May 3, 2024

Thank you for your reply, I will follow your guide! @Kacper-Pietkun

Bagfish commented May 4, 2024

@Kacper-Pietkun I'm very sorry to bother you again. Could I ask you for the VGG model converted to PyTorch? The reasons why I can't convert it myself are: 1. I can't find the model download link in https://github.com/serengil/deepface. 2. I haven't been able to install TensorFlow on my computer.

Kacper-Pietkun (Owner) commented

Here you will find PyTorch weights for the VGGFace_serengil model: https://drive.google.com/drive/u/2/folders/1DCqvpZYkd0chupA3mQeCVS7p69WAjnER

Bagfish commented May 7, 2024

I got it! Thank you very much!

Bagfish commented May 9, 2024

@Kacper-Pietkun When I train the VoiceEncoder, the loss becomes NaN. I really can't figure out what the problem is.
[attached screenshot: loss]

Kacper-Pietkun (Owner) commented

  1. Have you trained the FaceDecoder model beforehand? (When training the VoiceEncoder, the FaceDecoder model's weights should be frozen.)
  2. Which VoiceEncoder model are you training? I had similar problems with the ve_conv model. Try training the ast model instead.
  3. One approach that should help is playing with the values of the loss function's coefficients - coe_1, coe_2, coe_3 - as well as the learning_rate hyperparameter.

Bagfish commented May 10, 2024

@Kacper-Pietkun thank you for your reply
1. I have already trained the FaceDecoder model, but I didn't freeze its weights. How can I freeze the FaceDecoder weights?
2. I will try the ast model instead.
3. I will try other hyperparameters.
Thank you very much!

Bagfish commented May 10, 2024

During the first part, the whole model is frozen except the head, which is trained. During the second part, the whole model is unfrozen and fine-tuned.
What does this mean? In the first step, which args in train/train_ast.py should I set? And how do I fine-tune using train/train_ast.py - is just "python train/train_ast.py --fine-tune" OK?

Kacper-Pietkun (Owner) commented

@Kacper-Pietkun thank you for your reply 1. I have already trained the FaceDecoder model, but I didn't freeze its weights. How can I freeze the FaceDecoder weights? 2. I will try the ast model instead. 3. I will try other hyperparameters. Thank you very much!

Actually, I was asking whether you had trained the FaceDecoder model beforehand, because it is necessary to calculate the loss function. You don't need to do anything extra to "freeze" the FaceDecoder's weights, because the optimizer was created only to optimize the VoiceEncoder model's weights.
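To make that concrete, here is a minimal, self-contained sketch (with stand-in modules and a stand-in loss, not the repository's actual classes) showing how the FaceDecoder can sit in the loss computation while only the VoiceEncoder's parameters are given to the optimizer:

import torch
import torch.nn as nn

# Stand-in modules - the real VoiceEncoder / FaceDecoder come from the repository
voice_encoder = nn.Linear(512, 4096)   # dummy "spectrogram" -> 4096-d voice embedding
face_decoder = nn.Linear(4096, 4096)   # dummy decoder used only inside the loss

# The FaceDecoder stays fixed: its parameters are not passed to the optimizer
# (setting requires_grad=False as well makes the intent explicit)
for param in face_decoder.parameters():
    param.requires_grad = False
face_decoder.eval()

optimizer = torch.optim.Adam(voice_encoder.parameters(), lr=1e-4)  # only VoiceEncoder params

dummy_spectrograms = torch.randn(8, 512)
target_face_embeddings = torch.randn(8, 4096)

voice_embeddings = voice_encoder(dummy_spectrograms)
decoded = face_decoder(voice_embeddings)                        # frozen FaceDecoder in the loss path
loss = nn.functional.mse_loss(decoded, target_face_embeddings)  # stand-in for the real multi-term loss
loss.backward()
optimizer.step()                                                # updates only the VoiceEncoder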

During the first part, the whole model is frozen except the head, which is trained. During the second part, the whole model is unfrozen and fine-tuned. What does this mean? In the first step, which args in train/train_ast.py should I set? And how do I fine-tune using train/train_ast.py - is just "python train/train_ast.py --fine-tune" OK?

Okay, so basically it looks like this. In the training script, the AST VoiceEncoder model is downloaded from HuggingFace's transformers library, along with the pretrained weights. However, to adapt the model to the problem of generating voice embedding vectors, it needs a new "head", so that the last layer's output dimension is equal to 4096 (just like the face embedding vector size).

Here are the lines of code from the training script which are responsible for downloading the model and swapping its "head":

import torch
import torch.nn as nn
from transformers import AutoModelForAudioClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load the pretrained AST with a 4096-output classifier head and wrap that head in a ReLU
ast = AutoModelForAudioClassification.from_pretrained("MIT/ast-finetuned-audioset-10-10-0.4593", num_labels=4096, ignore_mismatched_sizes=True).to(device)
head = ast.classifier
new_head = nn.Sequential(
    head,
    nn.ReLU()
)
ast.classifier = new_head

So, I split the AST VoiceEncoder model training into two stages.

  1. The first stage is responsible only for training this new "head" of the model (the other layers are frozen). This is typically done during transfer learning, to avoid a situation where, due to a freshly initialized layer (the head), the gradient updates would be so large that the other, previously trained parameters would be altered too much and the model would forget what it had learned. (Remember that the other parameters, which are frozen during this step, were initialized with pretrained weights.)

Here are a few lines from the training script which are responsible for freezing all of the model's parameters except the head. Additionally, you can see that the head's parameters are initialized with a truncated normal distribution:

# freeze every layer except the new head - classifier.0.dense.weight and classifier.0.dense.bias
for name, param in ast.named_parameters():
    if name != "classifier.0.dense.weight" and name != "classifier.0.dense.bias":
        param.requires_grad = False
    else:
        nn.init.trunc_normal_(param)

  2. During the second stage, more of the model's parameters should be unfrozen, so that they can be optimized for the problem of generating voice embeddings. You can unfreeze the whole model, or only some parts of it. Recently I have added the --unfreeze-number parameter to the training script, with which you can control how many layers are unfrozen. (Actually, this parameter specifies from which layer the model should be unfrozen.)

Generally, to run the second stage, beyond all of the other necessary parameters like --train-dataset-path, --face-decoder-weights-path and so on, you need to pass these parameters to the script:

  • --fine-tune - a flag marking that the model's head has already been trained
  • --continue-training-path - specifies the path to the weights of the ast model (the one whose head was already trained)
  • --unfreeze-number - this one is optional, because by default when fine-tuning the whole model will be unfrozen. But as I said, you can use it as a hyperparameter. During my research I achieved the best results when I unfroze the model from the 165th layer (see the sketch below).
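For illustration, a minimal sketch (not the actual code from train/train_ast.py, which may walk the layers differently) of unfreezing everything from a given layer index onward could look like this:

# Hypothetical sketch: unfreeze all parameters starting from layer index `unfreeze_number`
# (the real --unfreeze-number handling in train/train_ast.py may differ)
unfreeze_number = 165  # treat this as a hyperparameter, as mentioned above
for index, (name, param) in enumerate(ast.named_parameters()):
    param.requires_grad = index >= unfreeze_number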
