How to prepare the train data? #1
Hello @Bagfish, Note: To calculate spectrograms from audio files, you can use the scripts located here: audio_spectrograms.py (for preprocessing as in the Speech2Face: Learning the Face Behind a Voice paper) and ast_audio_preprocess.py (if you want to use the AST voice encoder). On the other hand, face embeddings, which must be located in the …
Thank you for your reply, I will follow your guide!!! @Kacper-Pietkun
@Kacper-Pietkun I’m very sorry to bother you again. Could I ask you for the VGG model converted to PyTorch? The reasons why I can’t convert it myself are: 1. I can’t find the model download link in https://github.com/serengil/deepface. 2. I haven’t been able to install TensorFlow on my computer.
Here you will find PyTorch weights for the …
I get it!!! Thank you very much!!!
@Kacper-Pietkun When I train the speech encoder, the loss becomes NaN. I really can't find what the problem is.
@Kacper-Pietkun thank you for your reply |
During the first stage, the whole model is frozen except the head, which is trained. During the second stage, the whole model is unfrozen and fine-tuned.
Actually, I was wondering if you have trained the FaceDecoder model beforehand, because it is necessary to calculate the loss function. You don't need to do anything extra to "freeze" the FaceDecoder's weights, because the optimizer was created to optimize only the VoiceEncoder model's weights.
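To illustrate the point about the optimizer, here is a minimal stand-in sketch (not the repository's actual code; the tiny `nn.Linear` modules and dimensions are hypothetical placeholders for the real VoiceEncoder and FaceDecoder). Because the optimizer is built only from the voice encoder's parameters, the face decoder's weights are never updated, even though gradients flow through it during the backward pass:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the repository's VoiceEncoder and FaceDecoder
voice_encoder = nn.Linear(512, 4096)
face_decoder = nn.Linear(4096, 128)

# The optimizer sees ONLY the voice encoder's parameters, so the
# face decoder is effectively frozen without any extra code.
optimizer = torch.optim.Adam(voice_encoder.parameters(), lr=1e-4)

dec_before = face_decoder.weight.clone()
enc_before = voice_encoder.weight.clone()

spectrogram = torch.randn(8, 512)      # fake spectrogram batch
target_face = torch.randn(8, 128)      # fake training target

loss = nn.functional.mse_loss(face_decoder(voice_encoder(spectrogram)), target_face)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Decoder weights are unchanged; encoder weights have moved.
print(torch.equal(dec_before, face_decoder.weight))   # True
print(torch.equal(enc_before, voice_encoder.weight))  # False
```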
Okay, so basically it looks like this. In the training script, the AST VoiceEncoder model is downloaded from HuggingFace's transformers library, along with the pretrained weights. However, to adjust the model to the problem of generating voice embedding vectors, it needs a new "head", so that the last layer's output dimension is equal to 4096 (just like the face embedding vector size). Here are the lines of code from the training script which are responsible for downloading the model and swapping its "head". Speech-to-face/src/train/train_ast.py Lines 242 to 248 in e0e32af
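The head swap described above can be sketched as follows. This is a minimal stand-in, not the repository's actual code: `TinyAST`, its layer sizes, and the original 527-class head are all hypothetical placeholders that only mimic the pattern of replacing a pretrained model's classifier with a new 4096-dimensional layer:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained AST-style model: an encoder body
# plus a classification head. hidden_size=768 mirrors AST's transformer width.
class TinyAST(nn.Module):
    def __init__(self, hidden_size=768, num_labels=527):
        super().__init__()
        self.encoder = nn.Linear(128, hidden_size)            # stand-in encoder
        self.classifier = nn.Linear(hidden_size, num_labels)  # original head

    def forward(self, x):
        return self.classifier(torch.relu(self.encoder(x)))

model = TinyAST()

# Swap the head so the last layer outputs 4096-dim voice embeddings,
# matching the face embedding vector size.
model.classifier = nn.Linear(768, 4096)

spectrogram_batch = torch.randn(2, 128)   # fake spectrogram batch
embedding = model(spectrogram_batch)
print(embedding.shape)  # torch.Size([2, 4096])
```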
So, I split the AST VoiceEncoder model training into two stages.
Here are a few lines from the training script which are responsible for freezing all the model's parameters except the head. Additionally, you can see that the head's parameters are initialized with a truncated normal distribution: Speech-to-face/src/train/train_ast.py Lines 249 to 254 in e0e32af
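The freeze-everything-but-the-head pattern can be sketched like this. Again, this is a hedged stand-in rather than the repository's code; the `body`/`head` modules and the `std=0.02` value are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Hypothetical model: a pretrained body plus a freshly attached 4096-dim head
model = nn.ModuleDict({
    "body": nn.Linear(128, 768),
    "head": nn.Linear(768, 4096),
})

# Stage 1: freeze all parameters, then unfreeze only the head
for p in model.parameters():
    p.requires_grad = False
for p in model["head"].parameters():
    p.requires_grad = True

# Initialize the new head's weights with a truncated normal distribution
nn.init.trunc_normal_(model["head"].weight, std=0.02)
nn.init.zeros_(model["head"].bias)

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only the head's parameters remain trainable
```

In stage 2 one would simply set `requires_grad = True` on every parameter again and fine-tune the whole model at a lower learning rate.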
Generally, to run the second stage, beyond all of the other necessary parameters like …
Dear sir,
Thank you for your PyTorch-based implementation! I want to train the model, but I can't understand how to prepare the training data. In the paper, the speech and face images are paired, but in the first README I only see vox1, vox2, and HQvox. Which dataset is used to generate the face vectors?