If I ran some fine-tuning experiments on the v1.3.0 model, could I in theory drive hand motion with precise textual descriptions of hand movements, or is that too difficult for the existing pretrained text encoder?
I think the main issue is that the model doesn't have enough parameters. The text encoder also has some influence, of course.
My feeling is that a pretrained MT5 or CLIP model probably won't perform well on such a domain-specific corpus unless the text encoder is retrained. If I tried fine-tuning with around 500 videos of roughly 5-10 seconds each on two A100s, about how long would it take to see a preliminary result?
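One common middle ground between the two options discussed above (retraining the text encoder vs. fine-tuning only the generator) is to freeze the pretrained text encoder and update only the backbone. Here is a minimal PyTorch sketch of that setup; `TinyT2VModel` and its layer sizes are placeholders I made up for illustration, not the real v1.3.0 architecture:

```python
# Hedged sketch: freeze a pretrained text encoder during fine-tuning so only
# the generative backbone receives gradients. All module names are hypothetical.
import torch.nn as nn

class TinyT2VModel(nn.Module):
    """Stand-in for a text-to-video model (NOT the real v1.3.0 model)."""
    def __init__(self):
        super().__init__()
        self.text_encoder = nn.Linear(16, 32)  # placeholder for MT5/CLIP
        self.backbone = nn.Linear(32, 32)      # placeholder for the video backbone

model = TinyT2VModel()

# Freeze the text encoder so the optimizer only touches the backbone.
for p in model.text_encoder.parameters():
    p.requires_grad_(False)

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

Passing only the parameters with `requires_grad=True` to the optimizer then keeps the text encoder fixed; if the encoder does need adapting to a hand-motion corpus, the same pattern works in reverse by unfreezing it last.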