Commit c669515 (0 parents): 471 changed files with 52,772 additions and 0 deletions.
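@@ -0,0 +1,13 @@
## Study

https://zhuanlan.zhihu.com/p/362193124

2021/5/3 15:16:47
If you run into problems during the competition, please refer to the FAQ document, which covers data, vouchers, TI-ONE platform operations, and more.
[Tencent Docs] 2021 Tencent Advertising Algorithm Competition FAQ
https://docs.qq.com/doc/DV1hFUGpMV1l3eVdV
If you have any other questions, feel free to raise them in the group; we will keep updating and expanding the FAQ. Good luck in the competition 🥰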
@@ -0,0 +1,80 @@
# 0. Introduction
A multimodal video tagging model framework.

# 1. Code structure
- configs--------------------# model selection and parameter files
- src------------------------# data loading and model code
- scripts--------------------# data preprocessing, training, and testing scripts
- checkpoints----------------# model weights and logs
- pretrained-----------------# pretrained models
- dataset--------------------# datasets and tag dictionaries
- utils----------------------# utility scripts
- ReadMe.md

# 2. Environment setup
sudo apt-get update
sudo apt-get install libsndfile1-dev ffmpeg imagemagick nfs-kernel-server
pip install -r requirement.txt

## 2.1 Configure ImageMagick
Delete or comment out the following line in the configuration file /etc/ImageMagick-6/policy.xml:
`<policy domain="path" rights="none" pattern="@*" />`

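If you would rather script the edit above, here is a minimal Python sketch (not part of the repository) that comments the line out instead of deleting it. It assumes the line appears exactly as quoted, on a line of its own; `comment_out_path_policy` and `patch_policy_file` are hypothetical helper names, and writing under /etc normally requires root.

```python
import re

# Matches the policy line quoted above (the rule that blocks "@file" paths),
# allowing any indentation and optional whitespace before "/>".
POLICY_RE = re.compile(
    r'^([ \t]*)(<policy domain="path" rights="none" pattern="@\*"\s*/>)[ \t]*$',
    re.MULTILINE,
)


def comment_out_path_policy(xml_text: str) -> str:
    """Wrap the blocking policy line in an XML comment; every other line is unchanged."""
    return POLICY_RE.sub(r"\1<!-- \2 -->", xml_text)


def patch_policy_file(path: str = "/etc/ImageMagick-6/policy.xml") -> None:
    """Rewrite policy.xml in place (hypothetical helper; needs write permission on the file)."""
    with open(path, encoding="utf-8") as f:
        original = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(comment_out_path_policy(original))
```

Running it twice is harmless: once the line is wrapped in `<!-- ... -->` it no longer starts with `<policy`, so the pattern stops matching.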
# 3. Training workflow
## 3.1 Data preprocessing
Prepare the training data for your specific task. For video and audio feature extraction, see scripts/preprocess.sh:
bash scripts/preprocess.sh tagging.txt 0 0 4 1

## 3.2 Loading pretrained model parameters
See the instructions in the pretrained directory.

## 3.3 Starting training
python scripts/train_tagging.py --config configs/config.ad_content.yaml

## 3.4 TensorBoard curves for training and validation
tensorboard --logdir checkpoints --port 8080

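# 4. Validation workflow
python scripts/eval.py --config configs/config.ad_content.yaml --ckpt_step -1

1. Outputs evaluation metrics separately for the fused multimodal features, visual features, audio features, and text features:
   * Hit@1 (accuracy of the model's highest-scoring predicted tag)
   * PERR (Precision at Equal Recall Rate: rank the predicted tags by score, take the top k, where k = the number of tags in the sample's ground truth, and compute the precision of those k tags)
   * MAP (mean average precision)
   * [GAP (Global Average Precision)](https://www.kaggle.com/c/youtube8m/overview/evaluation)
2. Outputs the frequency of each tag and each tag's AP (average precision), **used to analyze per-tag accuracy**.
3. Outputs a tag-correlation matrix M, where $M_{a,B}$ is the frequency distribution of the model's predictions B (b1, b2, ...) over samples whose tag is a; only the top_k results are kept. **Used to merge similar tags and update the tag dictionary file.**
4. Saves the output file eval_tag_analysis.txt, `each line contains, in order: tag_freq,tag_ap,tag_conf,tag_precision,tag_recall`; analyze the validation results with scripts/tag_analysis.ipynb.
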
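The metrics in item 1 (and the per-tag AP in item 2) can be sketched in NumPy as follows. This illustrates the definitions given above and is not the code in scripts/eval.py: it assumes `scores` and `labels` are (num_samples, num_tags) arrays with multi-hot 0/1 labels, that GAP pools each sample's top 20 predictions as in the YouTube-8M evaluation linked above, and all function names are hypothetical.

```python
import numpy as np


def hit_at_one(scores: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of samples whose highest-scoring tag is one of their ground-truth tags."""
    top1 = np.argmax(scores, axis=1)
    return float(np.mean(labels[np.arange(len(labels)), top1]))


def perr(scores: np.ndarray, labels: np.ndarray) -> float:
    """Mean precision of each sample's top-k tags, with k = its number of ground-truth tags."""
    precisions = []
    for sample_scores, sample_labels in zip(scores, labels):
        k = int(sample_labels.sum())
        if k == 0:
            continue  # assumption: samples without ground-truth tags are skipped
        top_k = np.argsort(-sample_scores, kind="stable")[:k]
        precisions.append(sample_labels[top_k].sum() / k)
    return float(np.mean(precisions))


def average_precision(scores, labels, num_positives=None) -> float:
    """AP of one ranked list: sum of precision@i at every hit, divided by the positive count."""
    hits = labels[np.argsort(-scores, kind="stable")]
    if num_positives is None:
        num_positives = hits.sum()
    if num_positives == 0:
        return 0.0
    precision_at_i = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float((precision_at_i * hits).sum() / num_positives)


def mean_average_precision(scores, labels) -> float:
    """Mean of the per-tag APs (one column per tag), skipping tags with no positives."""
    aps = [average_precision(scores[:, c], labels[:, c])
           for c in range(labels.shape[1]) if labels[:, c].any()]
    return float(np.mean(aps))


def global_average_precision(scores, labels, top_k=20) -> float:
    """GAP: pool every sample's top_k (score, label) pairs and compute one AP over the pool."""
    pooled_scores, pooled_labels = [], []
    for sample_scores, sample_labels in zip(scores, labels):
        top = np.argsort(-sample_scores, kind="stable")[:top_k]
        pooled_scores.append(sample_scores[top])
        pooled_labels.append(sample_labels[top])
    # The denominator counts all ground-truth tags, including any missed by the top_k cut.
    return average_precision(np.concatenate(pooled_scores),
                             np.concatenate(pooled_labels),
                             num_positives=labels.sum())
```

For example, with `scores = [[0.9, 0.1, 0.5], [0.2, 0.8, 0.6]]` and `labels = [[1, 0, 0], [0, 0, 1]]`, Hit@1 and PERR are both 0.5, MAP is 1.0 and GAP is 5/6 (about 0.833): each tag is ranked perfectly on its own, but the pooled ranking places a wrong 0.8 prediction above the correct 0.6 one.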
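Item 3's correlation matrix amounts to counting, for every ground-truth tag a, which tags the model predicts on those samples and keeping the most frequent ones. A minimal sketch with `collections.Counter`; which predictions are fed in (for example each sample's top-k predicted tags) is an assumption, since the README does not specify it, and `tag_correlation` is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def tag_correlation(gt_tags: Iterable[Sequence[str]],
                    pred_tags: Iterable[Sequence[str]],
                    top_k: int = 5) -> Dict[str, List[Tuple[str, int]]]:
    """For each ground-truth tag a, the top_k most frequent predicted tags b and their counts."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for gts, preds in zip(gt_tags, pred_tags):
        for a in gts:
            # Every predicted tag of this sample counts once toward row a.
            counts[a].update(preds)
    return {a: counter.most_common(top_k) for a, counter in counts.items()}
```

A tag b that shows up often in row a, but is not a itself, is a candidate for merging with a in the tag dictionary.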
# 5. Testing workflow
python scripts/inference.py --model_pb checkpoints/ad_content_form/v1/export/step_7000_0.8217 \
                            --tag_id_file dataset/dict/tag-id-ad_content_b0.txt \
                            --test_dir dataset/looklike_interview \
                            --postfix mp4 \
                            --output ./pred_output.txt \
                            --top_k 5
> Parameters
```
--model_pb directory of the exported model (pb)
--tag_id_file tag dictionary file
--test_dir directory of input test files
--output file in which the predicted tags are saved
--top_k number of predicted tags to output
--postfix test file format: mp4 or jpg
```

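`--top_k` presumably keeps the highest-scoring tags for each test file. A hedged sketch of that selection step: `id_to_tag` stands in for the contents of `--tag_id_file`, whose format is not documented here, and `top_k_tags` is a hypothetical helper, not a function from scripts/inference.py.

```python
from typing import Dict, List, Tuple

import numpy as np


def top_k_tags(scores: np.ndarray, id_to_tag: Dict[int, str],
               top_k: int = 5) -> List[Tuple[str, float]]:
    """Return the top_k (tag name, score) pairs for one sample, highest score first."""
    # Stable sort so equal scores keep their tag-id order.
    order = np.argsort(-scores, kind="stable")[:top_k]
    return [(id_to_tag[int(i)], float(scores[i])) for i in order]
```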
# 6. Badcase analysis
## 6.1 Prediction visualization
python scripts/write_prediction.py --inference_file checkpoints/ad_content_form/v1/inference_result_fusion.txt --sample_num 200 --save_dir temp --tag_id_file dataset/dict/tag-id-ad_content_b0.txt --test_dir dataset/videos/ad_content --gt_file dataset/info_files/ad_content_datafile_b0.txt --postfix mp4

> Parameters
```
--inference_file file containing the predicted tags
--sample_num randomly sample sample_num videos to visualize
--gt_file ground-truth file for the samples (optional)
--save_dir directory in which visualization files are saved
--filter_tag_name only visualize samples that carry this tag (optional)
--tag_id_file tag dictionary file (optional; required when filter_tag_name is set)
--test_dir test file directory
--postfix test file format: mp4 or jpg
```
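The `--sample_num` and `--filter_tag_name` options describe a filter-then-sample step. A minimal sketch, assuming each prediction is a dict with a `tags` list (a hypothetical record layout, not the actual format of the inference file); `pick_samples` is likewise a hypothetical name.

```python
import random
from typing import Dict, List, Optional


def pick_samples(predictions: List[Dict], sample_num: int,
                 filter_tag_name: Optional[str] = None, seed: int = 0) -> List[Dict]:
    """Randomly pick up to sample_num records, optionally only those tagged filter_tag_name."""
    candidates = [p for p in predictions
                  if filter_tag_name is None or filter_tag_name in p["tags"]]
    rng = random.Random(seed)  # fixed seed so the same subset is drawn on every run
    return rng.sample(candidates, min(sample_num, len(candidates)))
```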
...ructuring/MultiModal-Tagging/configs/.ipynb_checkpoints/config.tagging.5k-checkpoint.yaml (154 changes: 154 additions & 0 deletions)
@@ -0,0 +1,154 @@
#############################################################
# 1. Model Define Configs
#############################################################
ModelConfig:
  model_type: 'NextVladBERT'
  use_modal_drop: True # during training, drop one of the modalities of the multimodal features
  with_embedding_bn: False # apply batch normalization to each modality's input features
  modal_drop_rate: 0.3
  with_video_head: True # video features
  with_audio_head: True # audio features
  with_text_head: True # text features
  with_image_head: True # False # image features

  # video features (16384)
  video_head_type: 'NeXtVLAD'
  video_head_params:
    nextvlad_cluster_size: 128
    groups: 16
    expansion: 2
    feature_size: 1024 #inception feature dim
    max_frames: 300

  # audio features (1024)
  audio_head_type: 'NeXtVLAD'
  audio_head_params:
    nextvlad_cluster_size: 64
    groups: 16
    expansion: 2
    feature_size: 128 #vggish feature dim
    max_frames: 300

  # text features (1024)
  text_head_type: 'BERT'
  text_head_params:
    bert_config:
      attention_probs_dropout_prob: 0.1
      hidden_act: "gelu"
      hidden_dropout_prob: 0.1
      hidden_size: 768
      initializer_range: 0.02
      intermediate_size: 3072
      max_position_embeddings: 512
      num_attention_heads: 12
      num_hidden_layers: 12
      type_vocab_size: 2
      vocab_size: 21128
    bert_emb_encode_size: 1024

  # image features (2048)
  image_head_type: 'resnet_v2_50'
  image_head_params: {}


  # multimodal feature fusion method
  fusion_head_type: 'SE'
  fusion_head_params:
    hidden1_size: 1024
    gating_reduction: 8 # reduction factor in se context gating
    drop_rate:
      video: 0.8
      audio: 0.5
      image: 0.5
      text: 0.5
    fusion: 0.8

  # tagging classifier parameters
  tagging_classifier_type: 'LogisticModel'
  tagging_classifier_params:
    num_classes: 82 # number of tags; change as needed

#############################################################
# 2. Optimizer & Train Configs
#############################################################
OptimizerConfig:
  optimizer: 'AdamOptimizer'
  optimizer_init_params: {}
  clip_gradient_norm: 1.0
  learning_rate_dict:
    video: 0.0001
    audio: 0.0001
    text: 0.00001
    image: 0.0001
    classifier: 0.01
  loss_type_dict:
    tagging: "CrossEntropyLoss"
  max_step_num: 10000
  export_model_steps: 1000
  learning_rate_decay: 0.1
  start_new_model: True # True: train from scratch; False: resume
  num_gpu: 1
  log_device_placement: False
  gpu_allow_growth: True
  pretrained_model:
    text_pretrained_model: 'pretrained/bert/chinese_L-12_H-768_A-12/bert_model.ckpt'
    image_pretrained_model: 'pretrained/resnet_v2_50/resnet_v2_50.ckpt'
  train_dir: './checkpoints/tagging5k_temp' # directory where trained models are saved; change as needed

#############################################################
# 3. DataSet Config
#############################################################
DatasetConfig:
  batch_size: 32
  shuffle: True
  train_data_source_list:
    train799:
      file: '../dataset/tagging/GroundTruth/datafile/train.txt' # file generated by the preprocessing script; change as needed (datafile)
      batch_size: 32

  valid_data_source_list:
    val799:
      file: '../dataset/tagging/GroundTruth/datafile/val.txt' # file generated by the preprocessing script; change as needed
      batch_size: 32

  preprocess_root: 'src/dataloader/preprocess/'
  preprocess_config:
    feature:
      - name: 'video,video_frames_num,idx'
        shape: [[300,1024], [],[]]
        dtype: 'float32,int32,string'
        class: 'frames_npy_preprocess.Preprocess'
        extra_args:
          max_frames: 300
          feat_dim: 1024
          return_frames_num: True
          return_idx: True

      - name: 'audio,audio_frames_num'
        shape: [[300,128], []]
        dtype: 'float32,int32'
        class: 'frames_npy_preprocess.Preprocess'
        extra_args:
          max_frames: 300
          feat_dim: 128
          return_frames_num: True

      - name: 'image'
        shape: [[224,224,3]]
        dtype: 'float32'
        class: 'image_preprocess.Preprocess'

      - name: 'text'
        shape: [[128]]
        dtype: 'int64'
        class: 'text_preprocess.Preprocess'
        extra_args:
          vocab: 'pretrained/bert/chinese_L-12_H-768_A-12/vocab.txt'
          max_len: 128
    label:
      - name: 'tagging'
        dtype: 'float32'
        shape: [[82]] # change according to num_classes
        class: 'label_preprocess.Preprocess_label_sparse_to_dense'
        extra_args:
          index_dict: '../dataset/label_id.txt' # change as needed
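The video (16384) and audio (1024) dimensions noted in the ModelConfig comments follow from the NeXtVLAD head parameters. Assuming the standard NeXtVLAD formulation, each input feature is expanded to `expansion * feature_size` dimensions, split into `groups` groups, and aggregated over `nextvlad_cluster_size` clusters, so the flattened output has `cluster_size * expansion * feature_size / groups` entries. The helper below is only an arithmetic check, not code from the repository.

```python
def nextvlad_output_dim(cluster_size: int, feature_size: int,
                        expansion: int, groups: int) -> int:
    """Size of the flattened NeXtVLAD descriptor: clusters x (expansion * feature_size / groups)."""
    expanded = expansion * feature_size
    if expanded % groups:
        raise ValueError("expansion * feature_size must be divisible by groups")
    return cluster_size * (expanded // groups)


# Values taken from video_head_params and audio_head_params above.
video_dim = nextvlad_output_dim(cluster_size=128, feature_size=1024, expansion=2, groups=16)
audio_dim = nextvlad_output_dim(cluster_size=64, feature_size=128, expansion=2, groups=16)
print(video_dim, audio_dim)  # 16384 1024
```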
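`use_modal_drop` together with `modal_drop_rate: 0.3` drops one modality during training. The config does not say exactly how, so the following NumPy sketch assumes the simplest version: per sample, with probability `drop_rate`, one randomly chosen modality's features are replaced by zeros. `modal_drop` is a hypothetical name, and the repository's implementation may differ (for example per batch instead of per sample).

```python
from typing import Dict

import numpy as np


def modal_drop(features: Dict[str, np.ndarray], drop_rate: float,
               rng: np.random.Generator) -> Dict[str, np.ndarray]:
    """With probability drop_rate, replace one randomly chosen modality with zeros."""
    out = {name: feat.copy() for name, feat in features.items()}  # never mutate the inputs
    if rng.random() < drop_rate:
        names = sorted(out)
        dropped = names[rng.integers(len(names))]
        out[dropped] = np.zeros_like(out[dropped])
    return out
```

Zeroing a whole modality pushes the fusion head not to depend on any single input, which is presumably the intent of the option.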
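The `frames_npy_preprocess.Preprocess` entries declare fixed shapes ([300, 1024] for video, [300, 128] for audio) together with `max_frames`, `feat_dim` and `return_frames_num`. A plausible reading is that variable-length frame features are padded or cut to `max_frames` rows and the number of real frames is returned alongside. The sketch below assumes features arrive as a (num_frames, feat_dim) array and keeps the first frames when truncating; the actual preprocessing class may sample frames differently, and `pad_or_truncate_frames` is a hypothetical name.

```python
from typing import Tuple

import numpy as np


def pad_or_truncate_frames(frames: np.ndarray,
                           max_frames: int = 300) -> Tuple[np.ndarray, int]:
    """Fit a (num_frames, feat_dim) array to exactly max_frames rows and return the valid count."""
    num_frames = min(len(frames), max_frames)
    out = np.zeros((max_frames, frames.shape[1]), dtype=np.float32)
    out[:num_frames] = frames[:num_frames]  # keep the first frames; later ones are dropped
    return out, num_frames
```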
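The label entry has shape [[82]] and uses `label_preprocess.Preprocess_label_sparse_to_dense` with an `index_dict`, i.e. a sample's list of tags becomes a dense multi-hot vector. A minimal sketch, where `tag_to_index` stands in for the contents of `label_id.txt` (its file format is not shown here) and `sparse_to_dense` is a hypothetical name:

```python
from typing import Dict, Sequence

import numpy as np


def sparse_to_dense(tag_names: Sequence[str], tag_to_index: Dict[str, int],
                    num_classes: int = 82) -> np.ndarray:
    """Turn one sample's list of tag names into a multi-hot float32 vector of length num_classes."""
    dense = np.zeros(num_classes, dtype=np.float32)
    for name in tag_names:
        dense[tag_to_index[name]] = 1.0
    return dense
```

As the shape comment notes, `num_classes` here has to match `num_classes` in ModelConfig.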