∗The first three authors contribute equally to this work
```bibtex
@inproceedings{KM-BART,
    title = "{KM}-{BART}: Knowledge Enhanced Multimodal {BART} for Visual Commonsense Generation",
    author = "Xing, Yiran and
      Shi, Zai and
      Meng, Zhao and
      Lakemeyer, Gerhard and
      Ma, Yunpu and
      Wattenhofer, Roger",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    year = "2021",
    publisher = "Association for Computational Linguistics",
    pages = "525--535"
}
```
- Clone the repository recursively:

  ```bash
  git clone --recursive https://github.com/FomalhautB/KM-BART-ACL.git
  ```
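  A small follow-up sketch, assuming the default clone directory name taken from the URL above; it also checks that the submodules used in the steps below were fetched by the recursive clone:

  ```bash
  # Enter the cloned repository (directory name assumed from the clone URL)
  cd KM-BART-ACL

  # Confirm that submodules such as bottom-up-attention.pytorch and
  # comet-commonsense were pulled in by --recursive
  git submodule status
  ```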
- Create the conda environment:

  ```bash
  conda env create -f environment.yaml
  ```
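  Once created, the environment has to be activated before running the commands below. The name `km-bart` is only an assumption; use the `name:` field of `environment.yaml` if it differs:

  ```bash
  # Activate the environment defined in environment.yaml
  # ("km-bart" is an assumed name -- check the file's "name:" field)
  conda activate km-bart
  ```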
The following steps are only required for feature extraction.
- Install `bottom-up-attention.pytorch`. Please refer to bottom-up-attention.pytorch for more details.

  ```bash
  cd bottom-up-attention.pytorch

  # install detectron2
  cd detectron2
  pip install -e .
  cd ..

  # install the rest of the modules
  python setup.py build develop
  cd ..
  ```
- Install `comet-commonsense`. Please refer to comet-commonsense for more details.

  ```bash
  cd comet-commonsense

  # download data
  bash scripts/setup/get_atomic_data.sh
  bash scripts/setup/get_model_files.sh

  # install dependencies
  pip install tensorflow
  pip install ftfy==5.1
  conda install -c conda-forge spacy
  python -m spacy download en
  pip install tensorboardX
  pip install tqdm
  pip install pandas
  pip install ipython
  ```
- Download the images from here and decompress the images into `$VCR_DATASET`
- Download the annotations from here and decompress the annotations into `$VCG_ANNOTATION`
- Extract features and save the features in `$VCG_DATA`:

  ```bash
  python -m scripts.prepare_vcg \
      --data_dir $VCR_DATASET \
      --output_dir $VCG_DATA \
      --annot_dir $VCG_ANNOTATION \
      --gpu_num 4
  ```
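`$VCR_DATASET`, `$VCG_ANNOTATION`, and `$VCG_DATA` above (like the similar variables in the sections below) are placeholders for local paths, not variables exported by the repository. A minimal sketch with purely illustrative locations:

```bash
# Illustrative paths only -- point these at your own download and output directories
export VCR_DATASET=/data/vcr/vcr1images
export VCG_ANNOTATION=/data/vcg/annotations
export VCG_DATA=/data/vcg/features
```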
- Download the train images from here and decompress the images into `$COCO_TRAIN`
- Download the validation images from here and decompress the images into `$COCO_VAL`
- Download the annotations from here and decompress the annotations into `$COCO_ANNOTATION`
- Extract features and save the features in `$COCO_DATA`:

  ```bash
  python -m scripts.prepare_coco \
      --train_dir $COCO_TRAIN \
      --val_dir $COCO_VAL \
      --annot_dir $COCO_ANNOTATION \
      --output_dir $COCO_DATA \
      --gpu_num 4
  ```
- Download the JSON files for image URLs and captions from here and decompress the two files into `$SBU_ANNOTATION`
- Extract the features, bounding boxes, and labels, build the image annotations, and save them into `$OUTPUT_DATA` (this will first download the images and save them in `$SBU_DATA`):

  ```bash
  python -m scripts.prepare_sbu \
      --download \
      --data_dir $SBU_DATA \
      --output_dir $OUTPUT_DATA \
      --annot_dir $SBU_ANNOTATION \
      --gpu_num 4 \
      --n_jobs 8
  ```
- Download the objects, relationships, region descriptions, attributes, and image metadata from here and decompress them into `$VG_ANNOTATION`
- Download the images from the same link above and decompress them into `$VG_IMAGES`
- Extract features and save the features in `$VG_DATA`:

  ```bash
  python -m scripts.prepare_vg \
      --annot_dir $VG_ANNOTATION \
      --output_dir $VG_DATA \
      --data_dir $VG_IMAGES \
      --gpu_num 4
  ```
- Download the pretrained COMET weights `atomic_pretrained_model.pickle` from comet-commonsense and save them to `$LOAD_PATH`.
- Follow the instructions in comet-commonsense to build the COMET dataloader.
- Download the JSON files for image URLs and captions from here and decompress the two files into `$SBU_ANNOTATION`.
- Download the SBU dataset and save the images in `$SBU_DATA`, then decompress the features, bounding boxes, and labels of the images and save them into `$SBU_DATA`.
- Generate inferences and save them in `$REASON_DATA`:

  ```bash
  python -m scripts.prepare_sbu_reason \
      --output_dir $REASON_DATA \
      --annot_dir $SBU_ANNOTATION \
      --model_file $LOAD_PATH/COMET \
      --gpu_num 2 \
      --sampling_algorithm topk-3

  # rename the output file
  mv $REASON_DATA/train.json $SBU_DATA/reason_train.json
  ```
- Filter the newly generated inferences with a KM-BART pretrained on VCG (also in `$LOAD_PATH`) and save the final results in `$OUTPUT_DATA`:

  ```bash
  python -m scripts.filter_reason \
      --data_dir $SBU_DATA \
      --output_dir $OUTPUT_DATA \
      --checkpoint $LOAD_PATH/KM-BART
  ```
- Example of pretraining on COCO + SBU with 1 GPU and 4 CPUs from scratch (no pretrained weights):

  ```bash
  python pretrain \
      --dataset coco_train $COCO_DATA \
      --dataset coco_val $COCO_DATA \
      --dataset sbu_train $SBU_DATA \
      --checkpoint_dir $CHECKPOINT_DIR \
      --gpu_num 1 \
      --batch_size 32 \
      --master_port 12345 \
      --log_dir $LOG_DIR \
      --amp \
      --num_workers 4 \
      --model_config config/pretrain_base.json
  ```
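If the pretraining script writes TensorBoard event files to `--log_dir` (an assumption here, not something stated in this section), the run can be monitored while it trains:

```bash
# Assumes $LOG_DIR contains TensorBoard event files written during pretraining;
# if the logs turn out to be plain text, inspect them with tail instead.
tensorboard --logdir "$LOG_DIR"
```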
- Example of loading pretrained weights from `facebook/bart-base` and training on COCO:

  ```bash
  python pretrain \
      --dataset coco_train $COCO_DATA \
      --checkpoint_dir $CHECKPOINT_DIR \
      --model_config config/pretrain_base.json \
      --checkpoint facebook/bart-base
  ```
- Example of loading pretrained weights from a previous checkpoint and continuing training on COCO:

  ```bash
  python pretrain \
      --dataset coco_train $COCO_DATA \
      --checkpoint_dir $CHECKPOINT_DIR \
      --model_config config/pretrain_base.json \
      --checkpoint $CHECKPOINT \
      --continue_training
  ```
- Example of loading weights from a pretrained checkpoint and fine-tuning on VCG. Validation of loss and score will be done at the end of each epoch:

  ```bash
  python vcg_train \
      --data_dir $VCG_DATA \
      --checkpoint_dir $CHECKPOINT_DIR \
      --validate_loss \
      --validate_score \
      --model_config config/vcg_base.json \
      --checkpoint $CHECKPOINT
  ```
- Example of generating sentences for VCG:

  ```bash
  python vcg_generate \
      --data_dir $VCG_DATA \
      --checkpoint $CHECKPOINT \
      --output_file $GENERATED_FILE
  ```
- Example of evaluating the generated file on the VCG validation set:

  ```bash
  python vcg_eval \
      --generation $GENERATED_FILE \
      --reference $VCG_DATA/val_ref.json
  ```
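Before running the full evaluation, it can help to eyeball the generated file. A minimal sketch, assuming `$GENERATED_FILE` is a single regular JSON document (the reference file above is plain `.json`); if the generation script emits JSON Lines instead, inspect individual lines rather than the whole file:

```bash
# Pretty-print the generated inferences and show only the first 40 lines.
# Assumption: $GENERATED_FILE holds one JSON document.
python -m json.tool "$GENERATED_FILE" | head -n 40
```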