This repository provides an implementation of the unreferenced image captioning metric presented in our ACL 2021 paper UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning.
There are 3 steps for running the code:
- Download the pretrained UMIC checkpoint (about 220MB).
- Download the pre-computed visual features (img_db) for the dataset you want to score.
- Run the preprocessing code on your candidate captions to create the textual features (txt_db).
Then you can compute the scores for your image-caption pairs using compute_score.py.
Create a Python 3.6 environment, activate it, and install the requirements from requirements.txt:
conda create --name umic python=3.6
conda activate umic
pip install -r requirements.txt
Download umic.tar.gz and extract it (the default checkpoint directory in the code is ./ckpt).
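If you prefer to script the extraction, the following is a minimal sketch using Python's standard tarfile module. It assumes the download is named umic.tar.gz as above. The README does not document the archive's internal layout, so the helper (extract_checkpoint, a hypothetical name) returns the member names, and you should confirm the checkpoint lands where compute_score.py expects it (ckpt/umic.pt by default).

```python
import tarfile
from pathlib import Path


def extract_checkpoint(archive="umic.tar.gz", out_dir="."):
    """Extract the downloaded UMIC archive into out_dir.

    Returns the archive's member names so you can check where the
    checkpoint file ended up (the layout inside the archive is an
    assumption to verify, not something documented here).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(str(out))
    return names
```

For example, calling extract_checkpoint() from the directory containing the download extracts into the current directory; `tar -xzf umic.tar.gz` does the same from the shell.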
Please refer to the official UNITER repo for computing visual features from raw images for other datasets.
We provide the processed versions of the four datasets used in the paper in the txt_db directory.
To process new captions, please format the data as follows.
The input for the textual features is a JSON file containing a list of dictionaries, each with the following keys:
- 'caption' : [candidate caption]
- 'imgid' : [image id for the caption in its dataset]
Please refer to sample.json for an example of the format.
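A minimal sketch of producing such a file from Python, assuming only the two keys described above. The function names, the example ids, and the non-empty-caption check are illustrative additions, not part of the repository; compare your output against sample.json before preprocessing.

```python
import json


def write_caption_file(pairs, path):
    """Write (imgid, caption) pairs as a JSON list of {'caption', 'imgid'} dicts."""
    records = []
    for img_id, caption in pairs:
        caption = str(caption).strip()
        # Sanity check added here; not a documented requirement of make_txt_db.py.
        if not caption:
            raise ValueError(f"empty caption for image {img_id!r}")
        records.append({"caption": caption, "imgid": img_id})
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
    return records


def load_and_check(path):
    """Load an input file and confirm it is a list of dicts with both keys."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("top level of the file must be a list")
    for i, entry in enumerate(data):
        missing = {"caption", "imgid"} - set(entry)
        if missing:
            raise ValueError(f"entry {i} is missing {sorted(missing)}")
    return data
```

Usage: `write_caption_file([("img_1", "a dog runs on the grass")], "captions.json")`, where "img_1" stands in for a real image id from your dataset.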
Note that we assume each image file is named dataset_name_image_id.jpg, following the COCO dataset.
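The naming convention can be illustrated with a small helper. COCO's own files follow this pattern with a 12-digit zero-padded id (e.g. COCO_val2014_000000391895.jpg). The helper name and its pad parameter are hypothetical, and the README does not say whether 'imgid' should hold the bare id or the padded form, so check sample.json.

```python
from typing import Optional, Union


def image_file_name(dataset_name: str, image_id: Union[int, str],
                    pad: Optional[int] = None) -> str:
    """Build '<dataset_name>_<image_id>.jpg'.

    pad: optional zero-padding width for numeric ids (COCO uses 12 digits).
    """
    img = str(image_id)
    if pad is not None:
        img = img.zfill(pad)
    return f"{dataset_name}_{img}.jpg"
```

For example, `image_file_name("COCO_val2014", 391895, pad=12)` gives "COCO_val2014_000000391895.jpg".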
Using a .json file containing a list of these dictionaries, preprocess it with the following command:
python make_txt_db.py --input_file $INPUT_JSON_FILE \
--img_type $IMG_DATASET_NAME \
--out_dir $PATH_TO_OUTPUT_DIR
Here $IMG_DATASET_NAME is the name of the image dataset, e.g. 'coco_val2014' for capeval1k.
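Then compute the scores with:
python compute_score.py --img_db $IMG_DB_DIR \
--txt_db $TXT_DB_DIR \
--out_file $OUT_FILE_NAME \
--ckpt $CKPT_DIR
where $OUT_FILE_NAME is the output file in .json format and $CKPT_DIR is the checkpoint path (default: ckpt/umic.pt).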
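The preprocessing and scoring commands can be chained from Python. This is a sketch that only uses the flags shown in the two commands above; it assumes (as the steps imply but do not state outright) that the --out_dir of make_txt_db.py can be passed directly as --txt_db to compute_score.py. The function names and example paths are hypothetical. By default it only prints the commands; pass dry_run=False to run them.

```python
import shlex
import subprocess
import sys
from typing import List


def make_txt_db_cmd(input_file: str, img_type: str, out_dir: str) -> List[str]:
    return [sys.executable, "make_txt_db.py",
            "--input_file", input_file,
            "--img_type", img_type,
            "--out_dir", out_dir]


def compute_score_cmd(img_db: str, txt_db: str, out_file: str,
                      ckpt: str = "ckpt/umic.pt") -> List[str]:
    return [sys.executable, "compute_score.py",
            "--img_db", img_db,
            "--txt_db", txt_db,
            "--out_file", out_file,
            "--ckpt", ckpt]


def run_pipeline(input_file, img_type, img_db, txt_db_dir, out_file,
                 ckpt="ckpt/umic.pt", dry_run=True):
    # Assumption: the txt_db written by step 1 is the --txt_db input of step 2.
    steps = [make_txt_db_cmd(input_file, img_type, txt_db_dir),
             compute_score_cmd(img_db, txt_db_dir, out_file, ckpt)]
    for cmd in steps:
        print(" ".join(shlex.quote(c) for c in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failing step
    return steps
```

Building argument lists instead of shell strings avoids quoting problems with paths that contain spaces.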
If you find this repo useful, please consider citing our ACL 2021 paper:
@inproceedings{lee-etal-2021-umic,
title = "{UMIC}: An Unreferenced Metric for Image Captioning via Contrastive Learning",
author = "Lee, Hwanhee and
Yoon, Seunghyun and
Dernoncourt, Franck and
Bui, Trung and
Jung, Kyomin",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-short.29",
doi = "10.18653/v1/2021.acl-short.29",
pages = "220--226",
}