Video ChatCaptioner: Towards the Enriched Spatiotemporal Descriptions

Official repository of Video ChatCaptioner.

See our paper Video ChatCaptioner: Towards the Enriched Spatiotemporal Descriptions

System Architecture

Installation

Note that you need a GPU with 24 GB of memory to run ChatCaptioner because of the size of BLIP-2.
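If you are unsure whether your machine meets this requirement, the following is a minimal sketch of a pre-flight check (not part of this repository). It queries `nvidia-smi` for each GPU's total memory in MiB and succeeds if any card reaches a threshold. The helper name `has_enough_gpu_memory` and the default `REQUIRED_MIB=22000` are assumptions: the margin below 24 GB is there because `nvidia-smi` can report slightly less than a card's nominal capacity, so adjust it for your hardware.

```shell
#!/bin/sh
# Hypothetical pre-flight check: is there a GPU with roughly 24 GB of memory?
# REQUIRED_MIB is an assumed threshold (MiB); override it via the environment.
REQUIRED_MIB="${REQUIRED_MIB:-22000}"

# Reads one total-memory value (MiB) per line on stdin and returns 0
# if any value is at least REQUIRED_MIB, 1 otherwise.
has_enough_gpu_memory() {
  while read -r mib; do
    # Skip empty or non-numeric lines.
    case "$mib" in
      ''|*[!0-9]*) continue ;;
    esac
    if [ "$mib" -ge "$REQUIRED_MIB" ]; then
      return 0
    fi
  done
  return 1
}

if command -v nvidia-smi >/dev/null 2>&1; then
  if nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | has_enough_gpu_memory; then
    echo "GPU memory check passed"
  else
    echo "No GPU with at least ${REQUIRED_MIB} MiB found" >&2
  fi
else
  echo "nvidia-smi not found; cannot check GPU memory" >&2
fi
```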

To start, clone this repository with git.

To install and activate the environment, run the following commands:

conda env create -f environment.yml
conda activate chatcap

Set the environment variable OPENAI_API_KEY to your OpenAI API key.

export OPENAI_API_KEY=Your_OpenAI_Key

You can add it to .bashrc so you don't need to set it manually every time.
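Appending the export line by hand each time you rotate a key leaves duplicate entries in .bashrc. The following is a minimal sketch, assuming a hypothetical helper `persist_openai_key` (not part of this repository), that rewrites the startup file so it contains exactly one `export OPENAI_API_KEY=` line while keeping everything else intact. Keep in mind that the key is stored in plain text.

```shell
#!/bin/sh
# Hypothetical helper: store OPENAI_API_KEY in a shell startup file
# (default ~/.bashrc), replacing any earlier export instead of duplicating it.
# Usage: persist_openai_key KEY [RC_FILE]
persist_openai_key() {
  key="$1"
  rc_file="${2:-$HOME/.bashrc}"
  if [ -z "$key" ]; then
    echo "usage: persist_openai_key KEY [RC_FILE]" >&2
    return 1
  fi
  touch "$rc_file"
  # Copy every line except previous OPENAI_API_KEY exports.
  # grep -v exits 1 when it selects no lines (e.g. empty file), so ignore that.
  grep -v '^export OPENAI_API_KEY=' "$rc_file" > "$rc_file.tmp" || true
  # Append the single, current export line.
  printf "export OPENAI_API_KEY='%s'\n" "$key" >> "$rc_file.tmp"
  mv "$rc_file.tmp" "$rc_file"
}
```

For example, `persist_openai_key "Your_OpenAI_Key"` followed by `. ~/.bashrc` makes the key available in the current shell as well.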

Because many of the scripts here are Jupyter notebooks, remember to add the environment to Jupyter's kernel list. To do so, run

python -m ipykernel install --user --name=chatcap

Download our dataset samples from here and extract the zip file to the repository's root folder.

To try Video ChatCaptioner on a few sample videos from MSVD, run

sh run_msvd.sh

To try Video ChatCaptioner on a few sample videos from WebVid, run

sh run_webvid.sh
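Both demo scripts rely on the OpenAI API, so a missing key or a wrong working directory is a common reason for a failed run. The following is a minimal sketch, assuming a hypothetical wrapper `run_demo` (not part of this repository), that checks OPENAI_API_KEY is set and the script exists before launching it with `sh`. It also prints a warning, without stopping, if the active conda environment (reported by conda in `CONDA_DEFAULT_ENV`) is not `chatcap`.

```shell
#!/bin/sh
# Hypothetical wrapper around the demo scripts named in this README
# (run_msvd.sh, run_webvid.sh). Returns 1 if a precondition fails.
run_demo() {
  script="$1"
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set; export it first" >&2
    return 1
  fi
  if [ ! -f "$script" ]; then
    echo "$script not found; run this from the repository root" >&2
    return 1
  fi
  # Non-fatal reminder: the README's environment is named chatcap.
  if [ "${CONDA_DEFAULT_ENV:-}" != "chatcap" ]; then
    echo "warning: conda environment 'chatcap' does not appear to be active" >&2
  fi
  sh "$script"
}

# Example usage from the repository root:
#   run_demo run_msvd.sh
#   run_demo run_webvid.sh
```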

Acknowledgement

ChatGPT
BLIP2

Please cite Video ChatCaptioner using the following BibTeX:

@article{chen2023video,
      title={Video ChatCaptioner: Towards the Enriched Spatiotemporal Descriptions}, 
      author={Jun Chen and Deyao Zhu and Kilichbek Haydarov and Xiang Li and Mohamed Elhoseiny},
      journal={arXiv preprint arXiv:2304.04227},
      year={2023}
}
