TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting (CVPR 2022 Oral)

License: 996.icu

Here is the official implementation for CVPR 2022 paper "TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting"

🌱 News

Introduction

Counting repetitive actions is widely relevant to human activities such as physical exercise. Existing methods focus on repetitive action counting in short videos and struggle with the longer videos found in more realistic scenarios. In the data-driven era, this degradation of generalization capability is mainly attributed to the lack of long-video datasets. To fill this gap, we introduce a new large-scale repetitive action counting dataset covering a wide variety of video lengths, along with more realistic situations where action interruptions or action inconsistencies occur in the video. In addition, we provide fine-grained annotations of the action cycles rather than a single numerical count per video. The dataset contains 1,451 videos with about 20,000 annotations, which makes it more challenging. For repetitive action counting in more realistic scenarios, we further propose encoding multi-scale temporal correlation with transformers, which accounts for both performance and efficiency. Furthermore, with the help of the fine-grained action-cycle annotations, we propose a density map regression-based method to predict the action period, which yields better performance with sufficient interpretability. Our proposed method outperforms state-of-the-art methods on all datasets and also achieves better performance on the unseen dataset without fine-tuning.
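The density-map formulation above can be illustrated with a minimal sketch (hypothetical names and shapes, not the actual model in this repository): the network regresses a per-frame density value, and the repetition count is recovered by summing the density map over time.

```python
import numpy as np

def count_from_density_map(density: np.ndarray) -> float:
    """Repetition count = sum of the predicted per-frame density values.
    Each action cycle contributes density mass summing to ~1."""
    return float(density.sum())

# Toy example: a 9-frame clip containing two action cycles,
# each cycle spreading unit mass over its frames.
density = np.array([0.0, 0.25, 0.5, 0.25, 0.0, 0.25, 0.5, 0.25, 0.0])
print(count_from_density_map(density))  # → 2.0
```

Because the count is read off as an integral of the density map, the per-frame densities also localize where in the video each cycle occurs, which is the source of the interpretability mentioned above.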

RepCount Dataset

The Homepage of RepCount Dataset is available now.

Dataset introduction

We introduce a novel repetitive action counting dataset called RepCount, which contains videos with significant variations in length and allows for multiple kinds of anomaly cases. The videos come with fine-grained annotations that indicate the beginning and end of each action period. Furthermore, the dataset consists of two subsets, Part-A and Part-B. The videos in Part-A are fetched from YouTube, while those in Part-B record simulated physical examinations performed by junior school students and teachers.

Video Presentation

Bilibili
YouTube

Usage

Install

Please refer to install.md for installation.

Data preparation

Firstly, download the pretrained Video Swin Transformer model (github) into the folder 'pretrained'.

Secondly, modify train.py to match your configuration.

Tips: The input data can be .mp4 or .npz. We recommend using .npz data because it is faster to load. We will upload the preprocessed data (.npz) soon. You can also refer to video2npz to convert the videos yourself.

We will also upload the trained TransRAC model soon, which may help you reproduce the results in our paper.

Train

```shell
python train.py
```


If you have any questions, don't hesitate to contact us!

But please understand that the response may be delayed, as we are working on other research. 😖


Citation

If you find this project or the dataset useful, please consider citing the paper.

```
@inproceedings{hu2022transrac,
  title={TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting},
  author={Hu, Huazhang and Dong, Sixun and Zhao, Yiqun and Lian, Dongze and Li, Zhengxin and Gao, Shenghua},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={19013--19022},
  year={2022}
}
```

or

```
@article{hu2022transrac,
  title={TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting},
  author={Hu, Huazhang and Dong, Sixun and Zhao, Yiqun and Lian, Dongze and Li, Zhengxin and Gao, Shenghua},
  journal={arXiv preprint arXiv:2204.01018},
  year={2022}
}
```