GitHub - Young-eng/ICME2024_data: ICME2024 data and files

Track #6, Animal Action Recognition

Dataset

The dataset comes from Animal Kingdom (https://sutdcv.github.io/Animal-Kingdom/Animal_Kingdom/action_recognition/README_action_recognition.html).
The action classification labels in this dataset are multi-label, but here I keep only one label per clip so that the data suits a single-label action classification algorithm, following the Kinetics-400 dataset format. I will provide the train.csv and val.csv files that I used for training and validation. I used only 10,000 video clips, approximately one-third of the clips in the dataset: 8,000 clips from the training set and 2,000 clips from the test set. I did not use any images for training or validation. I will also provide the Python script used to generate these two CSV files.
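The label reduction and subsampling described above can be sketched as follows. This is a minimal illustration, not the contents of getCSV.py: the function names, the choice of keeping the first listed label, the fixed random seed, the `videos` directory, the `.mp4` extension and the space-separated `path label` row layout are all assumptions made for the example.

```python
import random


def to_single_label(annotations, num_clips, seed=0):
    """Sample clips and keep one label per clip.

    annotations: list of (video_id, [label ids]) pairs (multi-label).
    Returns a list of (video_id, label) pairs (single-label).
    Assumption: the first listed label is the one that is kept.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    usable = [(vid, labels) for vid, labels in annotations if labels]
    sampled = rng.sample(usable, min(num_clips, len(usable)))
    return [(vid, labels[0]) for vid, labels in sampled]


def format_rows(rows, video_dir="videos", ext=".mp4"):
    """Render rows as 'path label' lines (hypothetical layout)."""
    return [f"{video_dir}/{vid}{ext} {label}" for vid, label in rows]


if __name__ == "__main__":
    demo = [("clip_a", [3, 7]), ("clip_b", [1]), ("clip_c", [2, 5])]
    for line in format_rows(to_single_label(demo, num_clips=2)):
        print(line)
```

Keeping a single label discards the other actions in a clip, which is why the accuracy numbers reported here are not directly comparable with the multi-label metrics of the original benchmark.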

Algorithm

Here I use VideoMAE, proposed by a team from Nanjing University, Tencent AI Lab and Shanghai AI Lab, which uses Kinetics-400 as a training dataset. Their code is available here (https://github.com/MCG-NJU/VideoMAE), and their paper is: Tong, Zhan et al. “VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training.” ArXiv abs/2203.12602 (2022): n. pag. You can follow the official guide in the code repository to install the packages and run the code.
I changed some parameters for training: the learning rate is 5e-4 and the batch size is 4, for 100 epochs. The GPU I used is an RTX 3090. The pre-trained model is vit-base-224; you can find the pre-trained model and script in their code repository.
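To give a sense of the training budget these settings imply, the short calculation below counts optimizer steps. It is a rough sketch that assumes a single GPU, no gradient accumulation, and the 8,000 training clips mentioned in the Dataset section; `training_steps` is a hypothetical helper, not part of VideoMAE.

```python
import math


def training_steps(num_clips, batch_size, epochs, num_gpus=1):
    """Return (steps per epoch, total steps) for a simple training loop."""
    steps_per_epoch = math.ceil(num_clips / (batch_size * num_gpus))
    return steps_per_epoch, steps_per_epoch * epochs


if __name__ == "__main__":
    per_epoch, total = training_steps(num_clips=8000, batch_size=4, epochs=100)
    print(per_epoch, total)  # 2000 200000
```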
When I learned about this MMVRAC Challenge, there were only five days left, so I did not have enough time to improve the model itself. All I did was process the data and fine-tune the model.
The metrics here are top-1 and top-5 accuracy. Because I used one label for each clip while the original clips are multi-label, top-5 accuracy may be more credible. The top-1 accuracy is more than 45%, and the best top-5 accuracy is more than 80%.
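For reference, top-k accuracy can be computed as in the sketch below. It uses plain Python lists and is not the evaluation code from the VideoMAE repository. The second function, `topk_any_hit`, is a hypothetical multi-label variant that was not used for the numbers above; it only illustrates why a top-5 score is more forgiving when a clip really shows several actions.

```python
def topk_indices(row, k):
    """Indices of the k highest scores in one row of class scores."""
    return sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]


def topk_accuracy(scores, targets, k):
    """Fraction of clips whose single target label is among the top-k."""
    hits = sum(t in topk_indices(row, k) for row, t in zip(scores, targets))
    return hits / len(targets)


def topk_any_hit(scores, label_sets, k):
    """Fraction of clips where any ground-truth label is among the top-k."""
    hits = sum(
        bool(set(topk_indices(row, k)) & set(labels))
        for row, labels in zip(scores, label_sets)
    )
    return hits / len(label_sets)


if __name__ == "__main__":
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    print(topk_accuracy(scores, [1, 1, 0], k=1))
    print(topk_accuracy(scores, [1, 1, 0], k=2))
```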
There is much to do for future improvement, for example modifying the model structure to suit the Animal Kingdom dataset or other downstream tasks if needed. Also, the metrics used here differ slightly from the metrics used for the Animal Kingdom dataset.
I will provide all scripts, files and results. My work for this challenge is not perfect, but I think it is a good start for the future, because I did not use all of the image and video data in this dataset.
The fine-tuned model link is here: checkpoint_best
After the final epoch of fine-tuning finishes, there seems to be a bug when merging the final results if there are no other .txt files in the output directory.

Citation of authors' Paper:

@inproceedings{tong2022videomae, title={Video{MAE}: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training}, author={Zhan Tong and Yibing Song and Jue Wang and Limin Wang}, booktitle={Advances in Neural Information Processing Systems}, year={2022} }

@article{videomae, title={VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training}, author={Tong, Zhan and Song, Yibing and Wang, Jue and Wang, Limin}, journal={arXiv preprint arXiv:2203.12602}, year={2022} }

Repository files:
results
LICENSE
getCSV.py
readme.md
test_v5.csv
train_mmaction.sh
train_v5.csv
