Integrate Audiovisual SlowFast Networks into the repo #219

Open
wants to merge 13 commits into
base: main
134 changes: 134 additions & 0 deletions .gitignore
@@ -0,0 +1,134 @@
# LaTeX
main.pdf
supp.pdf
**/*.aux
**/*.log
**/*.synctex.gz
**/*.aux
**/*.bbl
**/*.blg
**/*.brf
**/*.sublime-project
**/*.sublime-workspace
**/*.fdb_latexmk
**/*.fls
**/*.toc

tools/debug.sh

# MacOS stuff
.DS_Store
**/.DS_Store

**/__pycache__
**/*.pyc
**/.settings
.project
.pydevproject

# external/*

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
1 change: 1 addition & 0 deletions INSTALL.md
@@ -16,6 +16,7 @@
- psutil: `pip install psutil`
- OpenCV: `pip install opencv-python`
- torchvision: `pip install torchvision` or `conda install torchvision -c pytorch`
- librosa: `pip install librosa` (if using Audiovisual SlowFast Networks)
- tensorboard: `pip install tensorboard`
- moviepy: (optional, for visualizing video on tensorboard) `conda install -c conda-forge moviepy` or `pip install moviepy`
- [Detectron2](https://github.com/facebookresearch/detectron2):
108 changes: 108 additions & 0 deletions configs/Kinetics/AVSLOWFAST_4x16_R50.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,108 @@
TRAIN:
  ENABLE: True
  DATASET: kinetics
  BATCH_SIZE: 64
  EVAL_PERIOD: 10
  CHECKPOINT_PERIOD: 1
  AUTO_RESUME: True
  # CHECKPOINT_FILE_PATH: ../../data/output/checkpoints/avslowfast.pth
  # CHECKPOINT_TYPE: pytorch # caffe2 or pytorch
DATA:
  USE_BGR_ORDER: False # False
  NUM_FRAMES: 32
  SAMPLING_RATE: 2
  TRAIN_JITTER_SCALES: [256, 320]
  TRAIN_CROP_SIZE: 224
  TEST_CROP_SIZE: 256
  INPUT_CHANNEL_NUM: [3, 3, 1]
  USE_AUDIO: True
  GET_MISALIGNED_AUDIO: True
  AUDIO_SAMPLE_RATE: 16000
  AUDIO_WIN_SZ: 32
  AUDIO_STEP_SZ: 16
  AUDIO_FRAME_NUM: 128
  AUDIO_MEL_NUM: 80
  AUDIO_MISALIGNED_GAP: 32 # half second
  LOGMEL_MEAN: -7.03 # -7.03, -24.227
  LOGMEL_STD: 4.66 # 4.66, 1.0
  EASY_NEG_RATIO: 0.75
  MIX_NEG_EPOCH: 96
SLOWFAST:
  ALPHA: 8
  BETA_INV: 8
  FUSION_CONV_CHANNEL_RATIO: 2
  FUSION_KERNEL_SZ: 5
  AU_ALPHA: 32
  AU_BETA_INV: 2
  AU_FUSION_CONV_CHANNEL_MODE: ByDim # ByDim, ByRatio
  AU_FUSION_CONV_CHANNEL_RATIO: 0.25
  AU_FUSION_CONV_CHANNEL_DIM: 64
  AU_FUSION_KERNEL_SZ: 5
  AU_FUSION_CONV_NUM: 2
  AU_REDUCE_TF_DIM: True
  FS_FUSION: [False, False, True, True]
  AFS_FUSION: [False, False, True, True]
  AVS_FLAG: [False, False, True, True, True]
  AVS_PROJ_DIM: 64
  AVS_VAR_THRESH: 0.01
  AVS_DUPLICATE_THRESH: 0.99999
  DROPPATHWAY_RATE: 0.8 # 0.8
RESNET:
  ZERO_INIT_FINAL_BN: True
  WIDTH_PER_GROUP: 64
  NUM_GROUPS: 1
  DEPTH: 50
  TRANS_FUNC: bottleneck_transform
  AUDIO_TRANS_FUNC: tf_bottleneck_transform_v1
  AUDIO_TRANS_NUM: 2
  STRIDE_1X1: False
  # 18: [[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]]
  # 34: [[3, 3, 3], [4, 4, 4], [6, 6, 6], [3, 3, 3]]
  # 50: [[3, 3, 3], [4, 4, 4], [6, 6, 6], [3, 3, 3]]
  # 101: [[3, 3, 3], [4, 4, 4], [23, 23, 23], [3, 3, 3]]
  # 152: [[3, 3, 3], [8, 8, 8], [36, 36, 36], [3, 3, 3]]
  NUM_BLOCK_TEMP_KERNEL: [[3, 3, 3], [4, 4, 4], [6, 6, 6], [3, 3, 3]]
  SPATIAL_DILATIONS: [[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1]]
NONLOCAL:
  LOCATION: [[[], [], []], [[], [], []], [[], [], []], [[], [], []]]
  GROUP: [[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1]]
  POOL: [
    [[1, 2, 2], [1, 2, 2], [1, 2, 2]],
    [[1, 2, 2], [1, 2, 2], [1, 2, 2]],
    [[1, 2, 2], [1, 2, 2], [1, 2, 2]],
    [[1, 2, 2], [1, 2, 2], [1, 2, 2]],
  ]
  INSTANTIATION: dot_product
BN:
  USE_PRECISE_STATS: True
  NUM_BATCHES_PRECISE: 200
  MOMENTUM: 0.1
  WEIGHT_DECAY: 0.0
SOLVER:
  BASE_LR: 0.1 # 0.1
  LR_POLICY: cosine
  MAX_EPOCH: 196
  MOMENTUM: 0.9
  WEIGHT_DECAY: 1e-4
  WARMUP_EPOCHS: 34.0 # 34.0
  WARMUP_START_LR: 0.01 # 0.01
  OPTIMIZING_METHOD: sgd
MODEL:
  NUM_CLASSES: 400
  MODEL_NAME: AVSlowFast
  ARCH: avslowfast
  LOSS_FUNC: cross_entropy
  DROPOUT_RATE: 0.5
TEST:
  ENABLE: True
  DATASET: kinetics
  BATCH_SIZE: 64
  # CHECKPOINT_FILE_PATH: ../../data/output/checkpoints/avslowfast.pth
  # CHECKPOINT_TYPE: pytorch # caffe2 or pytorch
DATA_LOADER:
  NUM_WORKERS: 8 # 8
  PIN_MEMORY: True
NUM_GPUS: 8
NUM_SHARDS: 1
RNG_SEED: 0
OUTPUT_DIR: ./output/AVSlowFast-R50-4x16
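
The audio settings in this config can be sanity-checked with a little arithmetic. The sketch below assumes, based on the `# half second` comment next to `AUDIO_MISALIGNED_GAP`, that `AUDIO_WIN_SZ` and `AUDIO_STEP_SZ` are given in milliseconds and that `AUDIO_FRAME_NUM` and `AUDIO_MISALIGNED_GAP` count spectrogram frames. The derived variable names are illustrative, and this is not the repository's data-loading code.

```python
# Values copied from the DATA section of AVSLOWFAST_4x16_R50.yaml.
AUDIO_SAMPLE_RATE = 16000   # samples per second
AUDIO_WIN_SZ = 32           # assumed unit: milliseconds per analysis window
AUDIO_STEP_SZ = 16          # assumed unit: milliseconds between windows
AUDIO_FRAME_NUM = 128       # assumed unit: spectrogram frames per clip
AUDIO_MISALIGNED_GAP = 32   # assumed unit: spectrogram frames

# Convert the window and hop from milliseconds to raw audio samples.
win_samples = AUDIO_SAMPLE_RATE * AUDIO_WIN_SZ // 1000
hop_samples = AUDIO_SAMPLE_RATE * AUDIO_STEP_SZ // 1000

# Approximate duration covered by one audio clip and by the misalignment gap.
clip_seconds = AUDIO_FRAME_NUM * AUDIO_STEP_SZ / 1000
gap_seconds = AUDIO_MISALIGNED_GAP * AUDIO_STEP_SZ / 1000

print(win_samples, hop_samples)    # 512 256
print(clip_seconds, gap_seconds)
```

Under these assumptions the gap works out to roughly half a second, which agrees with the comment in the config, and one audio clip spans about two seconds.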
108 changes: 108 additions & 0 deletions configs/Kinetics/AVSLOWFAST_8x8_R50.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,108 @@
TRAIN:
  ENABLE: True
  DATASET: kinetics
  BATCH_SIZE: 32
  EVAL_PERIOD: 10
  CHECKPOINT_PERIOD: 1
  AUTO_RESUME: True
  # CHECKPOINT_FILE_PATH: ../../data/output/checkpoints/avslowfast.pth
  # CHECKPOINT_TYPE: pytorch # caffe2 or pytorch
DATA:
  USE_BGR_ORDER: False # False
  NUM_FRAMES: 32
  SAMPLING_RATE: 2
  TRAIN_JITTER_SCALES: [256, 320]
  TRAIN_CROP_SIZE: 224
  TEST_CROP_SIZE: 256
  INPUT_CHANNEL_NUM: [3, 3, 1]
  USE_AUDIO: True
  GET_MISALIGNED_AUDIO: True
  AUDIO_SAMPLE_RATE: 16000
  AUDIO_WIN_SZ: 32
  AUDIO_STEP_SZ: 16
  AUDIO_FRAME_NUM: 128
  AUDIO_MEL_NUM: 80
  AUDIO_MISALIGNED_GAP: 32 # half second
  LOGMEL_MEAN: -7.03 # -7.03, -24.227
  LOGMEL_STD: 4.66 # 4.66, 1.0
  EASY_NEG_RATIO: 0.75
  MIX_NEG_EPOCH: 96
SLOWFAST:
  ALPHA: 4
  BETA_INV: 8
  FUSION_CONV_CHANNEL_RATIO: 2
  FUSION_KERNEL_SZ: 7
  AU_ALPHA: 16
  AU_BETA_INV: 2
  AU_FUSION_CONV_CHANNEL_MODE: ByDim # ByDim, ByRatio
  AU_FUSION_CONV_CHANNEL_RATIO: 0.25
  AU_FUSION_CONV_CHANNEL_DIM: 64
  AU_FUSION_KERNEL_SZ: 5
  AU_FUSION_CONV_NUM: 2
  AU_REDUCE_TF_DIM: True
  FS_FUSION: [False, False, True, True]
  AFS_FUSION: [False, False, True, True]
  AVS_FLAG: [False, False, True, True, True]
  AVS_PROJ_DIM: 64
  AVS_VAR_THRESH: 0.01
  AVS_DUPLICATE_THRESH: 0.99999
  DROPPATHWAY_RATE: 0.8 # 0.8
RESNET:
  ZERO_INIT_FINAL_BN: True
  WIDTH_PER_GROUP: 64
  NUM_GROUPS: 1
  DEPTH: 50
  TRANS_FUNC: bottleneck_transform
  AUDIO_TRANS_FUNC: tf_bottleneck_transform_v1
  AUDIO_TRANS_NUM: 2
  STRIDE_1X1: False
  # 18: [[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]]
  # 34: [[3, 3, 3], [4, 4, 4], [6, 6, 6], [3, 3, 3]]
  # 50: [[3, 3, 3], [4, 4, 4], [6, 6, 6], [3, 3, 3]]
  # 101: [[3, 3, 3], [4, 4, 4], [23, 23, 23], [3, 3, 3]]
  # 152: [[3, 3, 3], [8, 8, 8], [36, 36, 36], [3, 3, 3]]
  NUM_BLOCK_TEMP_KERNEL: [[3, 3, 3], [4, 4, 4], [6, 6, 6], [3, 3, 3]]
  SPATIAL_DILATIONS: [[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1]]
NONLOCAL:
  LOCATION: [[[], [], []], [[], [], []], [[], [], []], [[], [], []]]
  GROUP: [[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1]]
  POOL: [
    [[1, 2, 2], [1, 2, 2], [1, 2, 2]],
    [[1, 2, 2], [1, 2, 2], [1, 2, 2]],
    [[1, 2, 2], [1, 2, 2], [1, 2, 2]],
    [[1, 2, 2], [1, 2, 2], [1, 2, 2]],
  ]
  INSTANTIATION: dot_product
BN:
  USE_PRECISE_STATS: True
  NUM_BATCHES_PRECISE: 400
  MOMENTUM: 0.1
  WEIGHT_DECAY: 0.0
SOLVER:
  BASE_LR: 0.1 # 0.1
  LR_POLICY: cosine
  MAX_EPOCH: 196
  MOMENTUM: 0.9
  WEIGHT_DECAY: 1e-4
  WARMUP_EPOCHS: 34.0 # 34.0
  WARMUP_START_LR: 0.01 # 0.01
  OPTIMIZING_METHOD: sgd
MODEL:
  NUM_CLASSES: 400
  MODEL_NAME: AVSlowFast
  ARCH: avslowfast
  LOSS_FUNC: cross_entropy
  DROPOUT_RATE: 0.5
TEST:
  ENABLE: True
  DATASET: kinetics
  BATCH_SIZE: 32
  # CHECKPOINT_FILE_PATH: ../../data/output/checkpoints/avslowfast.pth
  # CHECKPOINT_TYPE: pytorch # caffe2 or pytorch
DATA_LOADER:
  NUM_WORKERS: 8 # 8
  PIN_MEMORY: True
NUM_GPUS: 8
NUM_SHARDS: 1
RNG_SEED: 0
OUTPUT_DIR: ./output/AVSlowFast-R50-8x8
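
Compared with the 4x16 config, this file changes `SLOWFAST.ALPHA`, `SLOWFAST.FUSION_KERNEL_SZ`, `SLOWFAST.AU_ALPHA`, `BN.NUM_BATCHES_PRECISE`, the batch sizes, and `OUTPUT_DIR`. Assuming the usual SlowFast convention that the Slow pathway keeps every `ALPHA`-th frame of the clip sampled for the Fast pathway, the `4x16` and `8x8` in the file names can be read as Slow-pathway frame count × temporal stride. The helper name `slow_pathway_shape` is hypothetical and only illustrates that reading.

```python
def slow_pathway_shape(num_frames, sampling_rate, alpha):
    """Return (frames, stride) seen by the Slow pathway.

    Assumes the clip holds `num_frames` frames sampled every
    `sampling_rate` raw frames, and that the Slow pathway keeps
    every `alpha`-th of those frames.
    """
    slow_frames = num_frames // alpha
    slow_stride = sampling_rate * alpha
    return slow_frames, slow_stride


# DATA.NUM_FRAMES and DATA.SAMPLING_RATE are 32 and 2 in both configs.
print(slow_pathway_shape(32, 2, alpha=8))  # (4, 16) for AVSLOWFAST_4x16_R50
print(slow_pathway_shape(32, 2, alpha=4))  # (8, 8) for AVSLOWFAST_8x8_R50
```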
37 changes: 37 additions & 0 deletions projects/avslowfast/README.md
@@ -0,0 +1,37 @@
# Getting Started with PyAVSlowFast

This section supplements the original doc in PySlowFast (attached below) and provides instructions on how to start training an AVSlowFast model with this codebase.

First, note that `DATA.PATH_TO_DATA_DIR` points to the directory containing the annotation CSV files, and `DATA.PATH_PREFIX` points to the root of the data directory.
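
As a rough illustration of how the two settings relate, the sketch below joins each relative video path from an annotation file with the data root. It assumes one `relative/path label` pair per line, separated by a space, and a file named after the split (for example `train.csv`); both the layout and the helper names are assumptions for illustration, not a description of this PR's loader.

```python
import os


def annotation_file(path_to_data_dir, split="train"):
    """Assumed location of the annotation CSV for a given split."""
    return os.path.join(path_to_data_dir, f"{split}.csv")


def resolve_samples(lines, path_prefix, separator=" "):
    """Turn 'relative/path label' rows into (absolute_path, label) pairs."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank rows
        # Split on the last separator so paths containing spaces survive.
        rel_path, label = line.rsplit(separator, 1)
        samples.append((os.path.join(path_prefix, rel_path), int(label)))
    return samples


rows = ["abseiling/clip_0001.mp4 0", "zumba/clip_0042.mp4 399"]
for path, label in resolve_samples(rows, "path_to_your_dataset_root"):
    print(label, path)
```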

Then, issue the following training command:
```
python tools/run_net.py \
--cfg configs/Kinetics/AVSLOWFAST_4x16_R50.yaml \
DATA.PATH_TO_DATA_DIR path_to_your_annotation \
DATA.PATH_PREFIX path_to_your_dataset_root \
NUM_GPUS 8 \
DATA_LOADER.NUM_WORKERS 8 \
TRAIN.BATCH_SIZE 64
```
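
The trailing `KEY VALUE` pairs in the command override the corresponding entries loaded from the YAML file. The sketch below is a minimal pure-Python illustration of that idea on a nested dictionary. It is not the repository's config implementation, and the function name `apply_overrides` is hypothetical.

```python
import ast


def apply_overrides(cfg, opts):
    """Apply a flat [KEY, VALUE, KEY, VALUE, ...] list to a nested dict."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            node = node[name]  # raises KeyError for an unknown section
        if leaf not in node:
            raise KeyError(f"unknown config key: {key}")
        try:
            # Interpret numbers, booleans, and lists; keep other text as-is.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw
        node[leaf] = value
    return cfg


cfg = {
    "NUM_GPUS": 1,
    "TRAIN": {"BATCH_SIZE": 8, "ENABLE": True},
    "DATA": {"PATH_PREFIX": ""},
}
apply_overrides(cfg, [
    "NUM_GPUS", "8",
    "TRAIN.BATCH_SIZE", "64",
    "DATA.PATH_PREFIX", "path_to_your_dataset_root",
])
print(cfg)
```

Rejecting unknown keys, as this sketch does, catches typos in override names early instead of silently adding new entries.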

For testing, run the following:
```
python tools/run_net.py \
--cfg configs/Kinetics/AVSLOWFAST_4x16_R50.yaml \
DATA.PATH_TO_DATA_DIR path_to_your_annotation \
DATA.PATH_PREFIX path_to_your_dataset_root \
TEST.BATCH_SIZE 32 \
TEST.CHECKPOINT_FILE_PATH path_to_your_checkpoint \
TRAIN.ENABLE False
```

## Citing AVSlowFast
Please cite AVSlowFast if you use it in your research. You can use the following BibTeX entry.
```BibTeX
@article{xiao-avslowfast2020,
  author = {Xiao, Fanyi and Lee, Yong Jae and Grauman, Kristen and Malik, Jitendra and Feichtenhofer, Christoph},
  title = {{Audiovisual SlowFast Networks for Video Recognition}},
  journal = {arXiv preprint arXiv:2001.08740},
  year = {2020}}
```