Skip to content

Latest commit

 

History

History
51 lines (41 loc) · 2.75 KB

README.md

File metadata and controls

51 lines (41 loc) · 2.75 KB

Scene-boundary-detection in Videos

Implementation of the paper 'Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks' from scratch. Paper can be found at https://arxiv.org/abs/1705.08214

3D CNN is extensively used in this CNN model to achieve the stated performance. hence the model is fullt convolutional in time.

Prerequisites

  • Python
  • Keras (with tensorflow-gpu preffered)
  • Moviepy
  • Opencv,Numpy,Pandas

The samples videos should be snippets of the video scenes based on the scene boundary or shot-cut, preferably kept in 'out-clips/'.

Augmentation class and helper files

Augmentation class works using the moviepy(used for editing the videos) and it offer an effective library to augment the dataset of videos. it contains:

dataset_generator.py

Creates the dataset from the multiple augmentations listed in the 'augmentation_helper.py'.\n Creates a video and a csv file which includes the scene boundary frame numbers.

augmentation_helper.py

A helper file that has several functions to augment the dataset which includes many real to life scenerios including artificial flash mentioned in the paper.

sample_video_csv_gen.py

Sample use case of 'dataset_generator.py', creates the files aug_final.mp4 and csv_aug_data.csv.

Training the model

The model is an implementation of 10 frames/predcition model from the paper which gives one output for 10 frames. Video augmented data is not required as long as you can provide a csv(with 'frame_no,cut and transistion' colums) and video file.

model.py

The script for training the model, files aug_final.mp4 and csv_aug_data.csv has to be provided. The model uses 'adam' as optimizer and 'categorical crossentropy' for calculating loss.

Tensorboard and model checkpoints are used.

datagen.py & epoch_generator.py

Both the fles are to handle the image queue for the training purpose. epoch_generator.py ensures that the data fed into the model is equalized(equal no of postitive and negative dataset).

Testing

test_model.py

The script is to test the model performance using the generated model weights after training, ie the 'cut_video_final.h5'. Provides a image stream of 10 images and corresponding prediction.

check_vid.py

'check_vid.py' will provide a visualization for a .csv with scene cut frames and a video corresponding to it. As test_model.py it also provide a image stream of 10 images and corresponding prediction.

TO DO

  • Augmentation for artificial lighting, blurness, speed, color-channel(hue,BW and channel switch)
  • Augmentation for paning and zooming.
  • A generic model for scalable operation to reduce redundancy(any no of frames/ many prediction).