Forecasting future video content using ConvLSTM
In fields like climate science, it is valuable to make predictions from past observations. Here, we predict sea surface temperature (SST) from sequences of images. There are two common approaches:
- Using a physics-based dynamical model, which incorporates expert knowledge and involves many complex equations
- Using a Convolutional LSTM (ConvLSTM), which learns the patterns inherent in sequences of video frames and uses a deep network to make predictions (our approach)
- We used a ConvLSTM model in a Seq2Seq (encoder-decoder) setup. Frames (images) were encoded to 16x16 feature maps with a DCGAN64Encoder for faster training
- We experimented with teacher forcing, resulting in two trained models
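To make the approach above concrete, here is a minimal sketch of a ConvLSTM cell and a teacher-forced decoding loop in PyTorch. This is an illustration only, not the project's actual implementation: the class and function names (`ConvLSTMCell`, `decode`, `head`) and all hyperparameters are assumptions, and the real code uses the DCGAN64Encoder latents rather than raw frames.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: the LSTM gates are computed with a 2-D
    convolution, so the hidden and cell states keep their spatial layout
    (e.g. the 16x16 encoded feature maps mentioned above)."""

    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2  # "same" padding keeps spatial size
        # One convolution produces all four gates at once
        self.conv = nn.Conv2d(in_channels + hidden_channels,
                              4 * hidden_channels, kernel_size,
                              padding=padding)
        self.hidden_channels = hidden_channels

    def forward(self, x, state):
        h, c = state
        gates = self.conv(torch.cat([x, h], dim=1))
        i, f, o, g = torch.chunk(gates, 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)                # candidate cell state
        c = f * c + i * g                # forget old, write new
        h = o * torch.tanh(c)
        return h, c

def decode(cell, head, h, c, first_input, horizon, targets=None):
    """Autoregressive Seq2Seq decoding. If `targets` is given, the
    ground-truth frame is fed back instead of the model's own prediction
    (teacher forcing); otherwise predictions are fed back."""
    preds, x = [], first_input
    for step in range(horizon):
        h, c = cell(x, (h, c))
        y = head(h)                      # project hidden state to a frame
        preds.append(y)
        x = targets[:, step] if targets is not None else y
    return torch.stack(preds, dim=1), (h, c)
```

Training with teacher forcing tends to converge faster but can suffer at test time, when the model must consume its own (imperfect) predictions; this is why the experiments above compare two models, one trained with teacher forcing and one without.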
Two metrics were used to evaluate the model: MSE and SSIM (Structural Similarity). SSIM is the more human-centered metric, since some image distortions are barely visible to the eye and therefore should not count heavily against a frame's perceived similarity to the reference, whereas pixel-wise MSE penalizes them all equally.
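For reference, the two metrics can be sketched as follows. The SSIM here is a simplified version computed from global image statistics (a single window covering the whole frame); library implementations such as `skimage.metrics.structural_similarity` use a sliding Gaussian window, so their values will differ somewhat.

```python
import numpy as np

def mse(a, b):
    """Pixel-wise mean squared error between two frames."""
    return float(np.mean((a - b) ** 2))

def global_ssim(a, b, data_range=1.0):
    """Simplified SSIM from global statistics, using the standard
    stabilizing constants C1 = (0.01*L)^2 and C2 = (0.03*L)^2."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return float(((2 * mu_a * mu_b + c1) * (2 * cov + c2)) /
                 ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)))
```

Identical frames give an SSIM of 1.0 and an MSE of 0; SSIM decreases toward 0 (or below) as luminance, contrast, and structure diverge.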
- Create a conda environment with the dependencies: conda create --name <env_name> --file requirements.txt
- Download the data from the Google Drive folder
- To run training: python train.py --data_source sst --seq_len 4 --horizon 6
- To run testing: python test.py --model_name MODEL_NAME --data_source sst --seq_len 4 --horizon 6
Utility functions (Encoder/Decoder, ConvLSTM cell, etc.) were provided by Dr. Yao-Yi Chiang.
The remaining code is by Bharath Sivaram.