Features:
- Sensitive to scale: the default activation function for LSTMs is the hyperbolic tangent (tanh), which outputs values between -1 and 1, so this is the preferred range for the time series data (see the scaling sketch below).
- A benefit of this type of network is that it can learn and remember over long sequences and does not rely on a pre-specified window of lagged observations as input. In Keras: stateful=True (see the stateful sketch below).
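
A minimal sketch of scaling a series into the tanh range with scikit-learn's MinMaxScaler; the series values are made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# toy univariate series (illustrative values only)
series = np.array([10.0, 20.0, 30.0, 25.0, 40.0]).reshape(-1, 1)

# scale into [-1, 1] to match the tanh output range
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(series)

# invert the transform after forecasting to get back to original units
restored = scaler.inverse_transform(scaled)
```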
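And a minimal sketch of a stateful LSTM in Keras; the layer size, batch size, and window length are illustrative assumptions, not values from these notes:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

batch_size, timesteps, features = 1, 10, 1  # assumed shapes for illustration

model = Sequential()
# stateful=True keeps the cell state across batches instead of resetting it,
# which is why a fixed batch_input_shape must be supplied
model.add(LSTM(32, batch_input_shape=(batch_size, timesteps, features), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

# state must be reset manually between epochs / independent sequences:
# model.reset_states()
```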
Methods to deal with variable-length sequences for an LSTM in Keras:
- pad all the sequences with 0 so they are the same length (pick the max length you have); see the padding/masking sketch after this list
- from keras.preprocessing import sequence
- sequence.pad_sequences
- if you don't specify maxlen, it will use the length of the longest sequence
- train = sequence.pad_sequences(train, dtype='float')
- note: the default dtype is int32
- add a Masking layer to your input so the padded zeros are ignored (same sketch after this list)
- use batch_size = 1, i.e., feed only one sequence at a time; then each sequence can have any length (second sketch after this list)
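
A minimal sketch of the padding plus masking approach; the sequences and layer sizes are made-up assumptions:

```python
import numpy as np
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense

# variable-length toy sequences (illustrative values only)
train = [[0.1, 0.2, 0.3], [0.5, 0.6], [0.9, 1.0, 1.1, 1.2]]

# pad with 0 up to the longest sequence; dtype='float' because the default is int32
padded = sequence.pad_sequences(train, dtype='float')
padded = padded.reshape(padded.shape[0], padded.shape[1], 1)  # (samples, timesteps, features)

model = Sequential()
# Masking tells downstream layers to skip timesteps whose values equal mask_value
model.add(Masking(mask_value=0.0, input_shape=(padded.shape[1], 1)))
model.add(LSTM(16))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
```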
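And a minimal sketch of the batch_size = 1 approach, where each sequence keeps its own length because the timestep dimension is left as None (again with made-up data and targets):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# input_shape=(None, 1): the number of timesteps is left unspecified
model = Sequential()
model.add(LSTM(16, input_shape=(None, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

# toy variable-length sequences with one target each (illustrative values only)
sequences = [[0.1, 0.2, 0.3], [0.5, 0.6], [0.9, 1.0, 1.1, 1.2]]
targets = [0.4, 0.7, 1.3]

# feed one sequence at a time so no padding is needed
for seq, y in zip(sequences, targets):
    x = np.array(seq, dtype='float32').reshape(1, len(seq), 1)  # (1, timesteps, 1)
    model.train_on_batch(x, np.array([y]))
```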
References: