ECG-Anomaly-Detection

Using an LSTM Autoencoder to detect abnormal heartbeats (anomalies) in real-world ECG data from a single patient with heart disease.

Expectation:

Prepare the dataset for Anomaly Detection from Time Series Data
Define the LSTM Autoencoder with PyTorch
Train and evaluate the LSTM model
Determine a threshold for anomaly detection
Classify unseen ECG signals as normal / abnormal [Test the model]

Data

Dataset: http://timeseriesclassification.com/description.php?Dataset=ECG5000

The dataset contains 5,000 Time Series examples (ECGs), each with 140 timesteps. Each sequence corresponds to a single heartbeat from the same patient with congestive heart failure.

An electrocardiogram (ECG) is a test that checks heart function by measuring its electrical activity. With each heartbeat, an electrical impulse (wave) travels through the heart, causing the muscle to squeeze and pump blood to the body.

The dataset contains 5 types (classes) of heartbeat:

Normal (N)
R-on-T Premature Ventricular Contraction (R-on-T PVC)
Premature Ventricular Contraction (PVC)
Supra-ventricular Premature / Ectopic Beat (SP or EB)
Unclassified Beat (UB)

Healthy hearts typically beat at 65 - 75 bpm with a cardiac cycle (heartbeat) of around 0.8 seconds. Frequency: 60 - 100 beats per minute. Duration: 0.6 - 1 second.

Source

Exploratory Data Analysis

The 5000 examples are split across the 5 classes as follows:

Normal ........ 2919
R on T .......... 1767
SP or EB ...... 194
PVC ............. 96
UB ............... 24

The Autoencoder will be trained using the Normal ECG signals.
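Selecting the Normal beats and holding some of them out can be sketched as follows. This is a minimal illustration rather than the repository's code: the function name `split_normal`, the label value `1` for the Normal class, and the synthetic stand-in data are all assumptions made for the example.

```python
import numpy as np

NORMAL_LABEL = 1  # assumption: the label column encodes "Normal" as 1


def split_normal(data, labels, train_frac=0.85, seed=42):
    """Return (train_normal, held_out_normal, anomalies) as arrays."""
    normal = data[labels == NORMAL_LABEL]
    anomalies = data[labels != NORMAL_LABEL]
    # Shuffle the normal beats before splitting so the split is not order-dependent.
    order = np.random.default_rng(seed).permutation(len(normal))
    cut = int(train_frac * len(normal))
    return normal[order[:cut]], normal[order[cut:]], anomalies


# Synthetic stand-in with the same shape as the real data (5,000 x 140).
# Only the Normal count (2919) is taken from the README; label 2 simply
# means "not normal" here.
rng = np.random.default_rng(0)
data = rng.normal(size=(5000, 140)).astype(np.float32)
labels = np.array([NORMAL_LABEL] * 2919 + [2] * 2081)

train, held_out, anomalies = split_normal(data, labels)
print(train.shape, held_out.shape, anomalies.shape)
```

The anomalies never reach the training step; they are only used later to check that the model reconstructs them poorly.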

If we consider the averaged Time Series for each class, we notice that the 'Normal' class has a distinctly different pattern, which should prove useful when trying to detect anomalies.

LSTM AUTOENCODER

The autoencoder should take the input data (an individual heartbeat), pass it through the model, and produce a reconstruction that matches the input as closely as possible.

Keeping the number of layers and parameters low forces the model to work with a compressed representation of the data, so it learns only the most important features rather than memorising the input. In this case, a two-layer LSTM is used to capture the temporal dependencies of the data.
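One way to write such a model in PyTorch is sketched below. The class names, the embedding size of 64, and the exact layer widths are assumptions for illustration; the README only states that a two-layer LSTM is used.

```python
import torch
from torch import nn


class Encoder(nn.Module):
    """Compress a sequence into a single embedding vector."""

    def __init__(self, n_features=1, embedding_dim=64):
        super().__init__()
        self.rnn1 = nn.LSTM(n_features, 2 * embedding_dim, batch_first=True)
        self.rnn2 = nn.LSTM(2 * embedding_dim, embedding_dim, batch_first=True)

    def forward(self, x):  # x: (batch, seq_len, n_features)
        x, _ = self.rnn1(x)
        _, (hidden, _) = self.rnn2(x)
        return hidden[-1]  # (batch, embedding_dim)


class Decoder(nn.Module):
    """Expand the embedding back into a full-length sequence."""

    def __init__(self, seq_len, n_features=1, embedding_dim=64):
        super().__init__()
        self.seq_len = seq_len
        self.rnn1 = nn.LSTM(embedding_dim, embedding_dim, batch_first=True)
        self.rnn2 = nn.LSTM(embedding_dim, 2 * embedding_dim, batch_first=True)
        self.output_layer = nn.Linear(2 * embedding_dim, n_features)

    def forward(self, z):  # z: (batch, embedding_dim)
        # Feed the same embedding to the decoder at every timestep.
        x = z.unsqueeze(1).repeat(1, self.seq_len, 1)
        x, _ = self.rnn1(x)
        x, _ = self.rnn2(x)
        return self.output_layer(x)  # (batch, seq_len, n_features)


class RecurrentAutoencoder(nn.Module):
    def __init__(self, seq_len, n_features=1, embedding_dim=64):
        super().__init__()
        self.encoder = Encoder(n_features, embedding_dim)
        self.decoder = Decoder(seq_len, n_features, embedding_dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))


model = RecurrentAutoencoder(seq_len=140)
batch = torch.randn(4, 140, 1)  # 4 random stand-in heartbeats
reconstruction = model(batch)
print(reconstruction.shape)
```

The bottleneck is the single embedding vector returned by the encoder: the decoder has to rebuild all 140 timesteps from it.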

Reconstruction Loss & Anomaly Threshold

Since the objective is to reconstruct the input signal as closely as possible, we minimise a loss function, much as in supervised learning, except that the target is the input itself. This function is known as the reconstruction loss.

The model was trained on 85% of the 'Normal' heartbeat data and evaluated on the remaining unseen 15%:
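A training loop that minimises the reconstruction loss might look like the sketch below. Several things here are assumptions: the summed absolute error (`nn.L1Loss(reduction="sum")`), the Adam optimiser, the learning rate, and the `TinyModel` stand-in, which exists only so the example runs on its own and is not the model described above.

```python
import torch
from torch import nn


def train_model(model, train_seqs, val_seqs, n_epochs=5, lr=1e-3):
    """Train on single sequences and return the mean loss per epoch."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.L1Loss(reduction="sum")  # assumption: summed absolute error
    history = {"train": [], "val": []}
    for _ in range(n_epochs):
        model.train()
        train_losses = []
        for seq in train_seqs:  # one heartbeat per optimisation step
            optimizer.zero_grad()
            loss = criterion(model(seq), seq)  # the target is the input itself
            loss.backward()
            optimizer.step()
            train_losses.append(loss.item())
        model.eval()
        with torch.no_grad():
            val_losses = [criterion(model(seq), seq).item() for seq in val_seqs]
        history["train"].append(sum(train_losses) / len(train_losses))
        history["val"].append(sum(val_losses) / len(val_losses))
    return history


class TinyModel(nn.Module):
    """Hypothetical stand-in so the loop can be run without the full model."""

    def __init__(self, hidden=8):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):
        h, _ = self.lstm(x)
        return self.out(h)


torch.manual_seed(0)
t = torch.linspace(0, 6.28, 140)
make_seq = lambda: (torch.sin(t) + 0.05 * torch.randn(140)).reshape(1, 140, 1)
train_seqs = [make_seq() for _ in range(8)]  # synthetic, not ECG data
val_seqs = [make_seq() for _ in range(2)]

history = train_model(TinyModel(), train_seqs, val_seqs, n_epochs=3)
print(history)
```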

| EPOCH: 1   | TRAIN LOSS: 51.47  | VAL LOSS: 50.55  | EXEC. TIME: 7.54 min  |
.
.
.
| EPOCH: 49  | TRAIN LOSS: 16.03  | VAL LOSS: 15.82  | EXEC. TIME: 7.52 min  |
| EPOCH: 50  | TRAIN LOSS: 15.61  | VAL LOSS: 15.49  | EXEC. TIME: 7.57 min  |

Once the reconstruction loss has been computed for each sequence in the (Normal) training dataset, we can determine a threshold above which a heartbeat will be considered abnormal.

THRESHOLD = 26
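The README does not say how the value 26 was chosen. One common approach, shown here purely as an assumption, is to take a high percentile of the training reconstruction losses. The function name `pick_threshold` and the synthetic losses are illustrative only.

```python
import numpy as np


def pick_threshold(train_losses, percentile=95.0):
    """Return the loss value at the given percentile of the training losses."""
    return float(np.percentile(train_losses, percentile))


# Synthetic, right-skewed stand-in for per-heartbeat reconstruction losses.
rng = np.random.default_rng(0)
train_losses = rng.gamma(shape=9.0, scale=2.0, size=1000)

threshold = pick_threshold(train_losses)
covered = float(np.mean(train_losses <= threshold))
print(f"threshold={threshold:.2f}, share of training beats at or below it={covered:.3f}")
```

A higher percentile flags fewer normal beats as anomalies but lets more borderline anomalies through.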

Evaluation

With a threshold in place, the task becomes a simple binary classification task.

That is, if the reconstruction loss for a new heartbeat is below the threshold, the heartbeat is considered normal; any loss above the threshold is classified as an anomaly.
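The decision rule can be sketched in a few lines. The helper names and the example loss values are made up, and treating a loss exactly equal to the threshold as normal is an assumption, since the text does not cover that case.

```python
THRESHOLD = 26  # value reported in this README


def classify(losses, threshold=THRESHOLD):
    """Label each reconstruction loss as 'normal' or 'anomaly'."""
    # Assumption: a loss exactly equal to the threshold counts as normal.
    return ["normal" if loss <= threshold else "anomaly" for loss in losses]


def count_correct(losses, expected_label, threshold=THRESHOLD):
    """Count how many losses receive the expected label."""
    return sum(label == expected_label for label in classify(losses, threshold))


example_losses = [12.0, 25.9, 26.0, 31.5, 80.2]  # made-up values
labels = classify(example_losses)
print(labels)
```

Running `count_correct` on a set of known-normal losses, and again on a set of known-anomalous losses, gives the two kinds of "correct predictions" count reported in this section.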

Normal Heartbeats

Using a completely separate test dataset of 'Normal' heartbeats, we can determine how well our model identifies normal heartbeats:

Correct normal predictions: 209/219

Anomalies

Repeating the process with a test dataset of anomalies:

Correct anomaly predictions: 218/219

Fantastic results! The Autoencoder correctly flagged ~99.5% (218/219) of the abnormal heartbeats and recognised ~95.4% (209/219) of the normal ones. We can tweak the threshold depending on the kind of errors we are willing to tolerate. In this case, having more false positives (normal heartbeats considered as anomalies) than false negatives (anomalies considered as normal) might be advantageous.
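The effect of moving the threshold can be explored with a small sweep like the one below. The loss values are synthetic and only mimic the idea that anomalies reconstruct worse than normal beats; the function name `error_counts` is hypothetical.

```python
import numpy as np


def error_counts(normal_losses, anomaly_losses, threshold):
    """Return (false_positives, false_negatives) for a given threshold."""
    false_positives = int(np.sum(normal_losses > threshold))    # normal flagged as anomaly
    false_negatives = int(np.sum(anomaly_losses <= threshold))  # anomaly passed as normal
    return false_positives, false_negatives


# Synthetic stand-ins for the losses of 219 normal and 219 anomalous beats.
rng = np.random.default_rng(0)
normal_losses = rng.gamma(shape=9.0, scale=2.0, size=219)
anomaly_losses = 30.0 + rng.gamma(shape=4.0, scale=8.0, size=219)

sweep = [(t, *error_counts(normal_losses, anomaly_losses, t)) for t in range(20, 41, 4)]
for t, fp, fn in sweep:
    print(f"threshold={t:>2}  false positives={fp:>3}  false negatives={fn:>3}")
```

Raising the threshold can only reduce false positives and can only increase false negatives, so the choice comes down to which error is more costly.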

Visualizing the reconstructions

By overlaying the real and reconstructed Time Series values we can see how similar they are. We will select just the first 6 tensors from the Normal and Anomaly tests:

Summary

While the Time Series data is univariate (only 1 feature), the code should work for multivariate datasets (multiple features) with little or no modification.

References

https://github.com/shobrook/sequitur
https://towardsdatascience.com/lstm-autoencoder-for-anomaly-detection-e1f4f2ee7ccf
