
Merge branch 'develop'
AlessioZanga committed Jul 12, 2020
2 parents d9bb7a5 + cec0590 commit a88f62a
Showing 25 changed files with 926 additions and 62 deletions.
2 changes: 1 addition & 1 deletion Makefile
@@ -10,7 +10,7 @@ tuh_eeg_artifact:

tuh_eeg_seizure:
	echo "Request your access password at: https://www.isip.piconepress.com/projects/tuh_eeg/html/request_access.php"
	rsync -auxvL [email protected]:~/data/tuh_eeg_seizure/v1.5.1 data/tuh_eeg_seizure
	rsync -auxvL [email protected]:~/data/tuh_eeg_seizure/v1.5.2 data/tuh_eeg_seizure

eegmmidb:
	wget -r -N -c -np https://physionet.org/files/eegmmidb/1.0.0/ -P data
50 changes: 44 additions & 6 deletions README.md
@@ -47,15 +47,53 @@ If you need a bleeding edge version, you can install it directly from GitHub:

The following datasets will work once downloaded:

* [Temple University Abnormal EEG Dataset](https://www.isip.piconepress.com/projects/tuh_eeg/html/downloads.shtml)
* [Temple University Artifact EEG Dataset](https://www.isip.piconepress.com/projects/tuh_eeg/html/downloads.shtml)
* [EEG Motor Movement/Imagery Dataset](https://physionet.org/content/eegmmidb/1.0.0/)
| Dataset | Size | Class Distribution | Task | Notes |
|---------|---------------:|:------------------------|------|-------|
| [TUH Abnormal EEG Dataset](https://www.isip.piconepress.com/projects/tuh_eeg/html/downloads.shtml) | 59.0 GB | 'normal': 1521 <br /> 'abnormal': 1472 | Generic abnormal EEG events vs. normal EEG traces. | This dataset does not contain any annotations; event extraction follows other papers that used this dataset: for each record, a 60s sample is extracted and labelled with the class of the file. |
| [TUH Artifact EEG Dataset](https://www.isip.piconepress.com/projects/tuh_eeg/html/downloads.shtml) | 5.5 GB | 'null': 1940 <br /> 'eyem': 606 <br /> 'musc': 254 <br /> 'elpp': 178 <br /> 'chew': 161 <br /> 'shiv': 60 | Multiple artifacts vs. EEG baseline. | At the moment, only the '01_tcp_ar' EEG reference setup can be used (more than ~95% of all records). |
| [TUH Seizure EEG Dataset](https://www.isip.piconepress.com/projects/tuh_eeg/html/downloads.shtml) | 54.0 GB | 'fnsz': 4240 <br /> 'gnsz': 1717 <br /> 'cpsz': 1496 <br /> 'tnsz': 334 <br /> 'tcsz': 191 <br /> 'mysz': 6 <br /> 'absz': 2 | Generic unclassified seizure type vs. specific seizure types. | At the moment, only the '01_tcp_ar' EEG reference setup can be used (more than ~95% of all records). <br /> Also, the 'bckg' and 'scpz' classes are ignored: the former is just (a lot of) background noise, while the latter has a single instance, which cannot be used with stratified cross-validation. |
| [Motor Movement/Imagery EEG Dataset](https://physionet.org/content/eegmmidb/1.0.0/) | 3.4 GB | | Motor movement / imagery events. | This dataset grows considerably during preprocessing: although its download size is fairly small, its records are fully annotated, so the whole dataset is suitable for feature extraction, not just sparse events as in the other datasets. |
| [CHB-MIT Scalp EEG Dataset](https://physionet.org/content/chbmit/1.0.0/) | 43.0 GB | 'noseizure': 545 <br /> 'seizure': 184 | No-seizure events vs. seizure events. | While 'seizure' events come with (begin, end, label) annotations, the 'noseizure' class is built by extracting a 60s sample from records flagged as 'noseizure'. |
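
As a quick reference, here is a minimal sketch of how one of these datasets can be loaded with PyEEGLab (assuming the data has already been downloaded to `data/tuh_eeg_abnormal/v2.0.0/edf`, and reusing the pipeline stages from `examples/tensorboard/example_tensorboard.py`; the cache directory is an arbitrary choice):

```python
from pyeeglab import (
    TUHEEGAbnormalDataset, PickleCache, Pipeline, CommonChannelSet,
    LowestFrequency, ToDataframe, MinMaxCentralizedNormalization,
    DynamicWindow, ToNumpy
)

# Point the dataset at the downloaded EDF files and cache preprocessed results
dataset = TUHEEGAbnormalDataset('data/tuh_eeg_abnormal/v2.0.0/edf')
dataset.set_cache_manager(PickleCache('export'))

# Keep the channels common to all records, resample to the lowest frequency,
# normalize, and split each sample into 8 frames
preprocessing = Pipeline([
    CommonChannelSet(),
    LowestFrequency(),
    ToDataframe(),
    MinMaxCentralizedNormalization(),
    DynamicWindow(8),
    ToNumpy()
])

# As in the example script, load() returns a dict with 'data', 'labels'
# and 'labels_encoder' entries
data = dataset.set_pipeline(preprocessing).load()
```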

## Class Meanings - From the TUH Seizure docs

| **Class&nbsp;Code** | **Event&nbsp;Name** | **Description** |
| -------------- | -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| _NULL_ | No Event | An unclassified event |
| _SPSW_ | Spike/Sharp and Wave | Spike and wave/complexes, sharp and wave/complexes |
| _GPED_ | Generalized Periodic Epileptiform Discharges | Diffused periodic discharges |
| _PLED_ | Periodic Lateralized Epileptiform Discharges | Focal periodic discharges |
| _EYBL_ | Eye blink | A specific type of sharp, high amplitude eye movement artifact corresponding to blinks |
| _ARTF_ | Artifacts (All) | Any non-brain activity electrical signal, such as those due to equipment or environmental factors |
| _BCKG_ | Background | Baseline/non-interesting events |
| _SEIZ_ | Seizure | Common seizure class which can include all types of seizure |
| _FNSZ_ | Focal Non-Specific Seizure | Focal seizures whose type cannot be specified |
| _GNSZ_ | Generalized Non-Specific Seizure | Generalized seizures which cannot be further classified into one of the groups below |
| _SPSZ_ | Simple Partial Seizure | Partial seizures during consciousness; Type specified by clinical signs only |
| _CPSZ_ | Complex Partial Seizure | Partial Seizures during unconsciousness; Type specified by clinical signs only |
| _ABSZ_ | Absence Seizure | Absence Discharges observed on EEG; patient loses consciousness for few seconds (Petit Mal) |
| _TNSZ_ | Tonic Seizure | Stiffening of body during seizure (EEG effects disappear) |
| _CNSZ_ | Clonic Seizure | Jerking/shivering of body during seizure |
| _TCSZ_ | Tonic Clonic Seizure | At first stiffening and then jerking of body (Grand Mal) |
| _ATSZ_ | Atonic Seizure | Sudden loss of muscle tone |
| _MYSZ_ | Myoclonic Seizure | Myoclonic jerks of limbs |
| _NESZ_ | Non-Epileptic Seizure | Any non-epileptic seizure observed. Contains no electrographic signs. |
| _INTR_ | Interesting Patterns | Any unusual or interesting patterns observed that don't fit into the above classes. |
| _SLOW_ | Slowing | A brief decrease in frequency |
| _EYEM_ | Eye Movement Artifact | A very common frontal/prefrontal artifact seen when the eyes move |
| _CHEW_ | Chewing Artifact | A specific artifact involving multiple channels that corresponds with patient chewing, “bursty” |
| _SHIV_ | Shivering Artifact | A specific, sustained sharp artifact that corresponds with patient shivering. |
| _MUSC_ | Muscle Artifact | A very common, high frequency, sharp artifact that corresponds with agitation/nervousness in a patient. |
| _ELPP_ | Electrode Pop Artifact | A short artifact characterized by channels using the same electrode “spiking” with perfect symmetry. |
| _ELST_ | Electrostatic Artifact | Artifact caused by movement or interference on the electrodes, variety of morphologies. |
| _CALB_ | Calibration Artifact | Artifact caused by calibration of the electrodes. Appears as a flattening of the signal in the beginning of files. |
| _HPHS_ | Hypnagogic Hypersynchrony | A brief period of high amplitude slow waves. |
| _TRIP_ | Triphasic Wave | Large, three-phase waves frequently caused by an underlying metabolic condition. |
| _ELEC_ | Electrode Artifact | Electrode pop, Electrostatic artifacts, Lead artifacts. |

## How to Get a Dataset

> **WARNING (1)**: Retrieving the TUH EEG Abnormal dataset requires at least 65 GB of free disk space.
> **WARNING (2)**: Retrieving the TUH EEG Abnormal dataset requires valid credentials; you can get your own at https://www.isip.piconepress.com/projects/tuh_eeg/html/request_access.php.
> **WARNING**: Retrieving the TUH EEG datasets requires valid credentials; you can get your own at: https://www.isip.piconepress.com/projects/tuh_eeg/html/request_access.php.

In the root directory of this project there is a Makefile; by typing:

1 change: 1 addition & 0 deletions examples/tensorboard/.gitignore
@@ -0,0 +1 @@
logs*
230 changes: 230 additions & 0 deletions examples/tensorboard/example_tensorboard.py
@@ -0,0 +1,230 @@
#!/usr/bin/env python

# Ignore MNE and TensorFlow warnings
import warnings
warnings.simplefilter(action='ignore')

# Import TensorFlow with GPU memory settings
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
try:
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
except RuntimeError as e:
    print(e)

# Import TensorBoard params and metrics
from tensorboard.plugins.hparams import api as hp
from tensorflow.keras.metrics import CategoricalAccuracy, Precision, Recall

# Import Spektral for GraphAttention
import spektral as sp

# Other imports
import os
import pickle
import numpy as np
from random import shuffle
from itertools import product
from networkx import to_numpy_matrix
from sklearn.model_selection import train_test_split
from tensorflow.python.keras.utils.np_utils import to_categorical

# Relative import PyEEGLab
import sys
from os.path import abspath, dirname, join

sys.path.insert(0, abspath(join(dirname(__file__), '../..')))
from pyeeglab import *

def build_data(dataset):
    dataset.set_cache_manager(PickleCache('../../export'))

    preprocessing = Pipeline([
        CommonChannelSet(),
        LowestFrequency(),
        ToDataframe(),
        MinMaxCentralizedNormalization(),
        DynamicWindow(8),
        ForkedPreprocessor(
            inputs=[
                SpearmanCorrelation(),
                Mean(),
                Variance(),
                Skewness(),
                Kurtosis(),
                ZeroCrossing(),
                AbsoluteArea(),
                PeakToPeak(),
                Bandpower(['Delta', 'Theta', 'Alpha', 'Beta'])
            ],
            output=ToMergedDataframes()
        ),
        ToNumpy()
    ])

    return dataset.set_pipeline(preprocessing).load()

def adapt_data(data, test_size=0.1, shuffle=True):
    if isinstance(data, str):
        with open(data, 'rb') as f:
            data = pickle.load(f)
    samples, labels = data['data'], data['labels']
    x_train, x_test, y_train, y_test = train_test_split(samples, labels, test_size=test_size, shuffle=shuffle, stratify=labels)
    x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=test_size, shuffle=shuffle, stratify=y_train)
    classes = np.sort(np.unique(labels))
    y_train = to_categorical(y_train, num_classes=len(classes))
    y_test = to_categorical(y_test, num_classes=len(classes))
    y_val = to_categorical(y_val, num_classes=len(classes))
    return x_train, y_train, x_val, y_val, x_test, y_test

def build_model(shape, classes, hparams):
    print(hparams)
    N = shape[2]
    F = shape[3] - N
    frames = shape[1]

    def get_feature_matrix(x, frame, N, F):
        x = tf.slice(x, [0, frame, 0, N], [-1, 1, N, F])
        x = tf.squeeze(x, axis=[1])
        return x

    def get_correlation_matrix(x, frame, N, F):
        x = tf.slice(x, [0, frame, 0, 0], [-1, 1, N, N])
        x = tf.squeeze(x, axis=[1])
        return x

    input_0 = tf.keras.Input((frames, N, F + N))

    gans = []
    for frame in range(frames):
        feature_matrix = tf.keras.layers.Lambda(
            get_feature_matrix,
            arguments={'frame': frame, 'N': N, 'F': F}
        )(input_0)

        correlation_matrix = tf.keras.layers.Lambda(
            get_correlation_matrix,
            arguments={'frame': frame, 'N': N, 'F': F}
        )(input_0)

        x = sp.layers.GraphAttention(hparams['output_shape'])([feature_matrix, correlation_matrix])
        x = tf.keras.layers.Flatten()(x)
        gans.append(x)

    combine = tf.keras.layers.Concatenate()(gans)
    reshape = tf.keras.layers.Reshape((frames, N * hparams['output_shape']))(combine)
    lstm = tf.keras.layers.LSTM(hparams['hidden_units'])(reshape)
    dropout = tf.keras.layers.Dropout(hparams['dropout'])(lstm)
    out = tf.keras.layers.Dense(classes, activation='softmax')(dropout)

    model = tf.keras.Model(inputs=[input_0], outputs=out)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=hparams['learning_rate']),
        loss='categorical_crossentropy',
        metrics=[
            'accuracy',
            Recall(class_id=0, name='recall'),
            Precision(class_id=0, name='precision'),
        ]
    )
    model.summary()
    return model

def run_trial(path, step, model, hparams, x_train, y_train, x_val, y_val, x_test, y_test, epochs):
    with tf.summary.create_file_writer(path).as_default():
        hp.hparams(hparams)
        model.fit(x_train, y_train, epochs=epochs, batch_size=32, shuffle=True, validation_data=(x_val, y_val))
        loss, accuracy, recall, precision = model.evaluate(x_test, y_test)
        tf.summary.scalar('accuracy', accuracy, step=step)
        tf.summary.scalar('recall', recall, step=step)
        tf.summary.scalar('precision', precision, step=step)

def hparams_combinations(hparams):
    hp.hparams_config(
        hparams=list(hparams.values()),
        metrics=[
            hp.Metric('accuracy', display_name='Accuracy'),
            hp.Metric('recall', display_name='Recall'),
            hp.Metric('precision', display_name='Precision'),
        ]
    )
    hparams_keys = list(hparams.keys())
    hparams_values = list(product(*[
        h.domain.values
        for h in hparams.values()
    ]))
    hparams = [
        dict(zip(hparams_keys, values))
        for values in hparams_values
    ]
    shuffle(hparams)
    return hparams

def tune_model(dataset_name, data):
    LOGS_DIR = join('./logs/generic', dataset_name)
    os.makedirs(LOGS_DIR, exist_ok=True)
    # Prepare the data
    x_train, y_train, x_val, y_val, x_test, y_test = adapt_data(data)
    # Set tuning session
    counter = 0
    # Parameters to be tuned
    hparams = {
        'learning_rate': [1e-4, 5e-4, 1e-3],
        'hidden_units': [8, 16, 32, 64],
        'output_shape': [8, 16, 32, 64],
        'dropout': [0.00, 0.05, 0.10, 0.15, 0.20],
    }
    hparams = {
        key: hp.HParam(key, hp.Discrete(value))
        for key, value in hparams.items()
    }
    hparams = hparams_combinations(hparams)
    for hparam in hparams:
        # Build the model
        model = build_model(data['data'].shape, len(data['labels_encoder']), hparam)
        # Run session
        run_name = f'run-{counter}'
        print(f'--- Starting trial: {run_name}')
        print(hparam)
        run_trial(
            join(LOGS_DIR, run_name),
            counter,
            model,
            hparam,
            x_train,
            y_train,
            x_val,
            y_val,
            x_test,
            y_test,
            epochs=50
        )
        counter += 1


if __name__ == '__main__':
    dataset = {}

    dataset['tuh_eeg_abnormal'] = TUHEEGAbnormalDataset('../../data/tuh_eeg_abnormal/v2.0.0/edf')

    dataset['tuh_eeg_artifact'] = TUHEEGArtifactDataset('../../data/tuh_eeg_artifact/v1.0.0/edf')
    dataset['tuh_eeg_artifact'].set_minimum_event_duration(4)

    dataset['tuh_eeg_seizure'] = TUHEEGSeizureDataset('../../data/tuh_eeg_seizure/v1.5.2/edf')
    dataset['tuh_eeg_seizure'].set_minimum_event_duration(4)

    # dataset['eegmmidb'] = EEGMMIDBDataset('../../data/physionet.org/files/eegmmidb/1.0.0')
    # dataset['eegmmidb'].set_minimum_event_duration(4)

    dataset['chbmit'] = CHBMITDataset('../../data/physionet.org/files/chbmit/1.0.0')
    dataset['chbmit'].set_minimum_event_duration(4)

"""
Note: You can just use paths as values in the dictionary
and comment-out the first line of the following for cycle ;)
"""

    for key, value in dataset.items():
        value = build_data(value)
        tune_model(key, value)