Extension of Improved Deep Embedding Clustering to support recurrent and convolutional autoencoders.
The motivation behind this project is to extend the Deep Embedding for Clustering (DEC) and Improved Deep Embedding for Clustering (IDEC) algorithms to support a larger class of autoencoders (specifically recurrent and convolutional ones).
Given a dataset, both DEC and IDEC attempt to simultaneously learn a lower-dimensional embedding of the data and an optimal partition of it.
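The partition is learned through DEC's soft assignment: each embedded point is compared to a set of cluster centroids with a Student's t kernel, and a sharpened "target" distribution drives self-training. A minimal numpy sketch of those two quantities (the function names `soft_assign` and `target_distribution` are illustrative, not part of this codebase):

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """Student's t soft assignment q_ij between embedded points z and
    cluster centroids, as used by DEC/IDEC."""
    # Squared distance between every embedded point and every centroid.
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened target distribution p_ij used as the self-training signal."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
z = rng.normal(size=(5, 2))    # 5 points in a 2-D latent space
mu = rng.normal(size=(3, 2))   # 3 cluster centroids
q = soft_assign(z, mu)
p = target_distribution(q)     # both q and p have rows summing to 1
```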
Several autoencoder implementations are available:
- Vanilla Autoencoder (using Multilayer Perceptrons)
- Recurrent Autoencoder
- Convolutional Autoencoder
These can either be pre-trained and plugged into our implementation of IDEC or optimized during the IDEC training stage. Running IDEC with 𝛿=0 and a pretrained autoencoder is equivalent to running DEC.
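The DEC equivalence above suggests that 𝛿 weights the reconstruction term of the combined objective, so that 𝛿=0 leaves only DEC's clustering loss KL(P || Q). A hedged numpy sketch of such a loss (the function name `idec_loss` and the toy values are illustrative assumptions, not this project's API):

```python
import numpy as np

def idec_loss(x, x_rec, q, p, delta=1.0):
    """IDEC-style objective: delta-weighted reconstruction error plus the
    KL clustering loss. With delta = 0 the reconstruction term vanishes and
    only the DEC clustering objective KL(P || Q) remains."""
    recon = np.mean((x - x_rec) ** 2)      # autoencoder reconstruction error (MSE)
    kl = np.sum(p * np.log(p / q))         # clustering loss: KL(P || Q)
    return delta * recon + kl

# Toy values: two samples, two clusters.
x = np.zeros((2, 3))
x_rec = np.ones((2, 3))
q = np.array([[0.5, 0.5], [0.4, 0.6]])
p = np.array([[0.7, 0.3], [0.2, 0.8]])

full = idec_loss(x, x_rec, q, p, delta=0.1)
dec_only = idec_loss(x, x_rec, q, p, delta=0.0)  # equals KL(P || Q) alone
```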
As a quick benchmark we compared the performance of IDEC on three different types of data (using the appropriate type of autoencoder) against an alternative method (Mini-Batch KMeans or DTW-based KMeans). The results of the clustering analysis are reported (along with the ground truth) on a 2D reduction of the original dataset produced by the UMAP algorithm. Since we had access to the ground truth, we assessed the performance of each clustering algorithm through the Rand index adjusted for chance (ARScore).
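The adjusted Rand score compares a predicted partition against the ground truth while correcting for chance agreement, and it is invariant to how the cluster labels are numbered. A small example with scikit-learn (the toy label vectors are made up for illustration):

```python
from sklearn.metrics import adjusted_rand_score

# Ground-truth labels and two hypothetical clusterings of the same six samples.
truth = [0, 0, 1, 1, 2, 2]
perfect = [1, 1, 2, 2, 0, 0]   # same partition as truth, labels permuted
noisy = [0, 0, 1, 2, 2, 2]     # one sample placed in the wrong cluster

score_perfect = adjusted_rand_score(truth, perfect)  # 1.0: permutation-invariant
score_noisy = adjusted_rand_score(truth, noisy)      # strictly below 1.0
```

A random labeling scores close to 0, so the ARScore gives a calibrated scale across the three benchmark datasets regardless of their number of clusters.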
This collection of data is part of the RNA-Seq (HiSeq) PANCAN data set: a random extraction of gene expressions from patients with different types of tumor.
This collection of data reports tracking of the gestures found in the Naval Air Training and Operating Procedures Standardization (NATOPS), which standardizes general flight and operating procedures for US naval aircraft.
This collection of data contains ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open/closed eyes, smiling/not smiling) and facial details (glasses/no glasses). All images were taken against a dark homogeneous background with the subjects in an upright, frontal position.