Releases: parichit/DCEM

DCEM 2.0.3 with vignettes

16 Apr 19:20

Added quick-start examples and use cases to the vignettes. The source package (DCEM_2.0.3.tar.gz) is attached below.
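For quick reference, a minimal example in the spirit of the vignette. The dcem_train() entry point is from the package; the exact argument defaults and output field names (prob, meu, membership) are assumed from the documentation and may differ slightly:

```r
# install.packages("DCEM")   # or: R CMD INSTALL DCEM_2.0.3.tar.gz
library(DCEM)

# Fit a 3-component Gaussian mixture to the iris measurements.
fit <- dcem_train(as.matrix(iris[, 1:4]), num_clusters = 3,
                  iteration_count = 200, threshold = 0.0001)

head(fit$prob)        # posterior probabilities (assumed field name)
fit$meu               # estimated component means (assumed field name)
head(fit$membership)  # hard cluster assignment per observation
```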

DCEM - patched for K++ initialization

06 Apr 03:14

This release fixes bugs in the improved initialization routines and adds an option to retrieve the cluster membership of the data.
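A short sketch of the new membership option, assuming (as in the quick start above) a dcem_train() fit that returns one membership label per observation:

```r
library(DCEM)

# Train on the iris measurements and pull out the new membership field.
fit <- dcem_train(as.matrix(iris[, 1:4]), num_clusters = 3)

iris$cluster <- fit$membership      # one label per row (assumed shape)
table(iris$Species, iris$cluster)   # compare clusters to the known species
```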

DCEM: A new release with a blazing-fast EM* implementation

28 Nov 21:43

Improves the EM* implementation for even faster execution. EM* is motivated by the ideas published in "Using data to build a better EM: EM* for big data" by Hasan Kurban, Mark Jenne, and Mehmet M. Dalkilic (2016), https://doi.org/10.1007/s41060-017-0062-1.

The package now supports both EM* and the traditional EM algorithm, so their speeds can be compared directly. EM* leverages a max-heap structure to cut execution time manifold compared to conventional EM.
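To illustrate the idea only (this is not the package's internal code): after each E-step, points whose posteriors are already decisive can be parked and skipped in later passes, so the expensive E-step touches fewer and fewer points. The paper maintains this partition with a max-heap; the toy sketch below approximates it with a plain confidence cutoff on a 1-D, two-component mixture:

```r
# Toy 1-D, two-component EM with EM*-style "parking" of decisive points.
# Sketch of the idea only; DCEM's internals use a max-heap instead of
# the plain confidence cutoff used here.
set.seed(1)
x <- c(rnorm(500, 0), rnorm(500, 6))      # two well-separated components
meu <- c(-1, 7); sigma <- c(1.5, 1.5); prior <- c(0.5, 0.5)
p1 <- rep(0.5, length(x))                 # responsibility of component 1
active <- seq_along(x)                    # points still revisited each pass

for (iter in 1:30) {
  # E-step only on the still-ambiguous points.
  d1 <- prior[1] * dnorm(x[active], meu[1], sigma[1])
  d2 <- prior[2] * dnorm(x[active], meu[2], sigma[2])
  p1[active] <- d1 / (d1 + d2)
  # Park decisive points so later passes skip them (the heap's role in EM*).
  conf <- pmax(p1[active], 1 - p1[active])
  active <- active[conf <= 0.99]
  # M-step over all points, using their latest responsibilities.
  w1 <- p1; w2 <- 1 - p1
  meu   <- c(sum(w1 * x) / sum(w1), sum(w2 * x) / sum(w2))
  sigma <- sqrt(c(sum(w1 * (x - meu[1])^2) / sum(w1),
                  sum(w2 * (x - meu[2])^2) / sum(w2)))
  prior <- c(mean(w1), mean(w2))
}
meu; sigma; prior                         # recovered mixture parameters
```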

DCEM with EM* Implementation

23 Jul 17:53

The DCEM_1.0.0 release brings a faster version of the EM algorithm, EM* [1]. EM* leverages a heap structure internally to avoid revisiting the data in later iterations, significantly reducing the run time of the conventional EM implementation. For easy accessibility, the EM* function call takes the same parameters as in previous versions; only the function name changes. For technical details about the algorithm, please see the reference below; a usage sketch follows it.

Reference:

[1] Hasan Kurban, Mark Jenne, Mehmet M. Dalkilic (2016). Using data to build a better EM: EM* for big data. https://doi.org/10.1007/s41060-017-0062-1
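For illustration, the intended symmetry between the two entry points (argument names are assumed from the package documentation):

```r
library(DCEM)
data <- as.matrix(iris[, 1:4])

# Conventional EM and EM* are invoked the same way; only the name differs.
em_fit   <- dcem_train(data, num_clusters = 3, iteration_count = 200)
star_fit <- dcem_star_train(data, num_clusters = 3, iteration_count = 200)
```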

DCEM Improved and Faster Initialization

05 Apr 22:20

Implements the Expectation Maximisation (EM) algorithm for clustering finite Gaussian mixture models on both multivariate and univariate datasets. Previously, initialization was done by randomly selecting samples from the dataset as the Gaussian means; this version improves parameter initialization on big datasets using the careful-seeding ideas of K-means++ [1]. The algorithm returns a set of Gaussian parameters: posterior probabilities, means, covariance matrices (for multivariate data) or standard deviations (for univariate data), and priors.
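A compact sketch of the K-means++ seeding idea from [1] (illustrative only; the package's internal routine may differ):

```r
# K-means++ seeding: each new center is drawn with probability proportional
# to the squared distance from a point to its nearest already-chosen center.
kmeanspp_seed <- function(data, k) {
  n <- nrow(data)
  centers <- data[sample.int(n, 1), , drop = FALSE]   # first center: uniform
  for (i in 2:k) {
    # Squared distance from every point to its closest center so far.
    d2 <- apply(data, 1, function(p) min(colSums((t(centers) - p)^2)))
    centers <- rbind(centers,
                     data[sample.int(n, 1, prob = d2), , drop = FALSE])
  }
  centers
}

seeds <- kmeanspp_seed(as.matrix(iris[, 1:4]), k = 3)  # 3 initial means
```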

References:
[1] David Arthur and Sergei Vassilvitskii. K-means++: The Advantages of Careful Seeding. http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf

[2] Hasan Kurban, Mark Jenne, Mehmet M. Dalkilic (2016). Using data to build a better EM: EM* for big data. https://doi.org/10.1007/s41060-017-0062-1. This work is partially supported by NCI grant 1R01CA213466-01.