Releases: parichit/DCEM
DCEM 2.0.3 with vignettes
DCEM - patched for K-means++ initialization
This release fixes bugs in the improved initialization routines and adds an option to retrieve the cluster membership of the data.
DCEM: A new release with a blazing-fast EM* implementation
This release improves the EM* implementation for even faster execution. EM* is motivated by the ideas published in "Using data to build a better EM: EM* for big data", Hasan Kurban, Mark Jenne, Mehmet M. Dalkilic (2016), https://doi.org/10.1007/s41060-017-0062-1.
The package now supports both EM* and the traditional EM algorithm for speed-up comparison. EM* leverages a max-heap structure to reduce execution time manifold compared to conventional EM.
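The core idea behind the heap-based speed-up can be sketched as follows. This is an illustrative Python sketch under simplifying assumptions (1-D data, two components, fixed unit variance, equal priors), not the package's R implementation: after each E-step, points whose assignment is near-certain are frozen and no longer re-scanned, while a min-heap keyed on confidence surfaces the still-ambiguous points for the next pass.

```python
import heapq
import math

def norm_pdf(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_star_sketch(data, iters=25, tau=0.999):
    """Heap-pruned EM in the spirit of EM* (illustrative only):
    points whose largest responsibility exceeds `tau` stop being
    re-scanned; their last responsibilities still feed the M-step."""
    mu = [min(data), max(data)]        # crude initial means
    resp = [0.5] * len(data)           # responsibility of component 1
    active = list(range(len(data)))    # indices revisited each pass
    for _ in range(iters):
        heap = []
        for i in active:
            p0 = norm_pdf(data[i], mu[0])
            p1 = norm_pdf(data[i], mu[1])
            resp[i] = p1 / (p0 + p1)
            conf = max(resp[i], 1 - resp[i])
            # min-heap on confidence: most ambiguous points surface first
            heapq.heappush(heap, (conf, i))
        # keep only the still-uncertain points for the next E-step
        active = []
        while heap and heap[0][0] < tau:
            conf, i = heapq.heappop(heap)
            active.append(i)
        # M-step over ALL points (frozen responsibilities reused as-is)
        w1 = sum(resp)
        w0 = len(data) - w1
        mu = [sum((1 - r) * x for r, x in zip(resp, data)) / w0,
              sum(r * x for r, x in zip(resp, data)) / w1]
    return sorted(mu)

# Two well-separated 1-D clusters: most points freeze after one pass
means = em_star_sketch([0.0, 0.2, 0.4, 5.0, 5.2, 5.4])
```

On well-separated data most points become confident almost immediately, so later iterations touch only a small active set — which is where the reported speed-up over conventional EM comes from.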
DCEM with EM* Implementation
The DCEM_1.0.0 release brings a faster version of the EM algorithm (the EM* algorithm [1]). EM* leverages a heap structure internally to avoid revisiting the data in the long run, thereby significantly reducing the run time of the conventional EM implementation. For easy accessibility, the function call for EM* stays the same as in previous versions, with the same parameters (the only exception being the function name). For technical details about the algorithm, please see the following:
Reference:
[1] Hasan Kurban, Mark Jenne, Mehmet M. Dalkilic. Using data to build a better EM: EM* for big data. https://doi.org/10.1007/s41060-017-0062-1
DCEM Improved and Faster Initialization
Implements the Expectation Maximisation (EM) algorithm for clustering finite Gaussian mixture models, for both multivariate and univariate datasets. Initialization is done by randomly selecting samples from the dataset as the means of the Gaussians. This version improves parameter initialization on big datasets by using the ideas published in [1], K-means++: The Advantages of Careful Seeding, David Arthur and Sergei Vassilvitskii. The algorithm returns a set of Gaussian parameters: posterior probabilities, means, covariance matrices (for multivariate data) or standard deviations (for univariate data), and priors.
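The K-means++ seeding strategy can be sketched as follows — a minimal 1-D Python illustration, not the package's R code: the first center is chosen uniformly at random, and each subsequent center is sampled with probability proportional to its squared distance from the nearest center already chosen, which spreads the initial means across the data.

```python
import random

def kmeanspp_init(data, k, rng=None):
    """K-means++ seeding (1-D sketch): first center uniform at random,
    each later center sampled with probability proportional to the
    squared distance to its nearest already-chosen center."""
    rng = rng or random.Random(0)
    centers = [rng.choice(data)]
    while len(centers) < k:
        # squared distance of every point to its nearest chosen center
        d2 = [min((x - c) ** 2 for c in centers) for x in data]
        # weighted sampling: walk the cumulative weights until r is covered
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for x, w in zip(data, d2):
            acc += w
            if acc >= r:
                centers.append(x)
                break
    return centers

# Two well-separated groups: seeding strongly favors one center per group
data = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
centers = kmeanspp_init(data, 2)
```

Compared with picking all means uniformly at random, this makes it far less likely that two initial means land in the same cluster, which is what improves EM's starting point on big datasets.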
Reference:
[1] David Arthur and Sergei Vassilvitskii. K-means++: The Advantages of Careful Seeding. http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
[2] Hasan Kurban, Mark Jenne, Mehmet M. Dalkilic (2016). Using data to build a better EM: EM* for big data. doi:10.1007/s41060-017-0062-1. This work is partially supported by NCI Grant 1R01CA213466-01.