## scGMAI
scGMAI: a Gaussian mixture model for clustering single-cell RNA-seq data based on a deep autoencoder.
### scGMAI uses the following dependencies:
- python = 3.6
- numpy = 1.16.3
- scipy = 1.4.1
- pandas = 0.25.3
- scikit-learn = 0.22.1
- tensorflow = 1.13.1
- matplotlib = 3.0.3
- R = 3.6
### Guiding principles:
**We provide only one single-cell RNA-seq dataset; other datasets can be obtained from their corresponding websites.**
**autoencoder_model:** Autoencoder.py and AutoencoderRunner.py implement the autoencoder network.
**Clustering:**
- CellTree: CellTree.m and CellTree.R are the implementations of CellTree.
- GaussianMixture clustering: GaussianMixture clustering.py is the implementation of GaussianMixture clustering.
- Kmeans: Kmeans.py is the implementation of Kmeans.
- Seurat: Seurat.R is the implementation of Seurat.
- SIMLR: SIMLR.R is the implementation of SIMLR.
- SNN-Cliq: SNN.m and cliq.py are the implementations of SNN-Cliq.
- SpectralClustering: SpectralClustering.py is the implementation of SpectralClustering.
**Dimension_reduction:**
- FastICA: FastICA is the implementation of FastICA.
- PCA: PCA is the implementation of PCA.
- t-SNE: t-SNE is the implementation of t-SNE.
- UMAP: UMAP is the implementation of UMAP.
- ZIFA: ZIFA is the implementation of ZIFA.
- SHARP: SHARP is the implementation of SHARP.
- NMF: NMF is the implementation of NMF.
**Evaluation index:** Evaluation index.py is the implementation of NMI, ARI, homogeneity, and completeness.
**heatplot:** heatplot.R is the implementation of heatplot.
**Impute methods:**
- DCA: DCA.py is the implementation of DCA.
- Magic: Magic.py is the implementation of Magic.
- SAVER: SAVER.R is the implementation of SAVER.
- scImpute: scImpute.R is the implementation of scImpute.
- CIDR: CIDR.R is the implementation of CIDR.
**Preprocessing:** Preprocessing.py is the implementation of DataPreprocessing.

You can download the datasets from their corresponding websites. After that, prepare the data for the Preprocessing code and follow the steps below. First, the autoencoder produces the imputed data. Second, FastICA processes the imputed data. Finally, the GaussianMixture clustering model clusters the cells in the scRNA-seq data.
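
The last two steps above (FastICA followed by GaussianMixture clustering) can be sketched with scikit-learn as shown below. This is only an illustrative sketch, not the code shipped in this repository: the function name `cluster_cells`, its parameters, and the synthetic matrix standing in for the autoencoder output are all assumptions, and the matrix is assumed to have one row per cell.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.mixture import GaussianMixture


def cluster_cells(imputed, n_components, n_clusters, seed=0):
    """Hypothetical helper: reduce a cells-by-genes matrix with FastICA,
    then assign each cell to a cluster with a Gaussian mixture model."""
    ica = FastICA(n_components=n_components, random_state=seed, max_iter=1000)
    reduced = ica.fit_transform(imputed)  # shape: (n_cells, n_components)
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed)
    return gmm.fit_predict(reduced)  # one integer label per cell


# Synthetic stand-in for the autoencoder-imputed data (assumption):
# 3 groups of 50 cells each, 20 genes per cell.
rng = np.random.RandomState(0)
centers = rng.normal(scale=5.0, size=(3, 20))
imputed = np.vstack([c + rng.normal(size=(50, 20)) for c in centers])

pred_labels = cluster_cells(imputed, n_components=5, n_clusters=3)
print(pred_labels.shape)
```

With real data, `imputed` would be the matrix produced by the autoencoder step, and the number of clusters would be chosen for the dataset at hand.
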
Input: data.csv
Output: pred_labels.csv and results.csv
The results.csv file contains four evaluation indexes: NMI, ARI, homogeneity, and completeness.
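
As a rough illustration of the four evaluation indexes, they can be computed with `sklearn.metrics` as shown below. This is a minimal sketch, not the contents of Evaluation index.py: the helper name `evaluate` and the toy label lists are assumptions made for the example.

```python
from sklearn.metrics import (
    adjusted_rand_score,
    completeness_score,
    homogeneity_score,
    normalized_mutual_info_score,
)


def evaluate(true_labels, pred_labels):
    """Hypothetical helper: return the four indexes as a dict."""
    return {
        "NMI": normalized_mutual_info_score(true_labels, pred_labels),
        "ARI": adjusted_rand_score(true_labels, pred_labels),
        "Homogeneity": homogeneity_score(true_labels, pred_labels),
        "Completeness": completeness_score(true_labels, pred_labels),
    }


# Toy labels (assumption): the predicted clusters match the true ones
# exactly, only the cluster ids are permuted.
true_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [1, 1, 0, 0, 2, 2]

scores = evaluate(true_labels, pred_labels)
for name, value in scores.items():
    print(name, round(value, 3))
```

All four indexes ignore how the cluster ids are numbered, so a prediction that recovers the true grouping under different ids still scores 1.0 on each.
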