A scikit-learn-compatible Python package for principal component analysis (PCA) that includes several methods for selecting the PCA rank, such as random-matrix-theory-based thresholds, Wold-style cross-validation and bi-cross-validation, Minka's method, and Horn's Parallel Analysis. See here for the list of currently supported rank selection methods as well as the corresponding references.
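As a rough illustration of one of these rank selection ideas, here is a minimal NumPy sketch of a permutation variant of Horn's Parallel Analysis. It is not this package's implementation: the function `parallel_analysis` and its `n_perm`, `quantile` and `random_state` parameters are hypothetical. Each column of `X` is shuffled independently to break the correlations between features, and the selected rank is the number of leading correlation-matrix eigenvalues that exceed the chosen percentile of the eigenvalues obtained from the shuffled data. (Horn's original method simulates independent normal data instead of permuting columns; permutation is a common variant.)

```python
import numpy as np


def parallel_analysis(X, n_perm=20, quantile=95, random_state=None):
    """Permutation variant of Horn's Parallel Analysis (hypothetical helper,
    not part of the pca package). Returns the number of leading components
    whose correlation eigenvalue beats the permutation null."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    n, p = X.shape

    def corr_eigvals(A):
        # eigenvalues of the feature correlation matrix, largest first
        corr = np.corrcoef(A, rowvar=False)
        return np.sort(np.linalg.eigvalsh(corr))[::-1]

    observed = corr_eigvals(X)

    # null distribution: shuffle every column independently, which keeps
    # each feature's marginal distribution but destroys cross-feature structure
    null = np.empty((n_perm, p))
    for b in range(n_perm):
        X_perm = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
        null[b] = corr_eigvals(X_perm)
    thresh = np.percentile(null, quantile, axis=0)

    # keep leading components until the first one that fails the test
    above = observed > thresh
    return p if above.all() else int(np.argmin(above))


# usage: a rank-3 signal plus unit-variance Gaussian noise
rng = np.random.default_rng(0)
U = rng.normal(size=(300, 3))
V = 3 * rng.normal(size=(3, 30))
X = U @ V + rng.normal(size=(300, 30))
print(parallel_analysis(X, n_perm=20, random_state=0))
```

Counting only the leading run of components that pass (rather than every component above its threshold) avoids selecting isolated trailing components that cross the null by chance.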
```bash
git clone https://github.com/idc9/pca.git
cd pca
python setup.py install
```
```python
from pca.PCA import PCA
from pca.toy_data import rand_factor_model

# sample data from a factor model with rank 10
# (keep only the first returned element, the data matrix)
X = rand_factor_model(n_samples=200, n_features=100,
                      rank=10, m=2, random_state=1)[0]

# fit PCA and select the rank by thresholding
# the singular values using the Marchenko-Pastur distribution
pca = PCA(n_components='rmt_threshold',
          rank_sel_kws={'thresh_method': 'mpe'})
pca.fit(X)
```
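The idea behind this kind of random-matrix-theory threshold can be sketched without the package. If the noise entries are i.i.d. with mean zero and standard deviation `sigma`, the largest singular value of an n-by-p noise matrix is close to `sigma * (sqrt(n) + sqrt(p))`, the upper edge of the Marchenko-Pastur bulk, so singular values above that edge are attributed to signal. The sketch below assumes `sigma` is known (or that an upper bound on it is supplied); the function `mp_edge_rank` and its `sigma` argument are hypothetical, and this is not necessarily how the `'mpe'` method estimates the noise level or sets its threshold.

```python
import numpy as np


def mp_edge_rank(X, sigma=1.0):
    """Count singular values above the Marchenko-Pastur bulk edge.

    Hypothetical helper, not part of the pca package. Assumes the noise
    entries are i.i.d. with mean 0 and standard deviation at most `sigma`.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    # column-center the data, as PCA does
    Xc = X - X.mean(axis=0)
    # singular values only, in descending order
    svals = np.linalg.svd(Xc, compute_uv=False)
    # approximate largest singular value of a pure-noise n-by-p matrix
    edge = sigma * (np.sqrt(n) + np.sqrt(p))
    return int(np.sum(svals > edge))


# usage: a rank-10 signal plus Gaussian noise with standard deviation 0.9,
# passing sigma=1.0 as a conservative upper bound on the noise level
rng = np.random.default_rng(1)
signal = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 100))
X = signal + 0.9 * rng.normal(size=(200, 100))
print(mp_edge_rank(X, sigma=1.0))
```

In practice `sigma` is unknown and must be estimated from the data; overestimating it makes the rule more conservative, while underestimating it lets noise components through.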
Additional documentation, examples, and code revisions are coming soon. For questions, issues, or feature requests, please reach out to Iain.
We welcome contributions that make this a stronger package: data examples, bug fixes, spelling corrections, new features, etc.