Mega-DatA-Lab/SpectralLDA

SpectralLDA

Note: This is the single-host version. For the up-to-date, distributed version, please refer to https://github.com/Mega-DatA-Lab/SpectralLDA-Spark.

This code implements a spectral learning method, based on third-order tensor decomposition, for the Latent Dirichlet Allocation model in Python.

The spectral learning method works with empirical counts of word pairs and word triplets from the documents in the dataset. We average these counts into moment tensors, then perform tensor decomposition on them to learn the Latent Dirichlet Allocation model parameters. For more details, please refer to report.pdf in this repository.
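As an illustration of the averaging step, here is a minimal sketch of forming the averaged empirical word-pair moment from a doc-term count matrix. This is not the repository's actual implementation, and the corrections involving alpha0 described in report.pdf are omitted:

```python
import numpy as np

def empirical_pair_moment(docs):
    """Average the empirical word-pair frequencies over all documents.

    docs: (n_docs, vocab_size) array of per-document word counts.
    Returns a (vocab_size, vocab_size) matrix whose entries sum to 1.
    """
    docs = np.asarray(docs, dtype=float)
    vocab_size = docs.shape[1]
    m2 = np.zeros((vocab_size, vocab_size))
    for c in docs:
        n = c.sum()
        if n < 2:
            continue  # need at least two words to form a pair
        # outer(c, c) counts ordered pairs of word occurrences;
        # subtracting diag(c) removes a word paired with itself
        # at the same position, leaving n * (n - 1) valid pairs
        m2 += (np.outer(c, c) - np.diag(c)) / (n * (n - 1))
    return m2 / docs.shape[0]
```

Each per-document matrix sums to one, so the average is a proper joint distribution over word pairs.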

Usage

Invoke spectral_lda with the doc-term count matrix. The output consists of alpha, the Dirichlet prior parameter, and beta, the topic-word distribution matrix with one topic per column.

# docs is the doc-term count matrix
# alpha0 is the sum of the Dirichlet prior parameter
# k is the rank aka number of topics
from spectral_lda import spectral_lda
alpha, beta = spectral_lda(docs, alpha0=<alpha0>, k=<k>, l1_simplex_proj=False)

# alpha is the learnt Dirichlet prior
# beta is the topic-word-distribution matrix
# with one column per topic
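Once alpha and beta are returned, a natural next step is to inspect the highest-weight words in each topic. A small sketch on a synthetic beta; the helper name and vocabulary here are illustrative, not part of the package:

```python
import numpy as np

def top_words(beta, vocab, n=5):
    """Return the n highest-weight words for each topic (column of beta)."""
    return [[vocab[i] for i in np.argsort(beta[:, t])[::-1][:n]]
            for t in range(beta.shape[1])]

# Synthetic 3-word vocabulary, 2 topics, one topic per column
beta = np.array([[0.7, 0.1],
                 [0.2, 0.3],
                 [0.1, 0.6]])
print(top_words(beta, ['apple', 'bread', 'cloud'], n=2))
```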

By default, each column of beta may not sum to one; set l1_simplex_proj=True to perform a post-processing step that projects each column of beta onto the l1-simplex.
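One standard way to perform such a projection is the Euclidean projection onto the probability simplex (the sort-and-threshold algorithm of Duchi et al., 2008); whether the package uses exactly this algorithm is an assumption, so treat this as a sketch:

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]                 # sort in decreasing order
    css = np.cumsum(u)
    # largest index rho with u[rho] * (rho + 1) > css[rho] - 1
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def project_beta(beta):
    """Project each column (topic) of beta onto the simplex."""
    return np.apply_along_axis(project_to_simplex, 0, beta)
```

After the projection, every column is non-negative and sums to one; columns already on the simplex are left unchanged.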

References

Anandkumar, Animashree, Rong Ge, Daniel Hsu, Sham M. Kakade, and Matus Telgarsky. "Tensor Decompositions for Learning Latent Variable Models." Journal of Machine Learning Research 15 (2014): 2773-2832.