A C++ implementation of the sparse Biterm Topic Model (BTM).
- Suited to modeling user click sequences (recommendation systems) or short texts (NLP), because it assumes that N adjacent items belong to the same topic.
- Uses a sparse Gibbs sampler, which makes it 10x faster than the original implementation.
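
To make the "adjacent items" assumption concrete, here is a minimal sketch of how biterms could be extracted from one input line. `tokenize` and `extract_biterms` are hypothetical helpers, not functions from this repository, and the sketch assumes that a window of size w spans w consecutive tokens (so a window of 2 yields adjacent pairs only); the actual convention used by `sparse_btm` may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split one input line on whitespace into tokens.
std::vector<std::string> tokenize(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> tokens;
    std::string tok;
    while (in >> tok) tokens.push_back(tok);
    return tokens;
}

// Hypothetical helper: emit every pair of tokens that fall inside the same
// sliding window. Assumption: a window of size w covers w consecutive tokens,
// so each token is paired with the next (w - 1) tokens.
std::vector<std::pair<std::string, std::string>>
extract_biterms(const std::vector<std::string>& tokens, std::size_t window_size) {
    assert(window_size >= 2 && "a window must span at least two tokens");
    std::vector<std::pair<std::string, std::string>> biterms;
    for (std::size_t i = 0; i + 1 < tokens.size(); ++i) {
        // One past the last partner index, clamped to the end of the line.
        const std::size_t end = std::min(i + window_size, tokens.size());
        for (std::size_t j = i + 1; j < end; ++j) {
            biterms.emplace_back(tokens[i], tokens[j]);
        }
    }
    return biterms;
}
```

Under this assumption, the line "a b c d" with a window of 2 gives the biterms (a,b), (b,c), (c,d), while a window of 3 adds (a,c) and (b,d). A larger window therefore ties more distant items to the same topic at the cost of more biterms to sample.
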
Biterm Topic Model (Sparse-Sampler)
Parameters:
- -input
  path of the docs file; each line looks like "word1 word2 word3 ... \n"
- -output
  directory for the model files (topic_biterm_sum, topic_word)
- -num_topics
  number of topics
- -alpha
  symmetric doc-topic prior probability, default is 0.05
- -beta
  symmetric topic-word prior probability, default is 0.01
- -window_size
  window size for biterms, default is 2
- -num_iters
  number of iterations, default is 20
- -save_step
  save the model every save_step iterations, default is -1 (no save)
./sparse_btm -input short_text.txt -output model_out/ -num_topics 100 -window_size 3 -num_iters 20 -save_step 10
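
To show where -alpha and -beta enter the model, here is a sketch of the usual dense collapsed Gibbs conditional for a single biterm. It is not the sparse sampler this repository implements; it only illustrates the quantity that such a sampler draws from. `BtmCounts` and `topic_posterior` are hypothetical names, and the in-memory layout is an assumption loosely named after the two model files listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical count tables (assumed layout, not the repository's format).
struct BtmCounts {
    std::vector<double> topic_biterm_sum;         // n_k: biterms assigned to topic k
    std::vector<std::vector<double>> topic_word;  // n_{w|k}: counts indexed [topic][word]
};

// Returns the normalized distribution p(z = k | everything else) for one
// biterm (w1, w2). The biterm's own current assignment is assumed to have
// been subtracted from the counts before this call.
std::vector<double> topic_posterior(const BtmCounts& c,
                                    std::size_t w1, std::size_t w2,
                                    double alpha, double beta) {
    const std::size_t K = c.topic_biterm_sum.size();
    assert(K > 0 && c.topic_word.size() == K);
    const double V = static_cast<double>(c.topic_word[0].size());  // vocabulary size

    std::vector<double> p(K);
    double total = 0.0;
    for (std::size_t k = 0; k < K; ++k) {
        const double nk = c.topic_biterm_sum[k];
        const double words_in_topic = 2.0 * nk;  // each biterm contributes two words
        // alpha smooths the topic counts; beta smooths the topic-word counts.
        p[k] = (nk + alpha)
             * (c.topic_word[k][w1] + beta) / (words_in_topic + V * beta)
             * (c.topic_word[k][w2] + beta) / (words_in_topic + 1.0 + V * beta);
        total += p[k];
    }
    for (double& x : p) x /= total;  // normalize so the entries sum to 1
    return p;
}
```

A larger alpha pushes the sampler toward spreading biterms evenly across topics, and a larger beta flattens each topic's word distribution. The dense loop costs O(K) per biterm, which is the cost a sparse sampler aims to reduce.
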