Hi Julia! I'm a big fan of the tidy text mining book, but it doesn't seem to put much emphasis on how to tune the number of topics (K) in an LDA model, or on comparing LDA models with different K. I find the ldatuning package quite helpful. Would you be interested in implementing a wrapper or a similar function in the tidytext package?
I have been moving away from the topicmodels package in favor of the stm package for topic modeling, for a variety of reasons (speed, ease of use, document-level covariates, etc.), so I'd be more interested in pursuing options in that direction. In 2018 I published this blog post showing how to train many models at different values of K, similar to stm's own searchK() function but allowing for more detailed exploration of the results. It already uses functions from tidytext (the stm tidiers and so on).
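For reference, a minimal sketch of how stm::searchK() itself can be called; the `processed` object (from stm::textProcessor() plus prepDocuments()) and the K values here are placeholder assumptions, not taken from the post:

```r
library(stm)

# Placeholder input: `processed` is assumed to come from
# stm::textProcessor() followed by stm::prepDocuments()
k_result <- searchK(processed$documents, processed$vocab,
                    K = c(5, 10, 15, 20), verbose = FALSE)

# searchK() reports held-out likelihood, residuals, semantic coherence,
# and related diagnostics for each candidate K
k_result$results
plot(k_result)
```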
So this is possible already, but it does require folks to use purrr::map() and friends directly, along with the functions that calculate metrics such as semantic coherence. There are benefits to that (people get a chance to know what they're dealing with), but perhaps there would be upside to creating something with a lower barrier to getting started. It would work more directly like stm::searchK(), I guess, but return a tibble something like:
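A rough sketch of what that "train many models, then compute metrics" workflow and the resulting tibble could look like; this is my reading of the idea rather than an existing tidytext function, and `processed` (from stm::prepDocuments()) plus the K grid are assumed:

```r
library(stm)
library(purrr)
library(dplyr)
library(tibble)

# Placeholder input: `processed` is assumed to come from stm::prepDocuments()
k_grid <- c(5, 10, 15, 20)

# Fit one stm model per candidate K
many_models <- tibble(K = k_grid) %>%
  mutate(model = map(K, ~ stm(processed$documents, processed$vocab,
                              K = .x, verbose = FALSE)))

# Summarize per-topic diagnostics into one value per K
metrics <- many_models %>%
  mutate(
    semantic_coherence = map_dbl(model, ~ mean(semanticCoherence(.x, processed$documents))),
    exclusivity        = map_dbl(model, ~ mean(exclusivity(.x)))
  ) %>%
  select(K, semantic_coherence, exclusivity)

metrics
```

A hypothetical wrapper could hide the map() calls and return a tibble like `metrics` directly, one row per K, ready for plotting or further tidy manipulation.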