How does BTM create its biterm sets? #10

LjessonS · 2017-03-15T11:48:43Z

Recently I've become interested in your idea of modeling word pairs within a document for short texts, but I'm a bit confused about how you count the biterm sets in BTM. You did a nice job implementing it in C++, but I'm not good at C++ and find the code hard to read. I wonder whether the count of every word pair within a document is one, and whether the biterm vector for the whole biterm set is updated by computing the word pairs document by document. I hope you can clear this up. Thank you very much!

xiaohuiyan · 2017-03-29T15:44:29Z

Not exactly right. A biterm is defined as a pair of words co-occurring in the same text window. For example,
suppose a doc is "A B C B" and the window size is 3. There are then two text windows, which generate biterms as follows:

text window "A B C" => "A B", "B C", "A C"
text window "B C B" => "B C", "C B", "B B"
Since a biterm is an unordered word pair, "B C" = "C B". Thus, the doc counts the biterm "B C" 3 times, and the biterms "A B", "A C", and "B B" once each.
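
For readers who find the C++ hard to follow, here is a minimal Python sketch of the counting as described in this comment. It is not taken from the repository's code: the function name `count_biterms`, its parameters, and the handling of docs shorter than the window are assumptions made for illustration, and the actual implementation may enumerate pairs differently.

```python
from collections import Counter
from itertools import combinations


def count_biterms(tokens, window_size):
    """Count unordered word pairs over every sliding window of a doc.

    Hypothetical helper: follows the worked example in the comment,
    where a pair is counted once for each window it appears in.
    """
    counts = Counter()
    # Assumption: a doc shorter than the window is treated as one window.
    n_windows = max(len(tokens) - window_size + 1, 1)
    for start in range(n_windows):
        window = tokens[start:start + window_size]
        # Every pair of positions inside the window yields one biterm.
        for w1, w2 in combinations(window, 2):
            # Sort the pair so that "B C" and "C B" share the same key.
            counts[tuple(sorted((w1, w2)))] += 1
    return counts


biterm_counts = count_biterms("A B C B".split(), window_size=3)
for (w1, w2), n in sorted(biterm_counts.items()):
    print(w1, w2, n)
```

For the doc "A B C B" with a window size of 3, this reproduces the counts above: "B C" three times, and "A B", "A C", and "B B" once each.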

PS: Thanks to other contributors, you can find implementations of BTM in other languages (e.g., Python, Julia, Scala) on GitHub :)

himanshi-sinha · 2017-08-30T09:49:14Z

Hi, could you please provide the link to the Python implementation of BTM?

rtrad89 · 2020-07-07T12:43:10Z

> Hi, could you please provide the link to the Python implementation of BTM?

Here.
