Commit

Fix typo in FAQ (#1542)
jsektkuehler authored Sep 24, 2023
1 parent ebcfc74 commit 244215a
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions docs/faq.md
@@ -186,7 +186,7 @@ but a category of outliers.
## **I have too many topics, how do I decrease them?**
If you have a large dataset, then it is possible to generate thousands of topics. Especially with large datasets, there is a good chance they contain many small topics. In practice, you might want a few hundred topics at most to interpret them nicely.

- There are a few ways of increasing the number of generated topics:
+ There are a few ways of decreasing the number of generated topics:

* First, we can set the `min_topic_size` in the BERTopic initialization much higher (e.g., 300) to make sure that those small clusters will not be generated. This is an HDBSCAN parameter that specifies the minimum number of documents needed in a cluster. More documents in a cluster mean fewer topics will be generated.
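
To illustrate why a higher minimum size yields fewer topics, here is a toy sketch in plain Python. It is not how HDBSCAN is implemented (the real algorithm may merge small groups or mark them as outliers); it only counts which clusters would meet a size threshold. The helper name `surviving_topics` and the cluster labels are hypothetical and exist only for this example.

```python
from collections import Counter


def surviving_topics(cluster_labels, min_topic_size):
    """Return the cluster ids that contain at least `min_topic_size` documents.

    Simplified assumption: clusters below the threshold are dropped outright.
    """
    sizes = Counter(cluster_labels)
    return sorted(label for label, size in sizes.items() if size >= min_topic_size)


# Hypothetical cluster assignments for 12 documents (sizes: 5, 3, 2, 1, 1).
labels = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 4]

few_required = surviving_topics(labels, min_topic_size=2)
many_required = surviving_topics(labels, min_topic_size=4)

print(len(few_required), "topics with min_topic_size=2")
print(len(many_required), "topics with min_topic_size=4")
```

In BERTopic itself, the threshold is the `min_topic_size` argument passed at initialization, as described in the bullet above.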

@@ -310,4 +310,4 @@ No. By using document embeddings there is typically no need to preprocess the da
are important in understanding the general topic of the document. Although this holds in 99% of cases, if you
have data that contains a lot of noise, for example, HTML-tags, then it would be best to remove them. HTML-tags
typically do not contribute to the meaning of a document and should therefore be removed. However, if you apply
- topic modeling to HTML-code to extract topics of code, then it becomes important.
+ topic modeling to HTML-code to extract topics of code, then it becomes important.
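
For the noisy-data case mentioned above, one possible way to remove HTML tags before topic modeling is sketched below using only the Python standard library. The names `_TextExtractor` and `strip_html_tags` are hypothetical helpers, not part of BERTopic, and the sketch assumes that every tag can be treated as a word separator.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are ignored.
        self.parts.append(data)


def strip_html_tags(document):
    """Return `document` without HTML tags and with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(document)
    parser.close()
    # Join text nodes with spaces, then normalize runs of whitespace.
    return " ".join(" ".join(parser.parts).split())


cleaned = strip_html_tags("<p>Topic <b>modeling</b> works.</p>")
print(cleaned)
```

The cleaned strings could then be passed to the topic model in place of the raw documents. Text inside `<script>` or `<style>` elements is not filtered out by this sketch.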
