Currently, the library ships with a list of stopwords in various languages. PR #73 adds the ability to specify more directories to look for stopwords. This means one can only add more stopwords, but can't overwrite the existing ones, except perhaps by setting the value of `Hasher.STOPWORDS`. However, a single stopwords list does not suit all situations: for special-purpose collections, stopwords differ even within the same language. And in some cases stopwords are not desired at all.
The current implementation also strongly relies on the name of the file being the language code.
In reality, one classifier instance is tied to only one language, and each classifier may want to use its own stopwords. It would be nice to be able to pass an array of stopwords or an arbitrary file path during the initialization of the classifier, overriding the value of `Hasher.STOPWORDS[@language]`. I should be able to make a PR for this if we decide to go for it.
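To make the proposal concrete, here is a minimal sketch of what per-instance stopwords could look like. It is not the library's actual API: `SketchClassifier`, `DEFAULT_STOPWORDS`, the `stopwords:` keyword, and the `tokens` method are all hypothetical names used only for illustration, and the file format (whitespace-separated words) is an assumption.

```ruby
require 'set'

# Hypothetical stand-in for the library's per-language stopword table.
DEFAULT_STOPWORDS = {
  'en' => Set.new(%w[a an and the of])
}.freeze

# Hypothetical classifier skeleton; only the stopword handling is shown.
class SketchClassifier
  attr_reader :stopwords

  # stopwords: nil    -> fall back to the defaults for the language
  #            Array  -> use exactly these words (an empty array disables filtering)
  #            String -> treat as a path to a file of whitespace-separated words
  def initialize(language: 'en', stopwords: nil)
    @language = language
    @stopwords =
      case stopwords
      when nil    then DEFAULT_STOPWORDS.fetch(language, Set.new)
      when Array  then Set.new(stopwords.map { |w| w.to_s.downcase })
      when String then Set.new(File.read(stopwords).split.map(&:downcase))
      else raise ArgumentError, 'stopwords must be nil, an Array, or a file path'
      end
  end

  # Split text into lowercase words and drop this instance's stopwords.
  def tokens(text)
    text.downcase.scan(/[[:alnum:]]+/).reject { |w| @stopwords.include?(w) }
  end
end

default = SketchClassifier.new
medical = SketchClassifier.new(stopwords: %w[patient doctor])
unfiltered = SketchClassifier.new(stopwords: [])
```

Because the stopword set lives on the instance rather than in a shared constant, two classifiers for the same language can filter differently, and passing an empty array covers the case where no stopwords are wanted at all.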
* Ability to add custom stopwords at classifier initialization
* Downcased custom test stopwords
* Documented and improved custom stopwords handling
* Added test cases for custom stopwords and empty trainings, #125 and #130
* Added documentation for auto-categorization and custom stopwords