- Separate tokenizer from hasher, allowing custom tokenizers. (#162)
- Improved handling of zero division and zero-vector normalization. (#173)
- Remove deprecated `has_rdoc` in gemspec
- Auto-gen-config for Rubocop
- Auto-correct offenses
- JRuby Support, thanks to @mach-kernel (#168)
- Add support to reset trained classifiers to their initial state (#143)
- Classifier evaluation and validation (#142)
- Ability to add custom stopwords at classifier initialization (#129)
- Don't train/untrain the Bayesian classifier with empty word hashes (#132)
- Enable auto categorization if no initial categories (#128)
- Bayes integration test of Memory and Redis backends with real data (#92)
- Memory and Redis backend support (#84)
- Improved Turkish stopwords (#159)
- Set Redis keys only if they don't exist (#156)
- Require bayes_redis_backend (#157)
- Validation documentation improvements (#150)
- Updated Docker image to Ruby 2.4 (#149)
- Classifier validation user documentation (#145)
- Fixed persistence for BayesMemoryBackend (#147)
- Fixed error on requiring 'classifier-reborn' without using Redis (#146)
- Removed magic train/untrain methods from docs (#141)
- Links corrected to point to the new domain (#139)
- Minor docs improvements (#138)
- Return the status of the training/untraining when run (#137)
- Refactoring of backend tests to move duplicate logic into a common file (#134)
- Deal with Infinity score in test (#133)
- README file cleaned up to point to the documentation site (#121)
- Added and corrected RDoc for certain classes and methods (#122)
- Added favicon link and forced display (#120)
- Updated the truncated LICENSE file (#116)
- Docs visual improvement and refactoring (#119)
- Fixed relative URL issue on nav links and added benchmark data (#118)
- Added custom layout with navigation (#117)
- Created a static site for documentation (#115)
- Removed redis gem from Dockerfile as it is added in gemspec (#113)
- Speed up Docker image rebuilding (#112)
- Improved Docker based development documentation (#106)
- Benchmark refactoring, improving efficiency, enhanced reporting (#107)
- Add Vietnamese stopwords (#110)
- Added stop words for Arabic, Bengali, Chinese, Hindi, and Russian (#105)
- Dockerfile and documentation (#104)
- Remove hard dep on Redis and update bin (#96)
- Documented Redis backend performance (#103)
- Rename Bayes memory test class (#102)
- Added Bayes backend benchmarks (#98)
- Disabled Redis disc persistence and refactored integration test (#97)
- Removed useless intermediate variables (#90)
- Fix breaking changes in LSI API. Display errors instead of raising where possible (#87)
- Stopwords get encoded to UTF-8 (#83)
- Fix searching issues where no document is added to LSI (#77)
- Added method to add custom path to user-created stopword directory (#73)
- Test newer rubies (#85)
- Fixed errors in README (#68, #79, #80)
- Added an option to the bayesian classifier to disable word stemming (#61)
- Added missing parens and renamed some variables (#59)
- Classification thresholds can be enabled or disabled. The default is disabled. The threshold value can be set at initialization time or dynamically during processing (#47)
- Made auto-categorization optional, defaulting to false (#45)
- Added the ability to handle an array of classifications to the constructor (#44)
- Classification with a threshold has been added to the API (#39)
- Documentation around threshold usage (#54)
- Fixed UTF-8 encoding for `hasher.rb` (#50)
- Removed some unnecessary methods (#43)
- Add optional `CachedContentNode` (GSL only) (#43)
- Caches the transposed `search_vector` (#43)
- Added custom `marshal_` methods to not save the cache when dumping/loading (#43)
- Optimized some numeric comparisons and iterators (#43)
- Added cached calculation table when computing raw_vectors (#43)
- If a category name is already a symbol, just return it (#45)
- Various Hash improvements (#45)
- Eliminated several Ruby warnings when run with `RUBYOPT="-w"` (#38)
- Simple performance improvements for the Hasher process (#41)
- Fixes for broken regex splitting for non-ascii characters and removal of the unused punctuation filter (#41)
- Add multiple language stopwords with customizable stop word paths (#40)
- Fixed the bug where adding the same category a second time would clobber the category that was already there (#45)
- Fixed deprecation warning for `<=>` in ls.rb (#33)
- Remove references to Madeline in the README and replace it with Marshal or Redis (#32)
- Added development dependency on `mini_test` and added 2.2 to travis.yml (#36)
- Handle `GSL::Vector`s which don't have `#reduce` in `ContentNode#raw_vector_with` (#28)
- Remove unnecessary `Vector` monkey-patch (#29)
- Remove `Array#sum` monkey patch in favour of `#reduce(0, :+)` (#20)
- Cache total word counts per category for speed (#4)
- Add a test for `Bayes#untrain_*` (#21)
- Fix link to rb-gsl gem (#24)
- Add helper scripts per Jekyll convention (#25)
- Replace `Object` monkey patch with `CategoryNamer` method (#18)
- Count total unique words using methods supported by `Vector` and `GSL::Vector` (#11)
- Remove `stats` rake task (#17)
- Add some tests for `ClassifierReborn::WordList` (#15)
- Remove mathn dependency (#8)
- Only perform first order transform if total UNIQUE words is greater than 1 (#3)
- Update `LSI#remove_item` so that it works with the `@items` hash (#2)
- Exclude Gemfile.lock in .gitignore (#7)