first pass on import/export #174

Closed
wants to merge 5 commits into from
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,6 +1,8 @@
Gemfile.lock
pkg

*.rdb
docs/_site/
docs/.sass-cache/
docs/.jekyll-metadata
test/fixtures/export.yml
9 changes: 3 additions & 6 deletions .rubocop.yml
@@ -1,6 +1,3 @@
Style/IfUnlessModifier:
  MaxLineLength: 150

Metrics/LineLength:
  Max: 146

@@ -28,7 +25,7 @@ Metrics/ClassLength:
SingleLineBlockParams:
  Enabled: false

Lint/Eval:
Security/Eval:
  Enabled: false

Lint/AssignmentInCondition:
@@ -37,10 +34,10 @@ Lint/AssignmentInCondition:
SignalException:
  Enabled: false

Style/FileName:
Naming/FileName:
  Enabled: false

Style/MethodName:
Naming/MethodName:
  Enabled: false

Lint/UnusedBlockArgument:
3 changes: 3 additions & 0 deletions lib/classifier-reborn/backends/bayes_memory_backend.rb
@@ -1,6 +1,9 @@
require_relative 'data_handler'

module ClassifierReborn
  class BayesMemoryBackend
    attr_reader :total_words, :total_trainings
    include DataHandler

    # This class provides Memory as the storage backend for the classifier data structures
    def initialize
2 changes: 2 additions & 0 deletions lib/classifier-reborn/backends/bayes_redis_backend.rb
@@ -1,3 +1,4 @@
require_relative 'data_handler'
require_relative 'no_redis_error'
# require redis when we run #initialize. This way only people using this backend
# will need to install and load the backend without having to
@@ -6,6 +7,7 @@
module ClassifierReborn
  # This class provides Redis as the storage backend for the classifier data structures
  class BayesRedisBackend
    include DataHandler
    # The class can be created with the same arguments that the redis gem accepts
    # E.g.,
    #      b = ClassifierReborn::BayesRedisBackend.new
22 changes: 22 additions & 0 deletions lib/classifier-reborn/backends/data_handler.rb
@@ -0,0 +1,22 @@
module DataHandler
  # Read the data and populate the backend in use
  def import!(data)
    data[:categories].keys.each { |category| add_category(category) }
    categories = data[:categories]
    categories.each_key do |category|
      categories[category].each do |word, diff|
        update_category_word_frequency(category, word, diff)
      end
    end
    update_total_words(data[:total_words])
  end

  def export
    {
      categories: @categories,
      category_counts: @category_counts,
      category_word_count: @category_word_count,
      total_words: @total_words
    }
  end
end
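The round trip through `DataHandler` can be tried without Redis or the full classifier. The sketch below repeats the module from this diff so it runs standalone, and pairs it with `ToyBackend`, a hypothetical stand-in that is not part of the PR and only implements the three methods `import!` calls. It is a minimal illustration under those assumptions, not the behavior of the real backends.

```ruby
# DataHandler as added in this PR, repeated here so the sketch is self-contained.
module DataHandler
  def import!(data)
    data[:categories].keys.each { |category| add_category(category) }
    categories = data[:categories]
    categories.each_key do |category|
      categories[category].each do |word, diff|
        update_category_word_frequency(category, word, diff)
      end
    end
    update_total_words(data[:total_words])
  end

  def export
    {
      categories: @categories,
      category_counts: @category_counts,
      category_word_count: @category_word_count,
      total_words: @total_words
    }
  end
end

# Hypothetical in-memory backend, for illustration only.
class ToyBackend
  include DataHandler

  def initialize
    @categories = {}
    @category_counts = {}
    @total_words = 0
  end

  def add_category(category)
    @categories[category] ||= Hash.new(0)
    @category_counts[category] ||= { training: 0, word: 0 }
  end

  def update_category_word_frequency(category, word, diff)
    @categories[category][word] += diff
  end

  def update_total_words(diff)
    @total_words += diff
  end

  # Toy-only helper that simulates one training pass over a category.
  def record_training(category)
    @category_counts[category][:training] += 1
  end
end

source = ToyBackend.new
source.add_category(:Interesting)
source.update_category_word_frequency(:Interesting, :paint, 2)
source.update_total_words(2)
source.record_training(:Interesting)

copy = ToyBackend.new
copy.import!(source.export)

puts copy.export[:categories] == source.export[:categories]           # word frequencies survive
puts copy.export[:total_words] == source.export[:total_words]         # total survives
puts copy.export[:category_counts] == source.export[:category_counts] # counts are not restored
```

In this toy setup the last comparison is false: `import!` as written never reads `data[:category_counts]`, so per-category training counts stay at their initial values after an import.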
22 changes: 21 additions & 1 deletion lib/classifier-reborn/bayes.rb
@@ -1,7 +1,6 @@
# Author:: Lucas Carlson (mailto:[email protected])
# Copyright:: Copyright (c) 2005 Lucas Carlson
# License:: LGPL

require 'set'

require_relative 'extensions/tokenizer/whitespace'
@@ -261,6 +260,27 @@ def reset
      populate_initial_categories
    end

    def import!(data)
      @auto_categorize = data[:auto_categorize]
      @enable_stemmer = data[:enable_stemmer]
      @enable_threshold = data[:enable_threshold]
      @initial_categories = data[:categories].keys.map(&:to_s)
      @language = data[:language]
      @threshold = data[:threshold]
      @backend.import!(data)
    end

    def export
      backend_data = @backend.export
      {
        auto_categorize: @auto_categorize,
        enable_stemmer: @enable_stemmer,
        enable_threshold: @enable_threshold,
        language: @language,
        threshold: @threshold
      }.merge(backend_data)
    end
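Since `export` returns a plain hash with symbol keys, one way to persist it is to dump it to YAML and read it back before calling `import!`. The sketch below shows only that persistence step. `save_snapshot` and `load_snapshot` are hypothetical helper names, and the `snapshot` hash is a hand-written stand-in shaped like the export above rather than output from a real classifier.

```ruby
require 'yaml'
require 'tempfile'

# Hypothetical helper: write an exported hash to a YAML file.
def save_snapshot(data, path)
  File.write(path, YAML.dump(data))
end

# Hypothetical helper: read the hash back. Symbol keys have to be permitted
# explicitly because safe loading rejects symbols by default.
def load_snapshot(path)
  YAML.safe_load(File.read(path), permitted_classes: [Symbol])
end

# Hand-written stand-in with the same top-level keys as Bayes#export.
snapshot = {
  auto_categorize: false,
  enable_stemmer: true,
  enable_threshold: false,
  language: 'en',
  threshold: 0.0,
  categories: { Interesting: { paint: 2 } },
  category_counts: { Interesting: { training: 1, word: 2 } },
  category_word_count: nil,
  total_words: 2
}

restored = nil
Tempfile.create(['snapshot', '.yml']) do |file|
  save_snapshot(snapshot, file.path)
  restored = load_snapshot(file.path)
end

puts restored == snapshot
```

A hash restored this way could then be passed to `import!` on a fresh classifier, as `test_import` in this PR does with the reference fixture.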

    private

    def populate_initial_categories
20 changes: 20 additions & 0 deletions test/bayes/bayesian_common_tests.rb
@@ -1,4 +1,5 @@
# encoding: utf-8
require 'yaml'

module BayesianCommonTests
  def test_good_training
@@ -191,6 +192,25 @@ def test_reset
    assert classifier.categories.empty?
  end

  def test_export
    classifier = another_classifier
    classifier.train_interesting %"Dutch painting of the Golden Age is included in the general European
      period of Baroque painting, and often shows many of its characteristics
      most lacks the idealization"
    classifier.train_uninteresting %"Grasslands such as savannah and prairie where grasses are dominant are
      estimated to constitute forty percent of the land area of the Earth"
    exported_data = classifier.export
    reference_data = YAML.load(File.read('test/fixtures/reference.yml'))
    assert_equal(exported_data, reference_data)
  end

  def test_import
    classifier = ClassifierReborn::Bayes.new backend: @alternate_backend
    reference_data = YAML.load(File.read('test/fixtures/reference.yml'))
    classifier.import!(reference_data)
    assert_equal('Interesting', classifier.classify('Dutch painting of the Golden Age'))
  end

  private

  def another_classifier
49 changes: 49 additions & 0 deletions test/fixtures/reference.yml
@@ -0,0 +1,49 @@
---
:auto_categorize: false
:enable_stemmer: true
:enable_threshold: false
:language: en
:threshold: 0.0
:categories:
  :Interesting:
    :dutch: 1
    :paint: 2
    :golden: 1
    :ag: 1
    :includ: 1
    :gener: 1
    :european: 1
    :period: 1
    :baroqu: 1
    :often: 1
    :show: 1
    :mani: 1
    :it: 1
    :characterist: 1
    :lack: 1
    :ideal: 1
    :,: 1
  :Uninteresting:
    :grassland: 1
    :such: 1
    :savannah: 1
    :prairi: 1
    :where: 1
    :grass: 1
    :domin: 1
    :estim: 1
    :constitut: 1
    :forti: 1
    :percent: 1
    :land: 1
    :area: 1
    :earth: 1
:category_counts:
  :Interesting:
    :training: 1
    :word: 18
  :Uninteresting:
    :training: 1
    :word: 14
:category_word_count:
:total_words: 32
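The numbers in this fixture can be cross-checked against each other: each category's `:word` count should equal the sum of its word frequencies, and `:total_words` should equal the sum across categories. The sketch below embeds the fixture text from this diff and checks both. It loads with `YAML.safe_load` and `permitted_classes: [Symbol]`, which is an assumption about how a reader on a newer Ruby might load it; the tests in this PR call `YAML.load`, which on Psych 4 and later rejects symbol keys unless they are permitted.

```ruby
require 'yaml'

# Fixture text as it appears in test/fixtures/reference.yml in this diff.
FIXTURE = <<~YAML
  ---
  :auto_categorize: false
  :enable_stemmer: true
  :enable_threshold: false
  :language: en
  :threshold: 0.0
  :categories:
    :Interesting:
      :dutch: 1
      :paint: 2
      :golden: 1
      :ag: 1
      :includ: 1
      :gener: 1
      :european: 1
      :period: 1
      :baroqu: 1
      :often: 1
      :show: 1
      :mani: 1
      :it: 1
      :characterist: 1
      :lack: 1
      :ideal: 1
      :,: 1
    :Uninteresting:
      :grassland: 1
      :such: 1
      :savannah: 1
      :prairi: 1
      :where: 1
      :grass: 1
      :domin: 1
      :estim: 1
      :constitut: 1
      :forti: 1
      :percent: 1
      :land: 1
      :area: 1
      :earth: 1
  :category_counts:
    :Interesting:
      :training: 1
      :word: 18
    :Uninteresting:
      :training: 1
      :word: 14
  :category_word_count:
  :total_words: 32
YAML

data = YAML.safe_load(FIXTURE, permitted_classes: [Symbol])

# Sum the word frequencies per category and compare with the stored counts.
per_category = data[:categories].transform_values { |words| words.values.sum }
per_category.each do |category, sum|
  stored = data[:category_counts][category][:word]
  puts "#{category}: summed #{sum}, stored #{stored}"
end

total = per_category.values.sum
puts "total: summed #{total}, stored #{data[:total_words]}"
```

Note that `:category_word_count:` has no value in the fixture, so it loads as `nil`.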