Add Workflow Jobs for text processing and chunking
- Implement ConvertToText, AnonymizeText, and CreateChunks jobs
- Update Workflow class to use new job classes
Showing 13 changed files with 258 additions and 17 deletions.
@@ -0,0 +1,36 @@
# == Schema Information
# Schema version: 20240905062817
#
# Table name: workflow_jobs
#
#  id            :bigint           not null, primary key
#  type          :string
#  resource_type :string
#  resource_id   :bigint
#  status        :integer
#  parent_id     :bigint
#  metadata      :jsonb
#  created_at    :datetime         not null
#  updated_at    :datetime         not null
#

##
# This class represents a job for anonymizing text using an external command.
#
class Workflow::Jobs::AnonymizeText < Workflow::Job
  def perform
    file = Tempfile.new
    file.write(source)
    file.flush

    cmd = [ENV['REDACT_COMMAND'], '--file', file.path].join(' ')
    IO.popen(cmd, &:read)

  ensure
    file.close
  end

  def content_type
    'text/plain'
  end
end
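
The `perform` method in this file joins the command into a single string, so `IO.popen` may hand it to a shell, and `file.close` closes the temporary file without deleting it. A minimal sketch of one alternative, assuming the redaction tool accepts `--file PATH` as in the diff: pass `IO.popen` an argument array so no shell is involved, and use the block form of `Tempfile.create` so the file is closed and removed when the block exits. `run_redactor` is a hypothetical helper name, not part of this commit, and its `command` argument stands in for the value of `ENV['REDACT_COMMAND']`.

```ruby
require 'tempfile'
require 'shellwords'

# Hypothetical helper (not part of the commit): writes +text+ to a temporary
# file, runs +command+ with `--file <path>` appended, and returns its stdout.
def run_redactor(text, command)
  # With a block, Tempfile.create closes and deletes the file afterwards
  # and returns the block's value.
  Tempfile.create('redact') do |file|
    file.write(text)
    file.flush

    # Split the configured command into words, then append our arguments.
    # Passing an Array makes IO.popen bypass the shell.
    argv = Shellwords.split(command) + ['--file', file.path]
    IO.popen(argv, &:read)
  end
end
```

A caller could use `run_redactor(source, ENV.fetch('REDACT_COMMAND'))`, which raises `KeyError` when the variable is unset instead of running a malformed command.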
@@ -0,0 +1,30 @@
# == Schema Information
# Schema version: 20240905062817
#
# Table name: workflow_jobs
#
#  id            :bigint           not null, primary key
#  type          :string
#  resource_type :string
#  resource_id   :bigint
#  status        :integer
#  parent_id     :bigint
#  metadata      :jsonb
#  created_at    :datetime         not null
#  updated_at    :datetime         not null
#

##
# This class is responsible for converting HTML content to plain text.
#
class Workflow::Jobs::ConvertToText < Workflow::Job
  include ActionView::Helpers::SanitizeHelper

  def perform
    strip_tags(sanitize(source))
  end

  def content_type
    'text/plain'
  end
end
Check warning on line 18 in app/models/workflow/jobs/create_chunks.rb (GitHub Actions / build)
@@ -0,0 +1,36 @@
# == Schema Information
# Schema version: 20240905062817
#
# Table name: workflow_jobs
#
#  id            :bigint           not null, primary key
#  type          :string
#  resource_type :string
#  resource_id   :bigint
#  status        :integer
#  parent_id     :bigint
#  metadata      :jsonb
#  created_at    :datetime         not null
#  updated_at    :datetime         not null
#

##
# This class represents a job in the workflow system that creates chunks for a resource.
#
class Workflow::Jobs::CreateChunks < Workflow::Job
  after_destroy :destroy_chunks

  def perform
    resource.chunks.create!(text: source).to_gid
  end

  def content_type
    'application/json'
  end

  private

  def destroy_chunks
    resource.chunks.destroy_all
  end
end
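
The three job classes in this commit share a small interface: `perform` turns `source` into an output, and `content_type` labels that output. The `parent_id` column in the schema suggests jobs can be chained, although `Workflow::Job` and `Workflow` are not shown in this excerpt. A rough plain-Ruby sketch of that idea, assuming each step receives the previous step's output as its source; `Step`, `run_pipeline`, and the lambdas are hypothetical stand-ins, not the project's classes:

```ruby
# Hypothetical stand-in for a workflow job: a name, a content type, and a
# callable that turns the source into the step's output.
Step = Struct.new(:name, :content_type, :action) do
  def perform(source)
    action.call(source)
  end
end

# Runs the steps in order, feeding each step's output to the next one.
# Returns the final output together with a per-step trace.
def run_pipeline(steps, input)
  trace = []
  output = steps.reduce(input) do |source, step|
    result = step.perform(source)
    trace << { step: step.name, content_type: step.content_type }
    result
  end
  [output, trace]
end

steps = [
  # Crude tag removal for illustration only; the real job uses Rails' sanitizer.
  Step.new('ConvertToText', 'text/plain', ->(s) { s.gsub(/<[^>]+>/, '') }),
  # Placeholder redaction; the real job shells out to an external command.
  Step.new('AnonymizeText', 'text/plain', ->(s) { s.gsub('Alice', '[REDACTED]') }),
  # Placeholder chunking; the real job persists a chunk record.
  Step.new('CreateChunks', 'application/json', ->(s) { { text: s } })
]

output, trace = run_pipeline(steps, '<p>Hello <strong>Alice</strong></p>')
```

Keeping the content type next to each step lets a caller decide how to store or render an intermediate result without inspecting the step's class.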
@@ -0,0 +1,22 @@
require 'spec_helper'

RSpec.describe Workflow::Jobs::AnonymizeText, type: :model do
  let(:job) { FactoryBot.build(:anonymize_text) }

  it 'inherits from Workflow::Job' do
    expect(job).to be_a(Workflow::Job)
  end

  describe '#perform' do
    it 'calls an external command to anonymize text' do
      allow(IO).to receive(:popen).and_return('Anonymized text')
      expect(job.perform).to eq('Anonymized text')
    end
  end

  describe '#content_type' do
    it 'returns the correct content type' do
      expect(job.content_type).to eq('text/plain')
    end
  end
end
@@ -0,0 +1,22 @@
require 'spec_helper'

RSpec.describe Workflow::Jobs::ConvertToText, type: :model do
  let(:job) { FactoryBot.build(:convert_to_text) }

  it 'inherits from Workflow::Job' do
    expect(job).to be_a(Workflow::Job)
  end

  describe '#perform' do
    it 'converts HTML to plain text' do
      job.source = '<p>Hello <strong>World</strong></p>'
      expect(job.perform).to eq('Hello World')
    end
  end

  describe '#content_type' do
    it 'returns the correct content type' do
      expect(job.content_type).to eq('text/plain')
    end
  end
end
@@ -0,0 +1,41 @@
require 'spec_helper'

RSpec.describe Workflow::Jobs::CreateChunks, type: :model do
  let(:job) { FactoryBot.build(:create_chunks) }

  it 'inherits from Workflow::Job' do
    expect(job).to be_a(Workflow::Job)
  end

  describe '#perform' do
    it 'creates a chunk for the resource' do
      VCR.use_cassette('test_chunk') do
        resource = FactoryBot.create(:foi_attachment)
        job.resource = resource
        job.source = 'Test chunk'

        expect { job.perform }.to change { resource.chunks.count }.by(1)
      end
    end
  end

  describe '#content_type' do
    it 'returns the correct content type' do
      expect(job.content_type).to eq('application/json')
    end
  end

  describe 'callbacks' do
    it 'destroys associated chunks when the job is destroyed' do
      VCR.use_cassette('test_chunk') do
        resource = FactoryBot.create(
          :foi_attachment, chunks: [FactoryBot.build(:chunk)]
        )
        job.resource = resource
        job.save!

        expect { job.destroy }.to change { resource.chunks.count }.to(0)
      end
    end
  end
end
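
The callback spec in this file passes because `destroy_chunks` calls `resource.chunks.destroy_all`, which removes every chunk on the resource, including chunks the destroyed job did not create (the chunk in the spec comes from the factory, not from `perform`). If that breadth is not intended, one option is to remember which chunk a job created and remove only that one. A plain-Ruby sketch of the difference using in-memory stand-ins; `FakeResource`, `FakeJob`, and `created_chunk_id` are hypothetical and do not reflect how the project stores job output:

```ruby
# In-memory stand-in for a record that owns chunks.
class FakeResource
  attr_reader :chunks

  def initialize
    @chunks = {} # chunk id => text
    @next_id = 0
  end

  # Stores the text and returns the new chunk's id.
  def create_chunk(text)
    @next_id += 1
    @chunks[@next_id] = text
    @next_id
  end
end

# In-memory stand-in for a chunk-creating job.
class FakeJob
  attr_reader :created_chunk_id

  def initialize(resource)
    @resource = resource
  end

  def perform(source)
    @created_chunk_id = @resource.create_chunk(source)
  end

  # Mirrors the commit's behaviour: every chunk on the resource is removed.
  def destroy_all_chunks
    @resource.chunks.clear
  end

  # Alternative: remove only the chunk this job created, if any.
  def destroy_own_chunk
    @resource.chunks.delete(created_chunk_id)
  end
end
```

With the narrower cleanup, a chunk that existed before the job ran survives the job's removal; with the broader one, the resource ends up with no chunks at all.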
@@ -0,0 +1,5 @@
VCR.configure do |config|
  config.cassette_library_dir = "spec/fixtures/cassettes"
  config.hook_into :webmock
  config.configure_rspec_metadata!
end