StorageAdapter Versioning Implementation #932

Merged
merged 33 commits into from
Sep 15, 2023
Changes from 27 commits
b5461b0
Add a `supports?`
tpendragon Aug 30, 2023
eaa5d74
Start speccing out versioning looking at Hyrax
tpendragon Aug 30, 2023
c90e160
Initial storage adapter versioning API for memory adapter.
tpendragon Aug 31, 2023
95f7133
Add versioning questions.
tpendragon Aug 31, 2023
5379c7b
Add notes.
tpendragon Aug 31, 2023
3bdcaee
New implementation + notes
tpendragon Aug 31, 2023
7014327
Add restoring a version and deleting a version.
tpendragon Sep 1, 2023
62c7319
Add purge_versions: true
tpendragon Sep 5, 2023
e8300c1
Clear TODOs
tpendragon Sep 5, 2023
54b3cc1
WIP
tpendragon Sep 5, 2023
e92e903
Back to failing only versioning.
tpendragon Sep 5, 2023
8ebbb33
Implement up until deletion markers.
tpendragon Sep 5, 2023
7490fd4
Passing tests.
tpendragon Sep 6, 2023
e9e349a
Versioned Disk Adapter.
tpendragon Sep 6, 2023
25583fa
Add comment.
tpendragon Sep 6, 2023
97f979f
Coverage
tpendragon Sep 6, 2023
73e90a3
Get Fedora 5 running locally.
tpendragon Sep 13, 2023
c82a3cb
WIP
tpendragon Sep 14, 2023
3ba8e96
Deleting really deletes.
tpendragon Sep 14, 2023
0b21309
Fix for Fedora 5.
tpendragon Sep 14, 2023
2d960e7
Add versioning support to Fedora 4/5/6
tpendragon Sep 14, 2023
bbb8826
Update lib/valkyrie/storage/versioned_disk.rb
tpendragon Sep 14, 2023
056848f
Update lib/valkyrie/storage/versioned_disk.rb
tpendragon Sep 14, 2023
4b58a74
Update lib/valkyrie/storage/versioned_disk.rb
tpendragon Sep 14, 2023
52b006a
Add comment.
tpendragon Sep 14, 2023
cfe6ec6
Rename variable.
tpendragon Sep 14, 2023
65b184f
Actually sleep.
tpendragon Sep 14, 2023
54b5dad
Check for path existence in case test suite runs too fast.
tpendragon Sep 15, 2023
b3e91ae
Only ever sleep once.
tpendragon Sep 15, 2023
ebcce3d
Organize public methods higher and document.
tpendragon Sep 15, 2023
4baf01d
Organize and add comments.
tpendragon Sep 15, 2023
2445134
Remove TODO
tpendragon Sep 15, 2023
3d8bed7
Remove purge_versions
tpendragon Sep 15, 2023
2 changes: 1 addition & 1 deletion .circleci/config.yml
@@ -28,7 +28,7 @@ jobs:
environment:
CATALINA_OPTS: "-Djava.awt.headless=true -Dfile.encoding=UTF-8 -server -Xms512m -Xmx1024m -XX:NewSize=256m -XX:MaxNewSize=256m -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+DisableExplicitGC"
JAVA_OPTIONS: "-Djetty.http.port=8998 -Dfcrepo.dynamic.jms.port=61618 -Dfcrepo.dynamic.stomp.port=61614"
- image: fcrepo/fcrepo:6.0.0
- image: fcrepo/fcrepo:6.4.0
environment:
CATALINA_OPTS: "-Djava.awt.headless=true -Dfile.encoding=UTF-8 -server -Xms512m -Xmx1024m -XX:NewSize=256m -XX:MaxNewSize=256m -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+DisableExplicitGC -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true"
JAVA_OPTS: "-Djetty.http.port=8978 -Dfcrepo.dynamic.jms.port=61619 -Dfcrepo.dynamic.stomp.port=61615 -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true"
15 changes: 11 additions & 4 deletions .lando.yml
@@ -24,27 +24,34 @@ services:
- fedora4:/data
ports:
- 8988:8080
portforward: true
environment:
CATALINA_OPTS: "-Djava.awt.headless=true -Dfile.encoding=UTF-8 -server -Xms512m -Xmx1024m -XX:NewSize=256m -XX:MaxNewSize=256m -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+DisableExplicitGC"
portforward: 8988
valkyrie_fedora_5:
type: compose
app_mount: false
volumes:
fedora5:
services:
image: samvera/fcrepo4:5.1.0
command: /fedora-entrypoint.sh
image: fcrepo/fcrepo:5.1.1-multiplatform
command:
- "catalina.sh"
- "run"
volumes:
- fedora5:/data
ports:
- 8998:8080
environment:
CATALINA_OPTS: "-Djava.awt.headless=true -Dfile.encoding=UTF-8 -server -Xms512m -Xmx1024m -XX:NewSize=256m -XX:MaxNewSize=256m -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+DisableExplicitGC -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true"
JAVA_OPTS: "-Dfcrepo.dynamic.jms.port=61620 -Dfcrepo.dynamic.stomp.port=61617 -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true"
portforward: true
valkyrie_fedora_6:
type: compose
app_mount: false
volumes:
fedora6:
services:
image: fcrepo/fcrepo:6.0.0
image: fcrepo/fcrepo:6.4.0
command:
- "catalina.sh"
- "run"
1 change: 1 addition & 0 deletions .rubocop_todo.yml
@@ -3,6 +3,7 @@ Metrics/ClassLength:
- 'lib/valkyrie/persistence/fedora/persister.rb'
- 'lib/valkyrie/persistence/fedora/query_service.rb'
- 'lib/valkyrie/persistence/postgres/query_service.rb'
- 'lib/valkyrie/storage/fedora.rb'

Metrics/MethodLength:
Exclude:
67 changes: 66 additions & 1 deletion lib/valkyrie/specs/shared_specs/storage_adapter.rb
@@ -16,6 +16,11 @@ class Valkyrie::Specs::CustomResource < Valkyrie::Resource
it { is_expected.to respond_to(:find_by).with_keywords(:id) }
it { is_expected.to respond_to(:delete).with_keywords(:id) }
it { is_expected.to respond_to(:upload).with_keywords(:file, :resource, :original_filename) }
it { is_expected.to respond_to(:supports?) }

it "returns false for non-existing features" do
expect(storage_adapter.supports?(:bad_feature_not_real_dont_implement)).to eq false
end

it "can upload a file which is just an IO" do
io_file = Tempfile.new('temp_io')
@@ -50,7 +55,7 @@ def open_files
end

it "can upload, validate, re-fetch, and delete a file" do
resource = Valkyrie::Specs::CustomResource.new(id: "test")
resource = Valkyrie::Specs::CustomResource.new(id: "test#{SecureRandom.uuid}")
sha1 = Digest::SHA1.file(file).to_s
size = file.size
expect(uploaded_file = storage_adapter.upload(file: file, original_filename: 'foo.jpg', resource: resource, fake_upload_argument: true)).to be_kind_of Valkyrie::StorageAdapter::File
@@ -77,4 +82,64 @@ def open_files
expect { storage_adapter.find_by(id: uploaded_file.id) }.to raise_error Valkyrie::StorageAdapter::FileNotFound
expect { storage_adapter.find_by(id: Valkyrie::ID.new("noexist")) }.to raise_error Valkyrie::StorageAdapter::FileNotFound
end

it "can upload and find new versions" do
pending "Versioning not supported" unless storage_adapter.supports?(:versions)
resource = Valkyrie::Specs::CustomResource.new(id: "test#{SecureRandom.uuid}")
uploaded_file = storage_adapter.upload(file: file, original_filename: 'foo.jpg', resource: resource, fake_upload_argument: true)
expect(uploaded_file.version_id).not_to be_blank

f = Tempfile.new
f.puts "Test File"
f.rewind

# upload_version
new_version = storage_adapter.upload_version(id: uploaded_file.id, file: f)
expect(uploaded_file.id).to eq new_version.id
expect(uploaded_file.version_id).not_to eq new_version.version_id

# find_versions
# Two versions of the same file have the same id but different version_ids.
# Use case: I want to store metadata about a file when it's uploaded as a
# version and refer to it consistently.
versions = storage_adapter.find_versions(id: new_version.id)
expect(versions.length).to eq 2
expect(versions.first.id).to eq new_version.id
expect(versions.first.version_id).to eq new_version.version_id

expect(versions.last.id).to eq uploaded_file.id
expect(versions.last.version_id).to eq uploaded_file.version_id

expect(versions.first.size).not_to eq versions.last.size

expect(storage_adapter.find_by(id: uploaded_file.version_id).version_id).to eq uploaded_file.version_id

# Deleting an old version should leave the current version in place
if storage_adapter.supports?(:version_deletion)
storage_adapter.delete(id: uploaded_file.version_id)
expect(storage_adapter.find_versions(id: uploaded_file.id).length).to eq 1
expect { storage_adapter.find_by(id: uploaded_file.version_id) }.to raise_error Valkyrie::StorageAdapter::FileNotFound
end
current_length = storage_adapter.find_versions(id: new_version.id).length

# Restoring a previous version is just pumping its file into upload_version
newest_version = storage_adapter.upload_version(file: new_version, id: new_version.id)
expect(newest_version.version_id).not_to eq new_version.version_id
expect(storage_adapter.find_by(id: newest_version.id).version_id).to eq newest_version.version_id

# I can restore a version twice
newest_version = storage_adapter.upload_version(file: new_version, id: new_version.id)
expect(newest_version.version_id).not_to eq new_version.version_id
expect(storage_adapter.find_by(id: newest_version.id).version_id).to eq newest_version.version_id
expect(storage_adapter.find_versions(id: newest_version.id).length).to eq current_length + 2

# NOTE: We originally wanted deleting the current record to push it into the
# version history, but FCRepo 4/5/6 don't work that way, so deleting the
# current record now deletes everything.
storage_adapter.delete(id: new_version.id)
expect { storage_adapter.find_by(id: new_version.id) }.to raise_error Valkyrie::StorageAdapter::FileNotFound
expect(storage_adapter.find_versions(id: new_version.id).length).to eq 0
ensure
f&.close
end
end
1 change: 1 addition & 0 deletions lib/valkyrie/storage.rb
@@ -31,6 +31,7 @@ module Valkyrie
# @see lib/valkyrie/specs/shared_specs/storage_adapter.rb
module Storage
require 'valkyrie/storage/disk'
require 'valkyrie/storage/versioned_disk'
require 'valkyrie/storage/fedora'
require 'valkyrie/storage/memory'
end
6 changes: 6 additions & 0 deletions lib/valkyrie/storage/disk.rb
@@ -27,6 +27,12 @@ def handles?(id:)
id.to_s.start_with?("disk://#{base_path}")
end

# @param feature [Symbol] Feature to test for.
# @return [Boolean] true if the adapter supports the given feature
def supports?(_feature)
false
end

def file_path(id)
id.to_s.gsub(/^disk:\/\//, '')
end
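A caller-side sketch of how the new `supports?` check might gate versioning calls. `PlainStore`, `VersionedStore`, and `store_revision` are hypothetical stand-ins invented for illustration, not Valkyrie classes; only the `supports?(:versions)` question mirrors the diff.

```ruby
# Hypothetical adapter that, like the disk adapter in this PR,
# answers false for every feature.
class PlainStore
  def supports?(_feature)
    false
  end

  def upload(file:)
    "plain:#{file}"
  end
end

# Hypothetical adapter that advertises versioning.
class VersionedStore
  def supports?(feature)
    feature == :versions
  end

  def upload_version(id:, file:)
    "#{id}@#{file}"
  end
end

# Call upload_version only when the adapter advertises :versions;
# otherwise fall back to a plain upload.
def store_revision(adapter, id:, file:)
  if adapter.supports?(:versions)
    adapter.upload_version(id: id, file: file)
  else
    adapter.upload(file: file)
  end
end
```

Asking the adapter rather than checking its class keeps callers working when a new adapter gains or lacks a feature.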
111 changes: 105 additions & 6 deletions lib/valkyrie/storage/fedora.rb
@@ -19,12 +19,20 @@ def handles?(id:)
id.to_s.start_with?(PROTOCOL)
end

# @param feature [Symbol] Feature to test for.
# @return [Boolean] true if the adapter supports the given feature
def supports?(feature)
return true if feature == :versions
return true if feature == :version_deletion && fedora_version != 6
false
end

# Return the file associated with the given identifier
# @param id [Valkyrie::ID]
# @return [Valkyrie::StorageAdapter::StreamFile]
# @raise Valkyrie::StorageAdapter::FileNotFound if nothing is found
def find_by(id:)
Valkyrie::StorageAdapter::StreamFile.new(id: id, io: response(id: id))
perform_find(id: id)
end

# @param file [IO]
@@ -37,18 +45,104 @@ def find_by(id:)
def upload(file:, original_filename:, resource:, content_type: "application/octet-stream", # rubocop:disable Metrics/ParameterLists
resource_uri_transformer: default_resource_uri_transformer, **_extra_arguments)
identifier = resource_uri_transformer.call(resource, base_url) + '/original'
upload_file(fedora_uri: identifier, io: file, content_type: content_type, original_filename: original_filename)
version_id = current_version_id(id: valkyrie_identifier(uri: identifier)) || mint_version(identifier, latest_version(identifier))
perform_find(id: Valkyrie::ID.new(identifier.to_s.sub(/^.+\/\//, PROTOCOL)), version_id: version_id)
end

def upload_version(id:, file:)
uri = fedora_identifier(id: id)
# Auto versioning is on, so have to sleep if it's too soon after last
# upload.
if fedora_version == 6 && current_version_id(id: id).to_s.split("/").last == Time.current.utc.strftime("%Y%m%d%H%M%S")
sleep(0.5)
return upload_version(id: id, file: file)
end
upload_file(fedora_uri: uri, io: file)
version_id = mint_version(uri, latest_version(uri))
perform_find(id: Valkyrie::ID.new(uri.to_s.sub(/^.+\/\//, PROTOCOL)), version_id: version_id)
end

def upload_file(fedora_uri:, io:, content_type: "application/octet-stream", original_filename: "default")
sha1 = [5, 6].include?(fedora_version) ? "sha" : "sha1"
connection.http.put do |request|
request.url identifier
request.url fedora_uri
request.headers['Content-Type'] = content_type
request.headers['Content-Length'] = file.length.to_s
request.headers['Content-Length'] = io.length.to_s if io.respond_to?(:length)
request.headers['Content-Disposition'] = "attachment; filename=\"#{original_filename}\""
request.headers['digest'] = "#{sha1}=#{Digest::SHA1.file(file)}"
request.headers['digest'] = "#{sha1}=#{Digest::SHA1.file(io)}" if io.respond_to?(:to_str)
request.headers['link'] = "<http://www.w3.org/ns/ldp#NonRDFSource>; rel=\"type\""
io = Faraday::UploadIO.new(file, content_type, original_filename)
io = Faraday::UploadIO.new(io, content_type, original_filename)
request.body = io
end
find_by(id: Valkyrie::ID.new(identifier.to_s.sub(/^.+\/\//, PROTOCOL)))
end

def find_versions(id:)
uri = fedora_identifier(id: id)
version_list = version_list(uri)
version_list.map do |version|
id = valkyrie_identifier(uri: version["@id"])
perform_find(id: id, version_id: id)
end
end

def version_list(fedora_uri)
version_list = connection.http.get do |request|
request.url "#{fedora_uri}/fcr:versions"
request.headers["Accept"] = "application/ld+json"
end
return [] unless version_list.success?
version_graph = JSON.parse(version_list.body)&.first
if fedora_version == 4
version_graph&.fetch("http://fedora.info/definitions/v4/repository#hasVersion", [])
else
version_graph&.fetch("http://www.w3.org/ns/ldp#contains", [])&.sort_by { |x| x["@id"] }&.reverse
end
end
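A minimal sketch of the non-Fedora-4 branch above: sorting the `ldp#contains` entries by `@id` and reversing gives newest-first order, assuming version URIs end in a fixed-width `YYYYMMDDHHMMSS` timestamp (as the `strftime` comparison in `upload_version` suggests). `SAMPLE_BODY`, its host and paths, and `newest_first` are made up for illustration.

```ruby
require "json"

# Hypothetical JSON-LD body shaped like the one version_list parses.
SAMPLE_BODY = JSON.generate(
  [
    {
      "@id" => "http://localhost:8998/rest/test/original/fcr:versions",
      "http://www.w3.org/ns/ldp#contains" => [
        { "@id" => "http://localhost:8998/rest/test/original/fcr:versions/20230914101500" },
        { "@id" => "http://localhost:8998/rest/test/original/fcr:versions/20230914120000" },
        { "@id" => "http://localhost:8998/rest/test/original/fcr:versions/20230913235959" }
      ]
    }
  ]
)

# Return version URIs newest first. Fixed-width timestamps sort
# chronologically as plain strings, so no date parsing is needed.
def newest_first(body)
  graph = JSON.parse(body).first
  return [] if graph.nil?
  graph.fetch("http://www.w3.org/ns/ldp#contains", [])
       .sort_by { |entry| entry["@id"] }
       .reverse
       .map { |entry| entry["@id"] }
end
```

The string sort only holds while every URI shares the same prefix and timestamp width.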

def latest_version(identifier)
return :not_applicable if fedora_version != 4
version_list = version_list(identifier)
return "version1" if version_list.blank?
last_version = version_list.first["@id"]
last_version_number = last_version.split("/").last.gsub("version", "").to_i
"version#{last_version_number + 1}"
end
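The Fedora 4 naming rule in `latest_version` can be sketched in isolation. `next_version_name` is a hypothetical helper that takes plain URI strings; like the method above, it assumes the first entry in the list is the most recent version.

```ruby
# Compute the next "versionN" label from a list of version URIs.
# Assumes, as latest_version does, that the first URI is the latest.
def next_version_name(version_uris)
  return "version1" if version_uris.empty?
  last_number = version_uris.first.split("/").last.gsub("version", "").to_i
  "version#{last_number + 1}"
end
```

If the list order cannot be relied on, taking the maximum number across all entries would avoid depending on it.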

# @param [Valkyrie::ID] id A storage ID that's not a version, to get the
# version ID of.
def current_version_id(id:)
version_list = version_list(fedora_identifier(id: id))
return nil if version_list.blank?
valkyrie_identifier(uri: version_list.first["@id"])
end

def perform_find(id:, version_id: nil)
current_id = Valkyrie::ID.new(id.to_s.split("/fcr:versions").first)
version_id ||= id if id != current_id
# No version got passed and we're asking for a current_id, gotta get the
# version ID
return perform_find(id: current_id, version_id: (current_version_id(id: id) || :empty)) if version_id.nil?
Valkyrie::StorageAdapter::StreamFile.new(id: current_id, io: response(id: id), version_id: version_id)
end

# @param identifier [String] Fedora URI to mint a version for.
# @return [Valkyrie::ID] version_id of the minted version.
# Versions are created AFTER content is uploaded.
def mint_version(identifier, version_name = "version1")
response = connection.http.post do |request|
request.url "#{identifier}/fcr:versions"
request.headers['Slug'] = version_name if fedora_version == 4
end
# If there's a deletion marker, don't return anything.
return nil if response.status == 410
# This is awful, but versioning is locked to per-second increments.
if response.status == 409
sleep(0.5)
return mint_version(identifier, version_name)
end
raise "Version unable to be created" unless response.status == 201
valkyrie_identifier(uri: response.headers["location"].gsub("/fcr:metadata", ""))
end
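`mint_version` retries on a 409 with no upper limit. As a possible alternative, not what this PR does, a bounded retry could look like the sketch below. `with_conflict_retry` and its parameters are hypothetical; the block stands in for the POST to `fcr:versions` and must return an HTTP status code.

```ruby
# Run the block until it returns something other than 409, giving up
# after max_attempts tries. Returns the final status code.
def with_conflict_retry(max_attempts: 5, delay: 0.5)
  attempts = 0
  loop do
    attempts += 1
    status = yield
    return status unless status == 409
    raise "Version unable to be created after #{attempts} attempts" if attempts >= max_attempts
    sleep(delay)
  end
end
```

A cap turns a repository that keeps answering 409 into a visible error instead of an endless loop.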

# Delete the file in Fedora associated with the given identifier.
@@ -81,6 +175,11 @@ def fedora_identifier(id:)
RDF::URI(identifier)
end

def valkyrie_identifier(uri:)
id = uri.to_s.sub("http://", PROTOCOL)
Valkyrie::ID.new(id)
end

private

# @return [IOProxy]
68 changes: 64 additions & 4 deletions lib/valkyrie/storage/memory.rb
@@ -17,16 +17,51 @@ def initialize
# @return [Valkyrie::StorageAdapter::StreamFile]
def upload(file:, original_filename:, resource: nil, **_extra_arguments)
identifier = Valkyrie::ID.new("memory://#{resource.id}")
cache[identifier] = Valkyrie::StorageAdapter::StreamFile.new(id: identifier, io: file)
version_id = Valkyrie::ID.new("#{identifier}##{SecureRandom.uuid}")
cache[identifier] ||= {}
cache[identifier][:current] = Valkyrie::StorageAdapter::StreamFile.new(id: identifier, io: file, version_id: version_id)
end

# @param id [Valkyrie::ID] ID of the file to add a new version to.
# @param file [IO]
# @return [Valkyrie::StorageAdapter::StreamFile]
def upload_version(id:, file:)
# Build the new version; its version_id is the file ID plus a fresh UUID.
new_file = Valkyrie::StorageAdapter::StreamFile.new(id: id, io: file, version_id: Valkyrie::ID.new("#{id}##{SecureRandom.uuid}"))
current_file = cache[id][:current]
cache[id][:current] = new_file
cache[id][:versions] ||= []
cache[id][:versions].prepend(current_file) if current_file
new_file
end

# @param id [Valkyrie::ID]
# @return [Array<Valkyrie::StorageAdapter::StreamFile>]
def find_versions(id:)
return [] if cache[id].nil?
[cache[id][:current] || nil].compact + cache[id].fetch(:versions, [])
end

# Return the file associated with the given identifier
# @param id [Valkyrie::ID]
# @return [Valkyrie::StorageAdapter::StreamFile]
# @raise Valkyrie::StorageAdapter::FileNotFound if nothing is found
def find_by(id:)
raise Valkyrie::StorageAdapter::FileNotFound unless cache[id]
cache[id]
no_version_id, _version = id_and_version(id)
raise Valkyrie::StorageAdapter::FileNotFound unless cache[no_version_id]
version =
if id == no_version_id
cache[id][:current]
else
find_versions(id: no_version_id).find do |file|
file.version_id == id
end
end
raise Valkyrie::StorageAdapter::FileNotFound unless version
version
end

# @param id [Valkyrie::ID]
@@ -35,10 +70,35 @@ def handles?(id:)
id.to_s.start_with?("memory://")
end

# @param feature [Symbol] Feature to test for.
# @return [Boolean] true if the adapter supports the given feature
def supports?(feature)
case feature
when :versions
true
when :version_deletion
true
else
false
end
end

def id_and_version(id)
id, version = id.to_s.split("#")
[Valkyrie::ID.new(id), version]
end

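# Delete the file or version associated with the given identifier.
# Passing the ID of a non-current version removes only that version; any
# other ID removes the file and all of its versions.
# @param id [Valkyrie::ID]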
def delete(id:)
cache.delete(id)
base_id, version = id_and_version(id)
if version && cache[base_id][:current]&.version_id != id
cache[base_id][:versions].reject! do |file|
file.version_id == id
end
else
cache.delete(base_id)
end
nil
end
end
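The bookkeeping in the memory adapter can be sketched without Valkyrie. `TinyVersionedStore` and `Entry` are hypothetical; they use plain strings instead of `Valkyrie::ID` and `StreamFile`, and unlike the diff they guard against unknown ids in `delete`.

```ruby
require "securerandom"

# Hypothetical miniature of the memory adapter: each id maps to a hash
# holding the :current entry and a newest-first :versions array.
class TinyVersionedStore
  Entry = Struct.new(:id, :version_id, :data)

  def initialize
    @cache = {}
  end

  def upload(id:, data:)
    @cache[id] ||= {}
    @cache[id][:current] = Entry.new(id, "#{id}##{SecureRandom.uuid}", data)
  end

  # The previous current entry is pushed to the front of :versions.
  def upload_version(id:, data:)
    slot = (@cache[id] ||= {})
    previous = slot[:current]
    slot[:current] = Entry.new(id, "#{id}##{SecureRandom.uuid}", data)
    (slot[:versions] ||= []).prepend(previous) if previous
    slot[:current]
  end

  def find_versions(id:)
    slot = @cache[id]
    return [] if slot.nil?
    [slot[:current]].compact + slot.fetch(:versions, [])
  end

  # A non-current version id removes one version; anything else removes
  # the file together with all of its versions.
  def delete(id:)
    base_id, version = id.split("#")
    slot = @cache[base_id]
    return nil if slot.nil?
    if version && slot[:current]&.version_id != id
      slot.fetch(:versions, []).reject! { |entry| entry.version_id == id }
    else
      @cache.delete(base_id)
    end
    nil
  end
end
```

Every version shares the file's `id` and differs only in `version_id`, which is the property the shared spec checks.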