
Implement external persistent archival of ActiveRecords #46

Open
natacha-beck opened this issue Apr 15, 2015 · 1 comment
Labels
Enhancement Priority: Low To implement when someone actually requests it!

Comments

@natacha-beck
Contributor

natacha-beck commented Apr 15, 2015

(From Redmine 4311, April 2013)

Sometimes we'd want to extract and save entire sets of rows
from the DB to make Rails faster (especially with the poor
design of some of our table relations).

We can remove rows from the DB, serialize them, save them
in external files and reload them later using this trick:

r = SomeActiveRecordSubclass.find(12345)
r.destroy
File.open("abcd.yaml","w") { |fh| fh.write r.to_yaml }

n_yaml = File.read("abcd.yaml")
n = YAML.load(n_yaml)
n.instance_eval { @new_record = true }  # there is, unfortunately, no API for this.
n.save

This will recreate the original entry with the exact same ID, even
if the SQL table's ID column is set to 'autoincrement'.

One must be careful: if the schema for the table has changed between
the moment the record was saved and reloaded, the process might fail.
Also, if the object has IDs for relationships to other models, the
IDs must be validated. I suggest encapsulating the process of saving
and reloading in a class that makes the necessary verifications (including
saving extra context info to help fix broken associations?).
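
A minimal sketch of what such an encapsulating class might look like (all
names below are hypothetical, not existing CBRAIN code; the association
check only covers plain belongs_to foreign keys):

# Hypothetical sketch, not existing CBRAIN code: archive a record to a
# YAML file, and restore it later after re-checking its belongs_to
# foreign keys so we don't resurrect rows with broken associations.
class RecordArchiver

  # Serialize +record+ to +path+, then remove it from the DB.
  def self.archive(record, path)
    File.open(path, "w") { |fh| fh.write record.to_yaml }
    record.destroy
  end

  # Reload a record of class +klass+ from +path+ and re-insert it
  # with its original ID.
  def self.restore(klass, path)
    record = YAML.load(File.read(path))
    klass.reflect_on_all_associations(:belongs_to).each do |assoc|
      next if assoc.options[:polymorphic]  # skip polymorphic associations in this sketch
      fk_value = record.send(assoc.foreign_key)
      next if fk_value.nil?
      unless assoc.klass.exists?(fk_value)
        raise "Broken association #{assoc.name}: no row with ID #{fk_value}"
      end
    end
    record.instance_eval { @new_record = true } # same trick as above
    record.save!
    record
  end

end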

This entire issue could be rendered moot by redesigning CBRAIN to be
more scale-tolerant and truly work well even when hundreds of millions
of objects are registered in the DB. In that case, there would be no need
to archive the rows.

  • Updated by Pierre Rioux over 1 year ago

Instead of serializing the rows in external files, we could
store them in an alternate table with the same columns as
the original data model.

Make sure at boot time the two tables have the same schema, e.g. 'userfiles' and 'archived_userfiles' (a sketch of such a check follows after the code below)
Set and reset CLASS.table_name to switch from one table to another:
begin
  tablename = SomeActiveRecordSubclass.table_name
  r = SomeActiveRecordSubclass.find(12345)
  r.destroy
  SomeActiveRecordSubclass.table_name = "archived_#{tablename}"  # double quotes needed for interpolation
  n = YAML.load(r.to_yaml)  # maybe not necessary?
  n.instance_eval { @new_record = true }  # there is, unfortunately, no API for this.
  n.save
ensure
  SomeActiveRecordSubclass.table_name = tablename
end
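
For the boot-time verification mentioned in the first step above, a hedged
sketch could simply compare the column lists of the two tables (the helper
name is an assumption; the 'archived_' prefix follows the naming above):

# Sketch of a boot-time check: verify that a model's live table and its
# archive twin expose exactly the same columns and SQL types.
def verify_archive_schema!(klass)
  live     = klass.table_name
  archived = "archived_#{live}"
  conn     = klass.connection
  live_cols     = conn.columns(live).map     { |c| [ c.name, c.sql_type ] }.sort
  archived_cols = conn.columns(archived).map { |c| [ c.name, c.sql_type ] }.sort
  unless live_cols == archived_cols
    raise "Schema mismatch between '#{live}' and '#{archived}'"
  end
end

# At boot time, for each archivable model, e.g.:
#   verify_archive_schema!(Userfile)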
  • Updated by Pierre Rioux over 1 year ago

Problem with the previous comment: how do we make sure schema changes are applied to TWO tables
that must be kept in sync?
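
One possible (sketched) answer would be a migration convention where every
schema change is applied to both tables in the same migration; the column
added here is just a placeholder to illustrate the pattern:

# Sketch of the two-table migration convention.
class AddSomeNewColumnToUserfiles < ActiveRecord::Migration
  def change
    %w( userfiles archived_userfiles ).each do |table|
      add_column table, :some_new_column, :string
    end
  end
end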

@natacha-beck natacha-beck added this to the 4.1.0 milestone Apr 15, 2015
@prioux prioux modified the milestones: 4.1.0, 4.2.0 Aug 13, 2015
@prioux prioux modified the milestones: 4.2.0, 4.3.0 Nov 3, 2015
@natacha-beck natacha-beck modified the milestones: 4.3.0, 4.4.0 Mar 16, 2016
@natacha-beck natacha-beck modified the milestones: 4.4.0, 4.5.0 May 31, 2016
@prioux prioux added the Priority: Low To implement when someone actually requests it! label Aug 10, 2016
@prioux prioux modified the milestones: 4.5.0, 4.6.0 Aug 18, 2016
@natacha-beck natacha-beck modified the milestones: 4.6.0, 4.7.0 Nov 21, 2016
@prioux
Member

prioux commented Dec 11, 2016

The only real need is for the userfiles table and the cbrain_tasks table. So maybe we can just maintain one archiving table with its own model, and in it there will be a single attribute where we serialize the archived entries.

We could do this only for specific old users, and in that case the requirements would be:

  • all userfiles are stored on a special class of DataProvider (ArchiveCbrainDataProvider < EnCbrainDataProvider ?) which doesn't have Remi's consistency checks (or a modified version of them).
  • all tasks are already archived as files, or don't have work directories.

The new model would have attributes:

( user_id, userfile_id, cbrain_task_id, yaml_serialized_object )  # where one of userfile_id or cbrain_task_id is always null
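
A minimal sketch of such a model (the class name ArchivedRecord and the
validation below are assumptions, not existing CBRAIN code):

# Hypothetical single archiving model; exactly one of userfile_id or
# cbrain_task_id must be set, the other stays null.
class ArchivedRecord < ActiveRecord::Base

  belongs_to :user

  validate :exactly_one_archived_target

  private

  def exactly_one_archived_target
    if userfile_id.blank? == cbrain_task_id.blank?
      errors.add(:base, "exactly one of userfile_id or cbrain_task_id must be set")
    end
  end

end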

The interface would allow the admin to archive a user (which implies locking the account) or dearchive a user.

Archiving a user is only allowed if the conditions above are true.

Unlocking an account would not be allowed if the user has any archived tasks or files.
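
Putting the pieces together, the archiving operation itself might look
roughly like this (the method names, associations and account_locked flag
are assumptions, not a definitive design):

# Rough sketch of the admin-side operation described above.
def archive_user!(user)
  # hypothetical check of the two preconditions listed above
  raise "User does not meet the archiving conditions" unless archivable?(user)

  user.userfiles.find_each do |uf|       # assuming the usual has_many :userfiles
    ArchivedRecord.create!(:user_id => user.id, :userfile_id => uf.id,
                           :yaml_serialized_object => uf.to_yaml)
    uf.delete  # remove the row only; content stays on the archive DataProvider
  end

  user.cbrain_tasks.find_each do |task|  # assuming the usual has_many :cbrain_tasks
    ArchivedRecord.create!(:user_id => user.id, :cbrain_task_id => task.id,
                           :yaml_serialized_object => task.to_yaml)
    task.delete
  end

  user.update_attribute(:account_locked, true)  # archiving implies locking the account
end

Dearchiving would be the reverse: reload each serialized blob with the
@new_record trick from the first comment, and only allow unlocking once no
archived rows remain for the user.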

@prioux prioux removed this from the 4.7.0 milestone Apr 24, 2017
@prioux prioux modified the milestones: 5.1.0, 4.7.0 Apr 24, 2017
@prioux prioux modified the milestones: 5.1.0, 5.2.0 Nov 16, 2018
@prioux prioux modified the milestones: 5.2.0, 5.3.0 Sep 16, 2019