
Implement external persistent archival of ActiveRecords #46

Open
natacha-beck opened this issue Apr 15, 2015 · 1 comment
Labels
Enhancement Priority: Low To implement when someone actually requests it!

Comments

@natacha-beck
Contributor

natacha-beck commented Apr 15, 2015

(From Redmine 4311, April 2013)

Sometimes we'd want to extract and save entire sets of rows
from the DB to make Rails faster (especially with the poor
design of some of our table relations).

We can remove rows from the DB, serialize them, save them
in external files and reload them later using this trick:

r = SomeActiveRecordSubclass.find(12345)
r.destroy
File.open("abcd.yaml","w") { |fh| fh.write r.to_yaml }

n_yaml = File.read("abcd.yaml")
n = YAML.load(n_yaml)
n.instance_eval { @new_record = true }  # there is, unfortunately, no API for this.
n.save

This will recreate the original entry with the exact same ID, even
if the SQL table's ID column is set to 'autoincrement'.

One must be careful: if the schema for the table has changed between
the moment the record was saved and reloaded, the process might fail.
Also, if the object has IDs for relationships to other models, the
IDs must be validated. I suggest encapsulating the process of saving
and reloading in a class that makes the necessary verifications (including
saving extra context info to help fix broken associations?).
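
A minimal sketch of what such an encapsulating class might look like (all
names below are hypothetical, not existing CBRAIN code; the association
check only covers plain belongs_to foreign keys):

# Hypothetical sketch, not existing CBRAIN code: archive a record to a
# YAML file, and restore it later after re-checking its belongs_to
# foreign keys so we don't resurrect rows with broken associations.
class RecordArchiver

  # Serialize +record+ to +path+, then remove it from the DB.
  def self.archive(record, path)
    File.open(path, "w") { |fh| fh.write record.to_yaml }
    record.destroy
  end

  # Reload a record of class +klass+ from +path+ and re-insert it
  # with its original ID.
  def self.restore(klass, path)
    record = YAML.load(File.read(path))
    klass.reflect_on_all_associations(:belongs_to).each do |assoc|
      next if assoc.options[:polymorphic]  # skip polymorphic associations in this sketch
      fk_value = record.send(assoc.foreign_key)
      next if fk_value.nil?
      unless assoc.klass.exists?(fk_value)
        raise "Broken association #{assoc.name}: no row with ID #{fk_value}"
      end
    end
    record.instance_eval { @new_record = true } # same trick as above
    record.save!
    record
  end

end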

This entire issue could be rendered moot by redesigning CBRAIN to be
more scale-tolerant and truly work well even when hundreds of millions
of objects are registered in the DB. In that case, there would be no need
to archive the rows.

  • Updated by Pierre Rioux over 1 year ago

Instead of serializing the rows in external files, we could
store them in an alternate table with the same columns as
the original data model.

Make sure at boot time the two tables have the same schema, e.g. 'userfiles' and 'archived_userfiles' (a sketch of such a check follows after the code below)
Set and reset CLASS.table_name to switch from one table to another:
begin
  tablename = SomeActiveRecordSubclass.table_name
  r = SomeActiveRecordSubclass.find(12345)
  r.destroy
  SomeActiveRecordSubclass.table_name = "archived_#{tablename}"  # double quotes needed for interpolation
  n = YAML.load(r.to_yaml)  # maybe not necessary?
  n.instance_eval { @new_record = true }  # there is, unfortunately, no API for this.
  n.save
ensure
  SomeActiveRecordSubclass.table_name = tablename
end
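
For the boot-time verification mentioned in the first step above, a hedged
sketch could simply compare the column lists of the two tables (the helper
name is an assumption; the 'archived_' prefix follows the naming above):

# Sketch of a boot-time check: verify that a model's live table and its
# archive twin expose exactly the same columns and SQL types.
def verify_archive_schema!(klass)
  live     = klass.table_name
  archived = "archived_#{live}"
  conn     = klass.connection
  live_cols     = conn.columns(live).map     { |c| [ c.name, c.sql_type ] }.sort
  archived_cols = conn.columns(archived).map { |c| [ c.name, c.sql_type ] }.sort
  unless live_cols == archived_cols
    raise "Schema mismatch between '#{live}' and '#{archived}'"
  end
end

# At boot time, for each archivable model, e.g.:
#   verify_archive_schema!(Userfile)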
  • Updated by Pierre Rioux over 1 year ago

Problem with the previous comment: how do we make sure schema changes are applied to TWO tables
that must be kept in sync?
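
One possible (sketched) answer would be a migration convention where every
schema change is applied to both tables in the same migration; the column
added here is just a placeholder to illustrate the pattern:

# Sketch of the two-table migration convention.
class AddSomeNewColumnToUserfiles < ActiveRecord::Migration
  def change
    %w( userfiles archived_userfiles ).each do |table|
      add_column table, :some_new_column, :string
    end
  end
end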

@natacha-beck natacha-beck added this to the 4.1.0 milestone Apr 15, 2015
@prioux prioux modified the milestones: 4.1.0, 4.2.0 Aug 13, 2015
@prioux prioux modified the milestones: 4.2.0, 4.3.0 Nov 3, 2015
@natacha-beck natacha-beck modified the milestones: 4.3.0, 4.4.0 Mar 16, 2016
@natacha-beck natacha-beck modified the milestones: 4.4.0, 4.5.0 May 31, 2016
@prioux prioux added the Priority: Low To implement when someone actually requests it! label Aug 10, 2016
@prioux prioux modified the milestones: 4.5.0, 4.6.0 Aug 18, 2016
@natacha-beck natacha-beck modified the milestones: 4.6.0, 4.7.0 Nov 21, 2016
@prioux
Member

prioux commented Dec 11, 2016

The only real need is for the userfiles table and the cbrain_tasks table. So maybe we can just maintain one archiving table with its own model, and in it there will be a single attribute where we serialize the archived entries.

We could do this only for specific old users, and in that case the requirements would be:

  • all userfiles are stored on a special class of DataProvider (ArchiveCbrainDataProvider < EnCbrainDataProvider ?) which doesn't have Remi's consistency checks (or a modified version of them).
  • all tasks are already archived as files, or don't have work directories.

The new model would have attributes:

( user_id, userfile_id, cbrain_task_id, yaml_serialized_object )  # where one of userfile_id or cbrain_task_id is always null
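
A minimal sketch of such a model (the class name ArchivedRecord and the
validation below are assumptions, not existing CBRAIN code):

# Hypothetical single archiving model; exactly one of userfile_id or
# cbrain_task_id must be set, the other stays null.
class ArchivedRecord < ActiveRecord::Base

  belongs_to :user

  validate :exactly_one_archived_target

  private

  def exactly_one_archived_target
    if userfile_id.blank? == cbrain_task_id.blank?
      errors.add(:base, "exactly one of userfile_id or cbrain_task_id must be set")
    end
  end

end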

The interface would allow the admin to archive a user (which implies locking the account) or dearchive a user.

Archiving a user is only allowed if the conditions above are true.

Unlocking an account would not be allowed if the user has any archived tasks or files.
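
Putting the pieces together, the archiving operation itself might look
roughly like this (the method names, associations and account_locked flag
are assumptions, not a definitive design):

# Rough sketch of the admin-side operation described above.
def archive_user!(user)
  # hypothetical check of the two preconditions listed above
  raise "User does not meet the archiving conditions" unless archivable?(user)

  user.userfiles.find_each do |uf|       # assuming the usual has_many :userfiles
    ArchivedRecord.create!(:user_id => user.id, :userfile_id => uf.id,
                           :yaml_serialized_object => uf.to_yaml)
    uf.delete  # remove the row only; content stays on the archive DataProvider
  end

  user.cbrain_tasks.find_each do |task|  # assuming the usual has_many :cbrain_tasks
    ArchivedRecord.create!(:user_id => user.id, :cbrain_task_id => task.id,
                           :yaml_serialized_object => task.to_yaml)
    task.delete
  end

  user.update_attribute(:account_locked, true)  # archiving implies locking the account
end

Dearchiving would be the reverse: reload each serialized blob with the
@new_record trick from the first comment, and only allow unlocking once no
archived rows remain for the user.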

@prioux prioux removed this from the 4.7.0 milestone Apr 24, 2017
@prioux prioux modified the milestones: 5.1.0, 4.7.0 Apr 24, 2017
@prioux prioux modified the milestones: 5.1.0, 5.2.0 Nov 16, 2018
@prioux prioux modified the milestones: 5.2.0, 5.3.0 Sep 16, 2019