Versioning Scheme
Jan Ehmueller edited this page Jul 27, 2017 · 3 revisions
Entries in the `subject` table are versioned on a per-attribute basis. This makes it possible to:
- selectively reverse changes (single attributes, entities or whole table) to any previous value
- only export data from certain datasources
- find the program responsible for errors (easier debugging)
- manually edit values without having them automatically overwritten
- define validity parameters (e.g. time duration of validity) for single attributes and relations
Every field in `subject` has a corresponding history field (e.g. `name` and `name_history`).
The type `version` is the core data structure for the history fields and represents a change made by a single program on the datalake. It contains the value of that change as well as some metadata:
- the version ID (the same across all changes of a single version)
- validity data (e.g. time duration of validity)
- data sources used in this step
- timestamp of the change
- program that modified this attribute
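The structure described above can be illustrated with a minimal Python sketch. This is not the project's actual schema: the class names `Version` and `Subject`, all field names, the helpers `set_name` and `revert_name`, and the program and datasource names in the usage lines are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Version:
    """One change made by a single program (field names are assumptions)."""
    version: UUID              # same ID for every change made in one run
    value: Optional[str]       # the value written by this change
    validity: Dict[str, str]   # e.g. {"from": "2017-01-01", "to": "2017-12-31"}
    datasources: List[str]     # data sources used in this step
    timestamp: datetime        # when the change was made
    program: str               # program that modified the attribute


@dataclass
class Subject:
    """Reduced subject entry: the name field plus its history field."""
    name: Optional[str] = None
    name_history: List[Version] = field(default_factory=list)


def set_name(subject, value, run_id, program, datasources, validity=None):
    """Write a new name and append the change to name_history."""
    subject.name = value
    subject.name_history.append(Version(
        version=run_id,
        value=value,
        validity=validity or {},
        datasources=list(datasources),
        timestamp=datetime.now(timezone.utc),
        program=program,
    ))


def revert_name(subject, run_id):
    """Restore the name written by the run with the given version ID.

    For brevity this sketch does not record the revert as a new change.
    """
    for change in reversed(subject.name_history):
        if change.version == run_id:
            subject.name = change.value
            return
    raise KeyError(f"no change with version {run_id} in name_history")


# Hypothetical usage: two runs change the name, then the second is reverted.
first_run, second_run = uuid4(), uuid4()
subject = Subject()
set_name(subject, "Acme Corp", first_run, "ExampleImport", ["source_a"])
set_name(subject, "ACME", second_run, "ExampleMerge", ["source_a", "source_b"])
revert_name(subject, first_run)
print(subject.name)  # prints "Acme Corp"
```

Because every change keeps its program and datasources, the same history list could also be filtered to find the program behind a faulty value or to export only values from certain datasources.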
The `version` table is used to identify the latest version of the datalake and can be used in the curation interface to display a history of processes that were run in the past.
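As a rough sketch of how such a table could be queried, the Python example below models each row as a small record and derives both the latest version and a newest-first process history. The page does not list the table's columns, so `VersionEntry`, its fields, and the two helper functions are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List
from uuid import UUID, uuid4


@dataclass(frozen=True)
class VersionEntry:
    """One row of the version table (column names are assumptions)."""
    version: UUID
    timestamp: datetime
    program: str
    datasources: List[str]


def latest_version(table):
    """Return the most recent entry, i.e. the latest version of the datalake."""
    return max(table, key=lambda entry: entry.timestamp)


def process_history(table):
    """Format the entries newest-first, as a curation interface might list them."""
    ordered = sorted(table, key=lambda entry: entry.timestamp, reverse=True)
    return [f"{e.timestamp:%Y-%m-%d %H:%M} {e.program}" for e in ordered]
```

Sorting by timestamp is one possible design choice here; an implementation could equally rely on time-based version IDs if the datalake uses them.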