Versioning Scheme
Entries in the subject table are versioned on a per-attribute basis. This makes it possible to:
- selectively revert changes (to single attributes, entities, or the whole table) to any previous value
- export data from only certain data sources
- find the program responsible for errors (easier debugging)
- manually edit values and protect them from being automatically overwritten
- define validity parameters (e.g. time duration of validity) for single attributes and relations
Every field in subject has a corresponding history field (e.g. name and name_history).
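The pairing of a field with its history field can be illustrated with a minimal in-memory sketch. It is not the project's implementation: the helper set_attribute is hypothetical, the row is a plain dictionary rather than a Cassandra row, and history entries are reduced to bare values here, whereas the real history fields hold version records. It also assumes that the plain field always mirrors the most recent history entry.

```python
from typing import Any, Dict


def set_attribute(row: Dict[str, Any], field: str, value: Any) -> None:
    """Update a field and append the new value to its history field.

    Hypothetical helper: the history field is assumed to be named
    "<field>_history", following the name / name_history example.
    """
    row.setdefault(field + "_history", []).append(value)
    row[field] = value


# Example row with placeholder values.
subject = {"name": None, "name_history": []}
set_attribute(subject, "name", "Acme")
set_attribute(subject, "name", "ACME Inc.")
```

After both calls, name holds the latest value while name_history keeps every value that was ever written, which is what makes reverting a single attribute possible.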
The type version is the core data structure for the history fields and represents a change made by a single program on the datalake. It contains the value of that change as well as some meta information:
- the version ID (the same across all changes of a single version)
- validity data (e.g. time duration of validity)
- data sources used in this step
- timestamp of the change
- program that modified this attribute
This is the CQL command used to create the version UDT:
create type datalake.version(
version timeuuid,
value list<text>,
validity map<text, text>,
datasources list<text>,
timestamp timestamp,
program text
);
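To show how a history of such version records supports reverting a value, exporting by data source, and tracing the responsible program, here is a rough in-memory sketch. The Version dataclass mirrors the UDT fields; the element type of value is assumed to be text, and the helper functions, program names, and source names are hypothetical placeholders, not part of the datalake code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List
import uuid


@dataclass
class Version:
    """In-memory mirror of the fields of the datalake.version UDT."""
    version: uuid.UUID          # timeuuid shared by all changes of one run
    value: List[str]            # value written by this change (assumed text)
    validity: Dict[str, str]    # validity data, e.g. a time range
    datasources: List[str]      # data sources used in this step
    timestamp: datetime         # time of the change
    program: str                # program that modified the attribute


def value_from(history: List[Version], version_id: uuid.UUID) -> List[str]:
    """Return the value recorded for an attribute by the given version."""
    for entry in history:
        if entry.version == version_id:
            return entry.value
    raise KeyError(version_id)


def from_datasource(history: List[Version], datasource: str) -> List[Version]:
    """Keep only the changes that used the given data source."""
    return [entry for entry in history if datasource in entry.datasources]


def make(value: List[str], sources: List[str], program: str) -> Version:
    """Build a version record with a fresh time-based UUID (example only)."""
    return Version(uuid.uuid1(), value, {}, sources,
                   datetime.now(timezone.utc), program)


# Hypothetical history of one attribute, oldest change first.
name_history = [
    make(["Acme"], ["source_a"], "import_a"),
    make(["ACME Inc."], ["source_b"], "import_b"),
]
```

Reverting the attribute then amounts to looking up the value of an older version, for example value_from(name_history, name_history[0].version), and the program field of each entry names the program that wrote it.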
The version table is used to identify the latest version of the datalake and can be used in the curation interface to display a history of processes that were run in the past.
This is the CQL command used to create the version table:
create table datalake.version(
version timeuuid primary key,
timestamp timestamp,
datasources list<text>,
program text
);
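As an illustration of how rows of this table could be used, the following sketch picks the latest version and builds a newest-first list of past runs. It works on in-memory rows that would already have been read from the table; the VersionRow class, the helper names, and the example rows are assumptions made for this sketch, and ordering by the timestamp column is one possible choice.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List
import uuid


@dataclass
class VersionRow:
    """In-memory mirror of one row of the datalake.version table."""
    version: uuid.UUID
    timestamp: datetime
    datasources: List[str]
    program: str


def latest_version(rows: List[VersionRow]) -> VersionRow:
    """Return the row with the most recent timestamp."""
    return max(rows, key=lambda row: row.timestamp)


def run_history(rows: List[VersionRow]) -> List[VersionRow]:
    """Return all rows ordered from newest to oldest."""
    return sorted(rows, key=lambda row: row.timestamp, reverse=True)


# Hypothetical rows; program and source names are placeholders.
rows = [
    VersionRow(uuid.uuid1(), datetime(2017, 3, 1, tzinfo=timezone.utc),
               ["source_a"], "import_a"),
    VersionRow(uuid.uuid1(), datetime(2017, 5, 1, tzinfo=timezone.utc),
               ["source_b"], "import_b"),
    VersionRow(uuid.uuid1(), datetime(2017, 4, 1, tzinfo=timezone.utc),
               ["source_a"], "dedup"),
]
```

Because version is the only primary key column, the rows carry no server-side ordering, so a sketch like this sorts on the client after reading them.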