Versioning and validation (or not) in Murmurations fields #35
Photosynthesis
started this conversation in
Library (fields and schemas)
Replies: 1 comment
Further background can be found at: #7
Context
This is a discussion about how (or whether) to include version numbers when identifying fields in the Murmurations library. The typical case for this is when using the JSON-Schema $ref to include a library field in a Murmurations schema. If version numbers are included in the file name, the schema references a particular version of the field, with that version's data type and validation rules.
Whether fields are versioned is an important aspect of the Murmurations architecture which impacts the development of schemas and the way that interoperable data is handled in the network.
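To make the difference concrete, here is a minimal sketch of what a versioned and a non-versioned $ref might look like, written as Python dicts. The file names and the "name-v1.0.0.json" naming convention are assumptions for illustration only, not the actual library layout, and parse_ref is a hypothetical helper.

```python
import re

# Hypothetical naming convention: "<field>-v<major>.<minor>.<patch>.json" for a
# versioned field, and "<field>.json" for a non-versioned one.
VERSIONED_REF = re.compile(r"^(?:.*/)?(?P<field>[a-z_]+)-v(?P<version>\d+\.\d+\.\d+)\.json$")
UNVERSIONED_REF = re.compile(r"^(?:.*/)?(?P<field>[a-z_]+)\.json$")

# A schema that pins one specific version of the library field.
versioned_schema = {
    "type": "object",
    "properties": {"name": {"$ref": "../fields/name-v1.0.0.json"}},
}

# A schema that references the field without a version.
unversioned_schema = {
    "type": "object",
    "properties": {"name": {"$ref": "../fields/name.json"}},
}


def parse_ref(ref):
    """Return (field_name, version) for a $ref file name; version is None if absent."""
    match = VERSIONED_REF.match(ref)
    if match:
        return match.group("field"), match.group("version")
    match = UNVERSIONED_REF.match(ref)
    if match:
        return match.group("field"), None
    raise ValueError(f"Unrecognised field reference: {ref}")


for schema in (versioned_schema, unversioned_schema):
    print(parse_ref(schema["properties"]["name"]["$ref"]))
```

With the versioned form, two schemas can end up pointing at different versions of the same field; with the non-versioned form there is only one definition to point at.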
Our initial v0.1 approach included no versioning of fields. In order to allow schema creators and aggregators to maintain more control of their data, fields were versioned starting with v1 of Murmurations. Since then, problems with field versioning have become apparent. This discussion proposes returning to non-versioned fields in the Murmurations library.
Background on this issue:
Why version fields
Problems with field versioning
If a profile references two schemas, and those schemas each reference a different version of the same library field, and the different field versions have incompatible validation rules, the profile will be automatically invalid, regardless of content.
Scenarios where this can occur
Later field version has stricter validation rules
If a field has been updated with a stricter set of validation rules (for example, a shorter max_length value, or removed enum options), data that is valid in the older version could be invalid in the new version. These updates will include a major increment to the field version.
Later field version has less strict validation rules
If a field has been updated with a less strict set of validation rules (for example, a longer max_length value, or added enum options), data that is valid in the new version could be invalid in the old version. These updates will not necessarily include a major increment to the field version, but they may still cause profile incompatibility. This happens when a profile is created against a schema that references the newer field version, and a schema that references the older version is later added to it.
Field versions have mutually incompatible validation rules or data types
If a field has been updated with a new data type (for example a string field that becomes an array or numeric field), profile data added against the two field versions will be mutually incompatible no matter what it contains.
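The scenarios above can be sketched with a tiny hand-rolled validator. This is only an illustration under stated assumptions: is_valid covers just three JSON Schema keywords (type, maxLength, enum; maxLength is the JSON Schema spelling of what is called max_length above), and the field definitions and values are made up for the example rather than taken from the library.

```python
def is_valid(value, rules):
    """Check a value against a small subset of JSON Schema: type, maxLength, enum."""
    python_types = {"string": str, "array": list}
    expected = rules.get("type")
    if expected is not None and not isinstance(value, python_types[expected]):
        return False
    if "maxLength" in rules and isinstance(value, str) and len(value) > rules["maxLength"]:
        return False
    if "enum" in rules and value not in rules["enum"]:
        return False
    return True


def valid_for_all(value, *field_versions):
    """A profile value must satisfy every field version its schemas reference."""
    return all(is_valid(value, rules) for rules in field_versions)


# Hypothetical field versions: the maximum length changes between versions.
name_short = {"type": "string", "maxLength": 50}
name_long = {"type": "string", "maxLength": 100}

# Hypothetical field versions: the data type changes between versions.
tags_as_string = {"type": "string"}
tags_as_array = {"type": "array"}

long_name = "x" * 80
print(is_valid(long_name, name_long))                  # accepted by the looser version
print(is_valid(long_name, name_short))                 # rejected by the stricter version
print(valid_for_all(long_name, name_short, name_long)) # so the profile as a whole fails

# With a type change, neither form of the data satisfies both versions.
print(valid_for_all("food,energy", tags_as_string, tags_as_array))
print(valid_for_all(["food", "energy"], tags_as_string, tags_as_array))
```

In the length case a profile can be repaired by shortening the value, but in the type-change case no value can satisfy both field versions at once.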
Proposed approach
From the above, it's clear that:
With this in mind, the approach I'm recommending here is:
Issues with limited validation
Nick has made some good points about handling character encodings and potential markup or security vulnerabilities in node data.
Character encodings
Nick mentioned the question of character encodings and the risk of mojibake being generated from Murmurations data. The likely solution to this is that all Murmurations string data should be treated as UTF-8. Since Murmurations data is transmitted (and in some cases stored) as JSON, it should be UTF-8 in any case.
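As a rough sketch of what "treat everything as UTF-8" could mean for a consumer, the snippet below decodes incoming bytes strictly as UTF-8 before parsing the JSON, and shows how mojibake appears when the same bytes are decoded with the wrong encoding. The decode_profile helper and the sample profile are hypothetical.

```python
import json


def decode_profile(raw: bytes) -> dict:
    """Decode raw bytes strictly as UTF-8, then parse them as JSON.

    bytes.decode raises UnicodeDecodeError on invalid UTF-8, so badly
    encoded data is rejected instead of silently turning into mojibake.
    """
    return json.loads(raw.decode("utf-8"))


# Sample profile data serialized as UTF-8 (made up for this example).
raw = json.dumps({"name": "Caf\u00e9 Coop\u00e9rative"}, ensure_ascii=False).encode("utf-8")

profile = decode_profile(raw)
print(profile["name"])

# Decoding the same bytes as Latin-1 "succeeds" but produces mojibake.
mojibake = raw.decode("latin-1")
print(mojibake)

# Bytes that are not valid UTF-8 (here, a Latin-1 encoded e-acute) are rejected.
try:
    decode_profile(b'{"name": "Caf\xe9"}')
except UnicodeDecodeError as error:
    print("rejected:", error.reason)
```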
Markup
Markup such as HTML may be accidentally or intentionally included in Murmurations string data. If users of Murmurations data are not expecting markup, this can cause several different kinds of problems:
This is a significant concern, but there are some inherent limitations to what Murmurations can do directly to mitigate the issues.
On the consumption side, Murmurations data always needs to be treated like any other unsafe data source. Consumers of data should filter Murmurations strings to ensure that they only include plain text, or plain text plus markup that is acceptable to the consuming system. Heuristic processing can be used to minimize the visual impact of removing markup (for example by adding <p> or other layout tags where line breaks exist in the text string).
In general, Murmurations is not expected to be used for long-form text that requires substantial markup to be readable. (If fields are created for this purpose, field creators could potentially specify that the field is intended to allow HTML, markdown, or some other markup format).
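One possible shape for this kind of consumer-side filtering, using only the Python standard library, is sketched below. The function names to_plain_text and to_safe_html are hypothetical, and a production system would more likely rely on a maintained sanitizing library; this only illustrates the strip-then-rebuild idea described above.

```python
from html import escape
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text content, dropping all tags and the bodies of script/style."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def to_plain_text(value: str) -> str:
    """Return the string with any markup removed."""
    parser = _TextExtractor()
    parser.feed(value)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


def to_safe_html(value: str) -> str:
    """Strip markup, escape what remains, and wrap each non-empty line in <p>."""
    lines = (line.strip() for line in to_plain_text(value).splitlines())
    return "".join(f"<p>{escape(line)}</p>" for line in lines if line)


print(to_plain_text("Hello <b>world</b><script>alert(1)</script>"))
print(to_safe_html("Line one\nLine <i>two</i> & more"))
```

The second function is the heuristic step: line breaks in the cleaned text become paragraph tags that the consuming system itself generates, so the only markup in the output is markup the consumer chose to add.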
Questions
Some questions: