ADP1904 Typecasting and serialization cleanup

Romans Malinovskis edited this page Apr 24, 2019 · 2 revisions

Problem

As we refactor the field classes, we need a clear understanding of how typecasting works and where it is done.

This document proposes a clear and full explanation for:

  • Typecasting
  • Normalization
  • Serialization

And will look at the following fields:

  • Native supported fields (such as int)
  • Native fields with limited support (such as date or boolean)
  • User-defined fields

I'll also look at persistences, which may or may not implement custom storage for particular types.

Some Considerations

Backwards Compatibility

For the next MAJOR version of ATK Data, there will be a duplication of functionality. Legacy field types defined like this will continue to work as before:

$model->addField('dob', ['type'=>'date']);

However the new syntax will be encouraged:

$model->addField('dob', ['Date']);

Declaration of Fields

Previously our fields have clearly used the "ui" property for the purposes of the presentation layer. However, no such thing was done for persistence.

I propose the following:

$model->addField('dob', ['Date', 'persistence'=>['sql'=>['format'=>'Y-m-d']] ]);
// or
$model->addField('dob', ['Date', 'persistence'=>['format'=>'Y-m-d']]);

// consistent with

$model->addField('dob', ['type'=>'date', 'ui'=>['Form'=>..]]);
  • REVIEW: maybe a property other than "persistence" should be used?

Use of "type" parameter

All the new classes, e.g. Field\Date will declare type='date', which will keep things working with the UI and apps.

Persistence class should be smart enough to keep backward compatible code working, while new Field classes would use the mechanisms described below.

The trigger for the legacy behaviour would be the field property Field->legacy = true.

Explanation of workflow and Terms

Agile Data has to deal with several situations:

  • The field value ($field->value) is stored as a native PHP type and may also be an object.
  • Persistence may opt to use some special storage format, e.g. Y-m-d for SQL or MongoDate.
  • Some persistences may not support native types and user types. Field should provide fallback for that.

So here is a typical workflow.

  1. Field is defined (as shown above).
  2. When a value is assigned to a field ($model['field'] = $value;), the value is Normalized.
  3. The Field class can cast the value with toString, for example when showing debug info, logging, serializing or exporting.
  4. The Field can load a value with setFromString, for example when unserializing. Ideally the format should be readable and supported by Normalization too, but that's not a requirement. I'll provide examples.
  5. UI persistence (e.g. Template) will rely on toString if it's not familiar with the field's type yet needs to display the value.
  6. When saving into a persistence that is not familiar with the type, the persistence will also call toString and store that value. Similarly, loading will use setFromString.
  7. A persistence may do typecasting of its own, based on type. It would then convert the value and may also use $field->persistence for additional field-specific options.
  8. Field should have NO KNOWLEDGE of persistences.
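
The field side of this workflow can be sketched roughly as follows. This is only an illustration: the class name SketchIntegerField and the set() method are made up for the example, and toString is assumed to take no argument (as in the json_encode($field->toString()) usage later in this document).

```php
<?php
// Hypothetical field class; only toString and setFromString are names
// taken from this proposal, everything else is an assumption.
class SketchIntegerField
{
    /** @var int|null value held in its native PHP type */
    public $value;

    // Step 2: normalize whatever is assigned into a native int.
    public function set($value): void
    {
        if (!is_numeric($value)) {
            throw new InvalidArgumentException('Value is not numeric');
        }
        $this->value = (int) $value;
    }

    // Step 3: cast to a string for logging, exporting or unknown persistences.
    public function toString(): string
    {
        return (string) $this->value;
    }

    // Step 4: load from the string form; here it simply reuses normalization.
    public function setFromString(string $string): void
    {
        $this->set($string);
    }
}

$field = new SketchIntegerField();
$field->set('42');

$copy = new SketchIntegerField();
$copy->setFromString($field->toString());
```

Note that the class contains nothing persistence-specific, in line with step 8.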

Persistences

We will create a new persistence, ArrayOfStrings, which will store everything in the format [ 123=>['field' => 'string', ... ], ...];

We might not use it in practice; it serves mainly as an exercise. Persistence Array will store values in native PHP format.

Persistences such as JSON or CSV will make use of ArrayOfStrings instead.

Serializing

Normally, when saving model data into a persistence, it will try to find who can typecast the value, in this order:

  1. Call Persistence::typecastSaveField, which may handle it.
  2. Otherwise call Field\TypeFoo::toString, if the field class defines it.
  3. Otherwise call the parent Field::toString, which falls back to serialize().
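
A minimal sketch of that lookup order is shown below. All class and function names prefixed with "Sketch" are hypothetical, and the convention that typecastSaveField returns null when it does not handle a type is an assumption made for the example.

```php
<?php
// Hypothetical classes: names prefixed with "Sketch" are not part of ATK Data.
class SketchBaseField
{
    public $type = null;
    public $value;

    // Step 3: last-resort fallback to PHP serialize().
    public function toString(): string
    {
        return serialize($this->value);
    }
}

class SketchStringField extends SketchBaseField
{
    public $type = 'string';

    // Step 2: the field class provides its own conversion.
    public function toString(): string
    {
        return (string) $this->value;
    }
}

class SketchSqlPersistence
{
    // Step 1: returns null when the persistence does not handle the type
    // (this "null means unhandled" convention is an assumption).
    public function typecastSaveField($value, SketchBaseField $field)
    {
        if ($field->type === 'date' && $value instanceof DateTimeInterface) {
            return $value->format('Y-m-d');
        }
        return null;
    }
}

// Tries the persistence first, then falls back to the field's toString.
function sketchSaveValue(SketchSqlPersistence $persistence, SketchBaseField $field)
{
    $stored = $persistence->typecastSaveField($field->value, $field);
    return $stored ?? $field->toString();
}
```

Steps 2 and 3 collapse into a single $field->toString() call here because ordinary method overriding already picks the most specific implementation.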

Obviously for all the standard types we will define a better strategy for storing, for example:

class Field\String extends Field {
  function toString($value) {
    return $value;
  }
  function setFromString($value) {
    $this->value = $value;
  }
}
  • CONSIDERATION: Field::toString may serialize ONLY if the value is non-scalar. Also, Field::setFromString can TRY to unserialize the data and, if that fails, set the value as-is.

This process is called fallback serializing and will only use PHP's serialize for field implementations that did not provide a better way to convert the value into a string.
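
The consideration above could look roughly like this. It is a sketch, not the proposed implementation: the class name SketchFallbackField is hypothetical, and restricting unserialize() with allowed_classes => false is an extra safety assumption of mine rather than something this proposal specifies.

```php
<?php
// Hypothetical base class illustrating fallback serializing.
class SketchFallbackField
{
    public $value;

    public function toString(): string
    {
        // Scalars are cast directly; anything else falls back to serialize().
        if (is_scalar($this->value)) {
            return (string) $this->value;
        }
        return serialize($this->value);
    }

    public function setFromString(string $string): void
    {
        // unserialize() returns false on failure, which is also the decoded
        // form of serialize(false), so that case is checked explicitly.
        // Assumption: objects are not instantiated (allowed_classes => false).
        $decoded = @unserialize($string, ['allowed_classes' => false]);
        if ($decoded !== false || $string === serialize(false)) {
            $this->value = $decoded;
        } else {
            // Not serialized data: keep the string as-is.
            $this->value = $string;
        }
    }
}
```

One caveat of the "try to unserialize" approach is ambiguity: a plain string that happens to look like serialized data, such as 'i:5;', would be loaded as the integer 5.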

Property $field->serialize

In some cases a custom field may already know that its value cannot be stored natively, and it can provide a hint on how to store it:

class Field\MyData extends Field {
  public $serialize = 'json';
}

We want to always store this custom data as JSON. When saving into a persistence, Serialization will happen first, resulting in a "string" type, which will be stored as-is.

Even if the persistence supports JSON natively, the value will be stored as a string. That's done to make sure that any dangerous contents of $field->value are hidden away from the database.

You may also define "serialize" for a field explicitly, when defining:

$user->addField('dob', ['Date', 'serialize'=>'json']);

If that's the case, then DOB will NOT be stored in SQL format, but as json_encode($field->toString()).

User-defined type-casting

As before, it's possible for the user to define callbacks for typecasting:

$user->addField('dob', ['Date', 'typecast'=>[$encode_fx, $decode_fx]]);

In this case the typecast takes precedence. Encode is expected to return a string, which will be used for all persistences and will also be returned by toString.

If you do not wish to affect toString and only want to apply it to persistences:

$user->addField('dob', ['Date', 'persistence'=>['typecast'=>[$encode_fx, $decode_fx]]]);

and finally to set it for a specific persistence only:

$user->addField('dob', ['Date', 'persistence'=>['SQL'=>['typecast'=>[$encode_fx, $decode_fx]]]]);

If you specify callbacks in several of these places, $field->typecast takes precedence, then persistence['SQL']['typecast'], and finally persistence['typecast'].

Serialize and Typecast can be specified together, in which case the value will first be encoded through typecast and then serialized. However, if you specify the encode functions through the 'persistence' property, then serialize will use the default toString.
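
The precedence of the three places where typecast callbacks can be declared might be resolved along these lines. The function name sketchResolveTypecast and the idea of passing the field options as a plain array are assumptions made purely for illustration.

```php
<?php
// Hypothetical helper: returns the [encode, decode] pair that applies for a
// given persistence name, or null when no typecast callbacks are defined.
function sketchResolveTypecast(array $fieldOptions, string $persistenceName): ?array
{
    // 1. A field-level 'typecast' always wins.
    if (isset($fieldOptions['typecast'])) {
        return $fieldOptions['typecast'];
    }

    $persistence = $fieldOptions['persistence'] ?? [];

    // 2. Then a typecast declared for this specific persistence, e.g. 'SQL'.
    if (isset($persistence[$persistenceName]['typecast'])) {
        return $persistence[$persistenceName]['typecast'];
    }

    // 3. Finally a typecast that applies to all persistences.
    return $persistence['typecast'] ?? null;
}

$options = [
    'persistence' => [
        'typecast' => ['strtoupper', 'strtolower'],
        'SQL'      => ['typecast' => ['base64_encode', 'base64_decode']],
    ],
];
$forSql = sketchResolveTypecast($options, 'SQL'); // the SQL-specific pair
$forCsv = sketchResolveTypecast($options, 'CSV'); // the generic pair
```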

Allowed serialize values

It's possible to specify a custom serializer. The following options are supported:

  • serialize=false (or null) - don't serialize
  • serialize='json' - uses json_encode(..) and json_decode(.., true), so decoded data uses arrays rather than stdClass objects
  • serialize='rot13', serialize='base64' - also supported
  • serialize=true - uses PHP serialize()
  • serialize='any other string' - reserved and will use PHP serialize for now
  • serialize=[$encode_fx, $decode_fx] - uses custom callbacks for serialization, for example encryption
  • serialize=[$encode_fx] - will encode the field but cannot decode it (e.g. MD5)
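
As a rough illustration of how these options could be dispatched, consider the sketch below. The function names sketchEncode and sketchDecode are hypothetical, and the use of allowed_classes => false when unserializing is my own assumption. Strict comparisons are used on purpose, because a loose comparison of true against 'json' would match.

```php
<?php
// Hypothetical dispatcher for the 'serialize' option (encoding side).
function sketchEncode($serialize, $value)
{
    if ($serialize === false || $serialize === null) {
        return $value;                                  // don't serialize
    }
    if (is_array($serialize)) {
        return call_user_func($serialize[0], $value);   // custom encode callback
    }
    if ($serialize === 'json') {
        return json_encode($value);
    }
    if ($serialize === 'base64') {
        return base64_encode($value);
    }
    if ($serialize === 'rot13') {
        return str_rot13($value);
    }
    return serialize($value);                           // true or any other string
}

// Hypothetical dispatcher for the 'serialize' option (decoding side).
function sketchDecode($serialize, $stored)
{
    if ($serialize === false || $serialize === null) {
        return $stored;
    }
    if (is_array($serialize)) {
        if (!isset($serialize[1])) {
            // serialize=[$encode_fx] is one-way, e.g. MD5.
            throw new LogicException('No decode callback was provided');
        }
        return call_user_func($serialize[1], $stored);
    }
    if ($serialize === 'json') {
        return json_decode($stored, true);              // arrays, not stdClass
    }
    if ($serialize === 'base64') {
        return base64_decode($stored);
    }
    if ($serialize === 'rot13') {
        return str_rot13($stored);
    }
    // Assumption: objects are not instantiated when unserializing.
    return unserialize($stored, ['allowed_classes' => false]);
}
```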

Normalization and setFromString

For all the basic types that ATK Data supports (except array), we will make sure that:

$model->getField('dob')->setFromString($string);
// and
$model['dob'] = $string;

would have the same effect. For example, the Date field will use the ISO date format with timezone, and Normalization should be able to understand it properly.
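
For a Date field, this equivalence could be sketched as follows. The class name SketchDateField and the set() method are hypothetical, and using PHP's DATE_ATOM constant as the "ISO date format with timezone" is an assumption.

```php
<?php
// Hypothetical Date field; only toString and setFromString are names
// taken from this proposal.
class SketchDateField
{
    /** @var DateTimeImmutable|null */
    public $value;

    // Normalization: accepts either a date string or a DateTimeImmutable.
    public function set($value): void
    {
        if (is_string($value)) {
            $value = new DateTimeImmutable($value);
        }
        if (!$value instanceof DateTimeImmutable) {
            throw new InvalidArgumentException('Unsupported date value');
        }
        $this->value = $value;
    }

    // ISO 8601 with timezone offset, e.g. 2019-04-24T10:00:00+02:00.
    public function toString(): string
    {
        return $this->value->format(DATE_ATOM);
    }

    // Loading from a string goes through the same normalization.
    public function setFromString(string $string): void
    {
        $this->set($string);
    }
}

$assigned = new SketchDateField();
$assigned->set('2019-04-24T10:00:00+02:00');

$loaded = new SketchDateField();
$loaded->setFromString('2019-04-24T10:00:00+02:00');
// Both fields now hold an equal DateTimeImmutable value.
```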

Some type notes

Blob (typeless field)

Currently ATK supports a field without a type. The value of such a field is left unchanged. The class for handling this field type would be Field\Blob.

Array type

We can't use Array as a class name because it is a reserved word in PHP, so '_' should be appended:

$model->addField('data', ['Array_']);

ATK supports the ARRAY type now and it can store an array of scalar values. Since SQL does not support arrays, it will use Field\Array_::toString as explained above. Because we know it's an array of scalar values, json_encode should be sufficient for this method's implementation.

However, other databases can store array data:
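
A possible shape for that method pair is sketched here. The class is named SketchArrayField (without a namespace) only so the snippet runs standalone; it stands in for the proposed Field\Array_, and the error handling is an assumption.

```php
<?php
// Hypothetical stand-in for Field\Array_, storing an array of scalars.
class SketchArrayField
{
    /** @var array */
    public $value = [];

    // Scalar arrays encode cleanly as JSON.
    public function toString(): string
    {
        return json_encode($this->value);
    }

    // Decodes to PHP arrays (second argument true) and rejects anything else.
    public function setFromString(string $string): void
    {
        $decoded = json_decode($string, true);
        if (!is_array($decoded)) {
            throw new InvalidArgumentException('Expected a JSON-encoded array');
        }
        $this->value = $decoded;
    }
}

$tags = new SketchArrayField();
$tags->value = [1, 'two', true];
$stored = $tags->toString(); // string that an SQL persistence can store as-is
```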

class Persistence\MongoDB {
  function typecastSaveField($value, $field) {
    if ($field->type == 'array') return $value;
  }
}
  • FOR CONSIDERATION: we should probably support Array with non-scalar values, where we would encode its values. Perhaps ScalarArray instead?

Consideration for Contained data

We are also looking to add data containment in ATK data:

$user->containsMany('login_attempts', Model\LoginAttempt::class);

This would make the following possible:

echo "attempts = ".$user->ref('login_attempts')->action('count')->getOne();

or for adding new attempt:

$user->ref('login_attempts')->insert(['ip'=>$ip]);

In this implementation the reference 'login_attempts' is defined; however, it also requires a field to be defined that contains the data:

class Model .. {
  function containsMany($field, $class) {
    $this->addRef($field, $class);
    $this->addField($field, ['Array_']);
  }
}

class Reference\ContainsMany {
  function ref($m, $field) {
    $data = $m[$field];
    // link with ArrayOfStrings persistence
    // callbacks will update $m[$field] value and call $m->save()
  }
}

This would allow storing ANY data inside a contained model: the persistence will typecast everything with toString, ensuring that $field can be saved as JSON into a persistence which supports it.