Polymorphism
As of 0.6.0, MessagePack for CLI supports polymorphism for objects and collection items.
There are two kinds of polymorphism: known subtypes based polymorphism and runtime type based polymorphism.
There are several use cases for polymorphism:
- Serialize polymorphic collections (issue #58). Sometimes you want to serialize heterogeneous collection items.
- Serialize a 'rich' domain model that has its own data and logic (issue #47). You can deserialize objects and invoke their virtual methods.

Known subtypes based polymorphism

- Pros
  - Easy to interoperate. Known subtypes based polymorphism uses a simple format, so you can easily implement a counterpart system.
  - Naturally secure. You control the possible instance types via custom attributes, so there is less chance of injecting malicious code, unless you also load untrusted assemblies.
- Cons
  - You must continuously maintain the known subtype list(s).
  - All types must be known at compile time.

Runtime type based polymorphism

- Pros
  - Easy to use and maintain. You just need to put some custom attributes on the members.
  - You don't have to know the possible subtypes at compile time.
- Cons
  - It uses a format based on native .NET type identifiers, so interoperability is hard to maintain: other systems must interpret the type information and translate it into their own type systems.
  - You cannot control the possible subtypes, which might hurt the stability of your application.
  - If serialized data comes from an external source, it may contain malicious type information. Attackers can specify types whose default constructors cause significant side effects, such as file or registry manipulation.

You can specify that a member (field or property) is polymorphic by marking it with custom attributes, as follows:
// Known subtypes based polymorphism.
// Each type code is a string that identifies the bound subtype in the serialized data.
[MessagePackKnownType( "0", typeof( FileInfo ) )]
[MessagePackKnownType( "1", typeof( DirectoryInfo ) )]
public FileSystemInfo Info { get; set; }

// Runtime type based polymorphism.
[MessagePackRuntimeType]
public object Data { get; set; }
As you might expect, you cannot mix the two kinds of polymorphism attributes on the same member.
You can also specify polymorphism for collections themselves, for collection items, for dictionary keys and values, and for `Tuple` items.
These tables show the valid combinations and meanings of the attributes:

Known subtypes based polymorphism attributes:

Attribute | Target | Note |
---|---|---|
`MessagePackKnownTypeAttribute` | Non-collection objects, or collections themselves | |
`MessagePackKnownCollectionItemTypeAttribute` | Collection items or dictionary values | For example, items of a `List<object>` typed property value. |
`MessagePackKnownDictionaryKeyTypeAttribute` | Dictionary keys | For example, keys of a `Dictionary<object, object>` typed property value. |
`MessagePackKnownTupleItemTypeAttribute` | An item of a tuple | Each attribute specifies one item (the Nth attribute is for the `ItemN` property). |

Runtime type based polymorphism attributes:

Attribute | Target | Note |
---|---|---|
`MessagePackRuntimeTypeAttribute` | Non-collection objects, or collections themselves | |
`MessagePackRuntimeCollectionItemTypeAttribute` | Collection items or dictionary values | For example, items of a `List<object>` typed property value. |
`MessagePackRuntimeDictionaryKeyTypeAttribute` | Dictionary keys | For example, keys of a `Dictionary<object, object>` typed property value. |
`MessagePackRuntimeTupleItemTypeAttribute` | An item of a tuple | Each attribute specifies one item (the Nth attribute is for the `ItemN` property). |

As the tables show, for collection typed members (that is, members whose type implements `IEnumerable` but not `IDictionary`, and is not sealed) and dictionary typed members, you can mark both the collection itself and its keys or items as polymorphic. In addition, you can mark individual tuple items as polymorphic. Note that `System.Tuple` types are sealed, so you cannot mark a `Tuple` typed member itself as polymorphic.
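
As an illustration of these combinations, the following minimal sketch marks collection items and dictionary keys and values as polymorphic. `Shape`, `Circle`, `Square`, and `Drawing` are hypothetical types invented for this example, and the type codes `"c"` and `"s"` are arbitrary. The sketch assumes that the collection item attribute takes a string type code followed by the bound type; check the attribute constructors in your version of the library before relying on it.

```csharp
using System.Collections.Generic;
using MsgPack.Serialization;

// Hypothetical domain types, used only for illustration.
public abstract class Shape { }
public class Circle : Shape { public double Radius { get; set; } }
public class Square : Shape { public double Side { get; set; } }

public class Drawing
{
    // Known subtypes: each list item is either a Circle or a Square.
    // Assumption: the constructor takes (string typeCode, Type bindingType).
    [MessagePackKnownCollectionItemType( "c", typeof( Circle ) )]
    [MessagePackKnownCollectionItemType( "s", typeof( Square ) )]
    public List<Shape> Shapes { get; set; }

    // Runtime type: keys and values carry their own .NET type information.
    [MessagePackRuntimeDictionaryKeyType]
    [MessagePackRuntimeCollectionItemType]
    public Dictionary<object, object> Metadata { get; set; }
}
```

Each member uses only one kind of polymorphism, in line with the rule that the two kinds cannot be mixed on the same member.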
For members without these attributes, there are default behaviors for collection typed and `System.Object` typed members:
- `System.Object` means a boxed `MessagePackObject` when the member is not marked with any polymorphism attribute.
- The concrete type of a deserialized abstract collection typed member value is determined by the `SerializationContext.DefaultCollectionTypes` registration. The defaults are `List<T>` and `Dictionary<TKey, TValue>`.

This section describes the type information format, so that you can develop interoperable implementations.
- Objects' type information will be serialized together with their values.
- The type information and the value are serialized within a single array.
- The type information consists of their data.
- The type information itself will be encoded in an array.

For known subtypes based polymorphism, the value is encoded as a simple 2-element array:
[<StringTypeCode>, <Data>]
In the figure above, "StringTypeCode" is the type code string specified in the custom attributes. It is encoded in the MessagePack `str` (`raw`) format, and it should be encoded as compactly as possible. "Data" is the serialized object value, which takes the form of an array or a map.
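
For a counterpart implementation, the envelope described above can be sketched by hand with nothing but MessagePack header bytes. This is a minimal sketch, not the library's own encoder: `KnownTypeEnvelopeSketch` and `Wrap` are hypothetical names, and the code assumes that the object value has already been serialized to a MessagePack array or map.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class KnownTypeEnvelopeSketch
{
    // Wraps already-serialized object bytes as [<StringTypeCode>, <Data>].
    public static byte[] Wrap(string typeCode, byte[] serializedData)
    {
        byte[] code = Encoding.UTF8.GetBytes(typeCode);
        var buffer = new List<byte>();
        buffer.Add(0x92); // fixarray header: 2 elements

        if (code.Length <= 31)
        {
            buffer.Add((byte)(0xA0 | code.Length)); // fixstr header, the most compact form
        }
        else if (code.Length <= 255)
        {
            buffer.Add(0xD9); // str 8 header
            buffer.Add((byte)code.Length);
        }
        else
        {
            throw new ArgumentException(
                "This sketch only handles type codes up to 255 bytes.", nameof(typeCode));
        }

        buffer.AddRange(code);
        buffer.AddRange(serializedData); // assumed to be a MessagePack array or map
        return buffer.ToArray();
    }
}
```

For example, `KnownTypeEnvelopeSketch.Wrap("0", new byte[] { 0x91, 0x2A })` wraps the one-element array `[42]` and yields the bytes `92 A1 30 91 2A`.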

For runtime type based polymorphism, the value is encoded as a 2-element array:
[<EncodedNETType>, <Data>]
In the figure above, "Data" is the serialized object value, which takes the form of an array or a map. "EncodedNETType" is a 6-element array of structured data that is equivalent to a .NET assembly-qualified type name. This table shows the contents of the structured type information and how the assembly-qualified name maps to it:

Index | Type | Content |
---|---|---|
0 | integer | Format ID. Only 1 is valid. Described below. |
1 | `str` (`raw`) | Compressed type full name. Described below. |
2 | `str` (`raw`) | Assembly's simple name. |
3 | array | Assembly's version as a 4-element int array. |
4 | `str` (`raw`) | Assembly's culture name. `nil` for a neutral assembly. |
5 | `bin` (`raw`) | Assembly's public key token. `nil` for `null`. |

Note that Format ID `1` means that the "Compressed Format" is used, which compresses the type name. Many types have their namespace as a prefix, and that prefix often matches the simple name of the declaring assembly, so omitting the duplicated substring saves space. The format replaces such a prefix with '.'.
For example, for the type "TheCompany.TheProduct.TheComponent.TheLayer.TheType, TheCompany.TheProduct.TheComponent, Version=1.2.3.4, Culture=neutral, PublicKeyToken=null", the resulting type information is logically as follows:
[1, ".TheLayer.TheType", "TheCompany.TheProduct.TheComponent", [1, 2, 3, 4], nil, nil]
The physical format looks as follows:
0x96 0x01 0xB12E5468654C617965722E54686554797065 0xD922546865436F6D70616E792E54686550726F647563742E546865436F6D706F6E656E74 0x94 0x01 0x02 0x03 0x04 0xC0 0xC0
That is 63 bytes of binary data instead of a 142-byte UTF-8 encoded string.
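
The compression rule and the byte layout above can be reproduced with a short hand-written encoder. This is a minimal sketch under stated assumptions, not the library's implementation: `EncodedNetTypeSketch` and its methods are hypothetical names, the compression is interpreted as dropping the assembly simple name when it prefixes the type full name (keeping the '.' that follows it), and only culture-neutral assemblies without a public key token are handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class EncodedNetTypeSketch
{
    // Drops the assembly simple name when it prefixes the type full name,
    // keeping the '.' that followed it as the marker.
    public static string CompressTypeName(string typeFullName, string assemblySimpleName)
    {
        return typeFullName.StartsWith(assemblySimpleName + ".", StringComparison.Ordinal)
            ? typeFullName.Substring(assemblySimpleName.Length)
            : typeFullName;
    }

    private static void WriteStr(List<byte> buffer, string value)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(value);
        if (utf8.Length <= 31)
        {
            buffer.Add((byte)(0xA0 | utf8.Length)); // fixstr header
        }
        else if (utf8.Length <= 255)
        {
            buffer.Add(0xD9); // str 8 header
            buffer.Add((byte)utf8.Length);
        }
        else
        {
            throw new NotSupportedException("This sketch handles strings up to 255 bytes only.");
        }
        buffer.AddRange(utf8);
    }

    private static void WriteSmallInt(List<byte> buffer, int value)
    {
        if (value < 0 || value > 127)
        {
            throw new NotSupportedException("This sketch handles positive fixint (0..127) only.");
        }
        buffer.Add((byte)value);
    }

    // Encodes a culture-neutral assembly that has no public key token.
    public static byte[] Encode(string typeFullName, string assemblySimpleName, Version version)
    {
        var buffer = new List<byte>();
        buffer.Add(0x96);                                                     // fixarray header: 6 elements
        WriteSmallInt(buffer, 1);                                             // index 0: Format ID
        WriteStr(buffer, CompressTypeName(typeFullName, assemblySimpleName)); // index 1
        WriteStr(buffer, assemblySimpleName);                                 // index 2
        buffer.Add(0x94);                                                     // index 3: fixarray header: 4 elements
        WriteSmallInt(buffer, version.Major);
        WriteSmallInt(buffer, version.Minor);
        WriteSmallInt(buffer, version.Build);
        WriteSmallInt(buffer, version.Revision);
        buffer.Add(0xC0);                                                     // index 4: nil (neutral culture)
        buffer.Add(0xC0);                                                     // index 5: nil (no public key token)
        return buffer.ToArray();
    }
}
```

Calling `EncodedNetTypeSketch.Encode("TheCompany.TheProduct.TheComponent.TheLayer.TheType", "TheCompany.TheProduct.TheComponent", new Version(1, 2, 3, 4))` produces 63 bytes that begin with `96 01 B1`, matching the physical format shown above.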