
Add support for reading/writing objects with ReadOnlyMemory<byte> properties #228

Closed
bill-poole opened this issue Apr 4, 2024 · 16 comments

@bill-poole

I'm new to using EfficientDynamoDb, so apologies if I have this wrong, but I've been through the documentation and code and don't think it's currently possible to read/write objects with binary properties of type ReadOnlyMemory<byte>. I need to write buffers to DynamoDB that have been produced by compression, where the data is written into a buffer that is larger than the final compressed data (which is typical when writing compressed data).

Currently, it seems I need to copy this data into a new byte array so I can set a byte[] property on the object I want to write to DynamoDB.
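
For illustration, the copy looks roughly like this (a sketch only; the entity and its Payload property are hypothetical, and BrotliEncoder stands in for my compressor):

byte[] source = GetUncompressedPayload(); // hypothetical source of the data
byte[] rented = ArrayPool<byte>.Shared.Rent(BrotliEncoder.GetMaxCompressedLength(source.Length));
try
{
    BrotliEncoder.TryCompress(source, rented, out int written);

    // The copy this issue is about: an exact-sized array just to satisfy byte[].
    byte[] exact = new byte[written];
    Array.Copy(rented, exact, written);
    entity.Payload = exact; // byte[] property on the entity
}
finally
{
    ArrayPool<byte>.Shared.Return(rented);
}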

I propose modifying the BinaryAttributeValue struct to hold a ReadOnlyMemory<byte> field instead of a byte[] field and then add a new constructor accepting a ReadOnlyMemory<byte> parameter. The existing constructor would of course be retained (byte[] has an implicit cast to ReadOnlyMemory<byte>).

The BinaryAttributeValue.Write method would then be updated to:

public void Write(Utf8JsonWriter writer)
{
  writer.WriteStartObject();
  writer.WriteBase64String("B", _value.Span);
  writer.WriteEndObject();
}

And the BinaryAttributeValue.ToString method would be updated to:

public override string ToString()
{
  return Convert.ToBase64String(_value.Span);
}

This would allow me to write my own custom converter for a ReadOnlyMemory<byte> property, although it would be even better if the library shipped one out of the box.

@firenero (Contributor) commented Apr 5, 2024

Hi @bill-poole, thanks for raising this issue. You're right: with the current API you have to copy data into a new byte array to be able to serialize it.

Your proposal looks interesting and should generally work, but I have one concern about it. There is a public property byte[] Value in BinaryAttributeValue so changing _value field type to ReadOnlyMemory<byte> requires data copy to implement this property:

public byte[] Value => _value.ToArray();

I would like to avoid changes like this because they can introduce silent performance degradation for other users. We also can't simply delete this property or change its type to ReadOnlyMemory<byte> because that would be a breaking change. In general, I'm not against introducing a breaking change, but we need to think it through to make sure it covers as many use cases as possible. I'd like to take some time over the weekend to think through this approach and check whether there are other ways of implementing it. If you have any ideas for this issue, feel free to share them as well.

Also, one question about your use case, which may be outside the scope of the library: if your buffer is larger than the final compressed data, how would you know which part of the ReadOnlyMemory<byte> should be serialized? Do you plan to write a slice of it to the entity property, or something else? Is there anything we can do at the library level to simplify the use case?

firenero added the enhancement and suggestion labels on Apr 5, 2024
firenero self-assigned this on Apr 5, 2024
@bill-poole (Author)

Thanks for considering this feature request @firenero. The compressor writes data to a byte array (byte[], which I rent from ArrayPool<byte>.Shared) and returns the number of bytes written. I can then get a ReadOnlyMemory<byte> from the byte array and slice it using the size returned from the compressor to create a new ReadOnlyMemory<byte> that is the subset of the larger byte array containing the compressed data. It's that slice that I then need to write to DynamoDB.

Alternatively, the compressor can write to an ArrayPoolBufferWriter<byte> instance (from the CommunityToolkit.HighPerformance package), whose WrittenMemory property gives me a ReadOnlyMemory<byte> over the compressed data.
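
To make that concrete, a sketch (BrotliEncoder again stands in for the compressor, and the Payload property is hypothetical, assuming a ReadOnlyMemory<byte> converter existed):

// Option 1: slice the rented buffer using the compressor's written count.
BrotliEncoder.TryCompress(source, rented, out int written);
ReadOnlyMemory<byte> compressed = rented.AsMemory(0, written);
entity.Payload = compressed; // no exact-sized copy

// Option 2: let the compressor write to an ArrayPoolBufferWriter<byte>
// (CommunityToolkit.HighPerformance) and use its WrittenMemory property.
using var bufferWriter = new ArrayPoolBufferWriter<byte>();
// ... compressor writes via IBufferWriter<byte> ...
entity.Payload = bufferWriter.WrittenMemory;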

One idea I had for how this could be approached:

  • create a new, dedicated attribute, e.g. BinaryAttribute<T>, where there is no generic constraint on T, but where the BinaryAttribute<T> constructor checks that T is either ReadOnlyMemory<byte> or byte[] (and possibly, in the future, ArraySegment<byte> or any other type that can hold binary data);
  • add an AttributeValue.AsBinaryAttribute<T> method to convert to a BinaryAttribute<T> instance; and
  • add an implicit cast from BinaryAttribute<T> to AttributeValue, which checks the type of T (knowing it must be one of a specific set of supported types) and handles the cast differently for each type?

That would then allow more types to be supported in the future, so this problem doesn't come up again; a rough sketch of the idea follows below. Alternatively, you could create a ReadOnlyMemoryBinaryAttribute specifically for ReadOnlyMemory<byte> properties.
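
Sketching that first idea (an entirely hypothetical API, just to show the shape):

public readonly struct BinaryAttribute<T>
{
    private readonly T _value;

    public BinaryAttribute(T value)
    {
        // No generic constraint covers both supported types, so validate at runtime.
        if (typeof(T) != typeof(byte[]) && typeof(T) != typeof(ReadOnlyMemory<byte>))
            throw new ArgumentException($"Unsupported binary type: {typeof(T)}");

        _value = value;
    }

    public static implicit operator AttributeValue(BinaryAttribute<T> attribute)
    {
        if (typeof(T) == typeof(byte[]))
            return new AttributeValue(new BinaryAttributeValue((byte[])(object)attribute._value!));

        // T is ReadOnlyMemory<byte> here; this path relies on the proposed
        // BinaryAttributeValue(ReadOnlyMemory<byte>) constructor.
        var memory = (ReadOnlyMemory<byte>)(object)attribute._value!;
        return new AttributeValue(new BinaryAttributeValue(memory));
    }
}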

@bill-poole (Author)

I was able to create a workaround by writing my own converter that reads/writes via the underlying Utf8JsonReader/Utf8JsonWriter directly.

public class ReadOnlyMemoryBinaryDdbConverter : DdbConverter<ReadOnlyMemory<byte>>
{
    public override ReadOnlyMemory<byte> Read(ref DdbReader reader)
        => reader.JsonReader.GetBytesFromBase64();
    
    public override ReadOnlyMemory<byte> Read(in AttributeValue attributeValue)
        => throw new NotImplementedException();

    public override void Write(in DdbWriter writer, ref ReadOnlyMemory<byte> value)
    {
        writer.JsonWriter.WriteStartObject();
        writer.JsonWriter.WriteBase64String(DdbTypeNames.Binary, value.Span);
        writer.JsonWriter.WriteEndObject();
    }

    public override AttributeValue Write(ref ReadOnlyMemory<byte> value)
        => throw new NotImplementedException();
}

I assume that it is valid/correct to throw an exception in the Read(in AttributeValue) and Write(ref ReadOnlyMemory<byte>) methods when the Read(ref DdbReader) and Write(in DdbWriter, ref ReadOnlyMemory<byte>) methods are overridden?

Note that this workaround would be made easier if a WriteDdbBinary(ReadOnlySpan<byte>) overload were to be added to the DdbWriter struct.
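
For illustration, such an overload might simply wrap the three writer calls from the converter above (a hypothetical signature, not part of the current API):

// Hypothetical member on the DdbWriter struct:
public void WriteDdbBinary(ReadOnlySpan<byte> value)
{
    JsonWriter.WriteStartObject();
    JsonWriter.WriteBase64String(DdbTypeNames.Binary, value);
    JsonWriter.WriteEndObject();
}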

@firenero (Contributor) commented Apr 7, 2024

Yes, your converter should work; good idea! I think including this converter by default in the library is the easiest way to support ReadOnlyMemory<byte> properties without introducing any compatibility issues. I'll think about it for a few days before adding it, though. Meanwhile, I suggest using this converter so you're not left waiting for a new release of the library.

@bill-poole (Author)

Okay great!

Given that the Read(in AttributeValue) and Write(ref ReadOnlyMemory<byte>) method implementations are not used when the Read(ref DdbReader) and Write(in DdbWriter, ref ReadOnlyMemory<byte>) methods are overridden, shouldn't the CustomIntConverter example in the documentation throw a NotImplementedException in those methods (as the proposed ReadOnlyMemoryBinaryDdbConverter does), rather than converting to/from NumberAttributeValue?

I was a bit confused by that example in the documentation having effectively two conversion implementations: one low-level conversion directly to/from JSON, and one higher-level conversion to/from NumberAttributeValue.

Furthermore, it seems the internal converters in the EfficientDynamoDb source code do the same (e.g., BoolDdbConverter and GuidDdbConverter). Would it not be best to remove the unused higher-level conversion logic from those converters and just throw NotImplementedException?

@firenero (Contributor) commented Apr 7, 2024

Good point. I've realized that the "basic" Read/Write method implementations are used by the Document extensions (ToObject<T>()/ToDocument<T>()), so library converters have to provide them. Your converter will work fine as long as you don't convert objects to documents and vice versa, so feel free to use it as a workaround for now.
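
To make the two code paths concrete, a sketch of which calls exercise which method pair (assuming the context/extension call shapes discussed in this thread):

// JSON-level pair: ordinary reads and writes against DynamoDB.
var item = await context.GetItemAsync<TestObj>("pk", "sk"); // Read(ref DdbReader)
await context.PutItemAsync(item);                           // Write(in DdbWriter, ref T)

// AttributeValue-level ("basic") pair: Document conversions.
var doc = context.ToDocument(item);                         // Write(ref T value)
var obj = doc.ToObject<TestObj>();                          // Read(in AttributeValue)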

Unfortunately, we can't ship this converter as-is in the library because it can lead to hidden errors in user code. In theory, we could ship it with Read(in AttributeValue) and Write(ref ReadOnlyMemory<byte>) implementations that allocate, but I'm looking for a better solution at the moment.

@bill-poole (Author)

Okay thank you! I really appreciate it.

@firenero (Contributor)

I found a way to write a converter that supports the Document extensions ToObject<T>()/ToDocument<T>() without allocating in the Write(ref T value) method. It's in PR #242 and is going to be included in the next release.

@bill-poole (Author)

Thanks @firenero. I took a look at the source code and I think there might be a problem with the code below from BinaryToMemoryDdbConverter.cs. The code uses MemoryMarshal.TryGetArray to attempt to get the array backing the ReadOnlyMemory<byte>, which yields an ArraySegment (if the ReadOnlyMemory<byte> is indeed backed by an array), but then disregards the ArraySegment's Offset and Count properties. The ReadOnlyMemory<byte> could be any arbitrary slice of an array (not to mention something else, like unmanaged memory).

If the ReadOnlyMemory<byte> is not backed by an array, then the implementation allocates, which is not ideal. And if it is backed by an array and is a slice of that array, then the whole array rather than the slice is written to the database.

public override AttributeValue Write(ref Memory<byte> value)
{
    var array = MemoryMarshal.TryGetArray((ReadOnlyMemory<byte>)value, out var segment)
        ? segment.Array
        : value.ToArray();
    Debug.Assert(array != null);

    return new AttributeValue(new BinaryAttributeValue(array));
}

@firenero (Contributor)

> then disregards the ArraySegment's Offset and Count properties.
> if it is backed by an array, and is a slice of that array, then the whole array rather than the slice is written to the database.

You're right about this; I need to narrow down the happy path to only the cases where Offset == 0 and Count == value.Length, along the lines of the sketch below.
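
For example (a sketch of the narrowed check, not the final implementation):

public override AttributeValue Write(ref Memory<byte> value)
{
    // Reuse the backing array only when the memory spans the entire array;
    // otherwise copy the slice so no bytes outside it are written.
    var array = MemoryMarshal.TryGetArray((ReadOnlyMemory<byte>)value, out var segment)
            && segment.Offset == 0
            && segment.Count == segment.Array!.Length
        ? segment.Array
        : value.ToArray();

    return new AttributeValue(new BinaryAttributeValue(array));
}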

> the implementation allocates, which is not ideal

Agreed, it's not ideal, but in practice the path with allocation will be very rare. In most cases, Memory<byte> is backed by an array, and there is only one case (or two, but they are similar) where allocation can happen. The conditions are:

  1. The entity is created manually in code (not read from DynamoDB).
  2. The memory field is either not backed by an array OR is a slice of an array. Note that if the memory field is backed by a whole, unsliced array, this condition is not met.
  3. context.ToDocument(item) is used to convert the entity to a Document.

In all other cases, it won't allocate because either this converter method is not called or the happy path without allocation is possible.

Example of the code that will allocate:

public class TestObj
{
    [DynamoDbProperty("mem")]
    public ReadOnlyMemory<byte> Memory { get; set; }
}

var arr = new byte[] { 0x69, 0xa7, 0x5d, 0xff};
var item = new TestObj
{
    Memory = arr.AsMemory(1, 2) // Take only 0xa7 and 0x5d bytes
};

// This conversion allocates new array for 0xa7 and 0x5d bytes
var doc = context.ToDocument(item); 

Similarly, it will allocate if your memory wraps unmanaged data, but that also happens quite rarely. In all other cases it will NOT allocate, so I think it's a decent tradeoff overall. Let me know what you think.

@bill-poole (Author) commented May 13, 2024

I suspect that if someone is using Memory<byte>/ReadOnlyMemory<byte> instead of byte[], they are likely relying on the slicing ability those types provide, meaning the converter will allocate; otherwise, I assume they could just use byte[].

That is actually my scenario: I'm using ReadOnlyMemory<byte> instead of byte[] because I need to pre-allocate an array to write compressed data into, and I don't know the size of the compressed data until compression completes. At that point, I can either copy the compressed data into a newly allocated exact-sized byte[], or I can use a slice of the ReadOnlyMemory<byte>; but then the converter you propose would copy that to a new exact-sized byte[] anyway, albeit only when invoking ToDocument.

I assume the concern with updating BinaryAttributeValue to use a ReadOnlyMemory<byte> value instead of a byte[] value is maintaining backwards compatibility of the public API? If so, have you considered updating BinaryAttributeValue as follows:

[StructLayout(LayoutKind.Explicit)]
public readonly struct BinaryAttributeValue
{
    [FieldOffset(0)] 
    private readonly ReadOnlyMemory<byte> _value;

    public byte[] Value => MemoryMarshal.TryGetArray(_value, out var segment)
        && segment.Count == segment.Array.Length
        ? segment.Array
        : _value.ToArray();

    public Memory<byte> AsMemory()
        => MemoryMarshal.AsMemory(_value);

    public BinaryAttributeValue(byte[] value)
    {
        _value = value;
    }

    public BinaryAttributeValue(ReadOnlyMemory<byte> value)
    {
        _value = value;
    }

    public void Write(Utf8JsonWriter writer)
    {
        writer.WriteStartObject();
        writer.WriteBase64String(DdbTypeNames.Binary, _value.Span);
        writer.WriteEndObject();
    }

    public override string ToString() => Convert.ToBase64String(_value.Span);
}

As far as I can tell, this allows BinaryAttributeValue to hold/use ReadOnlyMemory<byte> values while maintaining backwards compatibility.

Furthermore, it would only allocate if someone creates a BinaryAttributeValue from a ReadOnlyMemory<byte> slice and then reads the value via the Value property, which probably shouldn't ever happen? I'd think that anyone building a converter around Memory<byte> or ReadOnlyMemory<byte> would use the AsMemory() method rather than the Value property.

You would then have a MemoryBinaryDdbConverter class as follows:

public class MemoryBinaryDdbConverter : DdbConverter<Memory<byte>>
{
    public override Memory<byte> Read(ref DdbReader reader)
        => reader.JsonReader.GetBytesFromBase64();
    
    public override Memory<byte> Read(in AttributeValue attributeValue)
        => attributeValue.AsBinaryAttribute().AsMemory();

    public override void Write(in DdbWriter writer, ref Memory<byte> value)
    {
        writer.JsonWriter.WriteStartObject();
        writer.JsonWriter.WriteBase64String(DdbTypeNames.Binary, value.Span);
        writer.JsonWriter.WriteEndObject();
    }

    public override AttributeValue Write(ref Memory<byte> value)
        => new AttributeValue(new BinaryAttributeValue(value));
}

And you would have a ReadOnlyMemoryBinaryDdbConverter class as follows:

public class ReadOnlyMemoryBinaryDdbConverter : DdbConverter<ReadOnlyMemory<byte>>
{
    public override ReadOnlyMemory<byte> Read(ref DdbReader reader)
        => reader.JsonReader.GetBytesFromBase64();
    
    public override ReadOnlyMemory<byte> Read(in AttributeValue attributeValue)
        => attributeValue.AsBinaryAttribute().AsMemory();

    public override void Write(in DdbWriter writer, ref ReadOnlyMemory<byte> value)
    {
        writer.JsonWriter.WriteStartObject();
        writer.JsonWriter.WriteBase64String(DdbTypeNames.Binary, value.Span);
        writer.JsonWriter.WriteEndObject();
    }

    public override AttributeValue Write(ref ReadOnlyMemory<byte> value)
        => new AttributeValue(new BinaryAttributeValue(value));
}

@firenero (Contributor)

> they are likely relying on the slicing ability provided by Memory/ReadOnlyMemory, meaning that the converter will allocate
> or I can use a slice of the ReadOnlyMemory, but then the converter you propose would copy that to a new exact-sized byte[] anyway, albeit only when invoking ToDocument.

I think there is a misunderstanding. My version of the converter will only allocate if you create an entity in code and then convert it to a Document. This example from my previous comment will allocate:

public class TestObj
{
    [DynamoDbProperty("mem")]
    public ReadOnlyMemory<byte> Memory { get; set; }
}

var arr = new byte[] { 0x69, 0xa7, 0x5d, 0xff};
var item = new TestObj
{
    Memory = arr.AsMemory(1, 2) // Take only 0xa7 and 0x5d bytes
};

// This conversion allocates new array for 0xa7 and 0x5d bytes
var doc = context.ToDocument(item); 

I can't imagine a scenario where someone needs to create an entity and then convert it to a Document. I believe in most cases you would convert an entity to a Document after reading it from the DB, which will NOT allocate:

var item = await context.GetItemAsync("pk", "sk");
var doc = context.ToDocument(item); // this won't allocate

Correct me if I'm wrong, but your use case is:

  1. Create an entity with a ReadOnlyMemory<byte> property that contains a slice of some byte[] array.
  2. Write this entity to the DB.
  3. At some point, read it back from the DB.

There is no allocation when writing to the DB because the optimized Write(in DdbWriter writer, ref ReadOnlyMemory<byte> value) overload is called. The same applies to reading from the DB.


Your idea of using MemoryMarshal.TryGetArray in BinaryAttributeValue looks really interesting - I hadn't thought about it before. I will test it in detail later this week.

> Furthermore, it would only allocate if someone creates a BinaryAttributeValue using a ReadOnlyMemory slice, but then reads the value using the Value property, which probably shouldn't ever happen?

I have a feeling it's a similar assumption to my earlier one ("no one creates an entity and converts it to a document right away"), but I could be mistaken. No one is supposed to do it this way, but we never know all the ways people use the library.

Anyway, I will check your proposed API again. Meanwhile, please let me know if we're on the same page about allocations in my current converter approach and whether I understand your use case correctly.

@bill-poole (Author)

> I can't imagine a scenario where someone needs to create an entity and then convert it to a Document.

Yes, I agree; I couldn't think of a scenario either. However, the DynamoDbContext class has a ToDocument method, which seems to exist for that purpose, so I assumed there was a use case I was missing. If the scenario never arises, then the converter could just throw a NotImplementedException in the Write(ref T) method.

> please let me know if we're on the same page about allocations in my current converter approach and whether I understand your use case correctly.

Yes, I think we are on the same page, and you understand my use case correctly. My scenario currently doesn't invoke the AttributeValue Write(ref ReadOnlyMemory<byte> value) method of the ReadOnlyMemory<byte> converter, which is why I was able to get away with just throwing NotImplementedException. So the converters you proposed wouldn't allocate in my specific scenario.

However, tomorrow's scenarios are hard to predict, so I was trying to come up with a solution that would allow allocations to be avoided under all circumstances.

@firenero (Contributor)

@bill-poole I've checked the BinaryAttributeValue implementation you suggested in #228 (comment). I don't think we can add this constructor (at least not publicly):

public BinaryAttributeValue(ReadOnlyMemory<byte> value)
{
    _value = value;
}

The data itself can be converted to mutable Memory, so the read-only contract can be broken; see the illustration below. I think for now I'll go with the fixed converters implementation in #244 and evaluate making a proper breaking change to BinaryAttributeValue in a major release. It works well for the majority of use cases, and it has no downsides for current workflows since Memory/ReadOnlyMemory were not supported at all before.
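
To illustrate the concern (hypothetical misuse; GetCallerData stands in for any source of data the caller expects to stay immutable):

ReadOnlyMemory<byte> readOnly = GetCallerData();  // caller relies on immutability
var attr = new BinaryAttributeValue(readOnly);    // the proposed constructor
Memory<byte> mutable = attr.AsMemory();           // MemoryMarshal.AsMemory drops the read-only guarantee
mutable.Span[0] = 0xFF;                           // silently mutates the caller's "read-only" data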

@firenero (Contributor) commented Jun 8, 2024

This is now available in v0.9.16. Thanks for helping me with it. Feel free to re-open this ticket or open a new one if you need anything.

firenero closed this as completed on Jun 8, 2024
@bill-poole (Author)

Okay, thanks @firenero.
