Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

serde: add default deserializers implementations #233

Merged
merged 4 commits into from
Dec 6, 2023

Conversation

hackaugusto
Copy link
Contributor

@hackaugusto hackaugusto commented Dec 1, 2023

The Deserializer trait does not have an associated type, all methods return Self. That means it is not possible to have a blanket implementation for slices &[T], because the signature is required to be fn read_from<R: ByteReader>(source: &mut R) -> Result<&[T], DeserializationError>;. I can't add a associated type without making a breaking change because default associated types are unstable https://github.com/rust-lang/rfcs/blob/master/text/2532-associated-type-defaults.md. I also can't define a blanket implementation for Vec<T> because the default serializer does not include the length of the vector.

edit: also note:

  • the serializer doesn't return errors, if one decides to encode an usize into u8 there is nothing to do besides panicking, this could be an attack vector.
  • the write_batch_into is the same as the vec implementation, should probably be removed since it doesn't have a clear receiver and there is equivalent functionality

@@ -113,17 +197,16 @@ impl<T: Serializable> Serializable for &Vec<T> {
}
}

impl<T: Serializable, const N: usize> Serializable for Vec<[T; N]> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a blanket implementation for impl<T, C> Serializable for [T; C]. Together with the existing blanked implementation for Vec<T>, &Vec<T>, and &[T], it made these 3 implementations obsolete.

Copy link
Collaborator

@irakliyk irakliyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Looks good! I left a few small comments inline.

utils/core/src/serde/mod.rs Show resolved Hide resolved
Comment on lines 200 to 211
impl Serializable for String {
fn write_into<W: ByteWriter>(&self, target: &mut W) {
let source = flatten_slice_elements(self);
T::write_batch_into(source, target);
target.write_u64(self.len() as u64);
target.write_bytes(self.as_bytes());
}
}

impl<T: Serializable, const N: usize> Serializable for &Vec<[T; N]> {
impl Serializable for &String {
fn write_into<W: ByteWriter>(&self, target: &mut W) {
let source = flatten_slice_elements(self);
T::write_batch_into(source, target);
(*self).write_into(target)
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deviates somewhat from how we serialize lists in that we try not to make assumptions data types used to serialize the number of elements. I think we can leave it up to the callers to figure out how exactly they want to serialize a given string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the problem. You don't want these implementations? You don't want to include the string length?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the issue is with the use of as_bytes? You don't want to encode the data as utf8?

The biggest issue here is that the user can not implement this trait, because of the orphan rules. So here we either have to make the choice or not give an implementation to the user.

When the user is writing their own Serializable, they can also deviate from this by just not calling it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to remove these implementations as serializing strings without length is probably not a good idea, and serializing them with length is not consistent with the overall approach (e.g., different from how we serialize slices, vectors etc.).

I don't think this affects the usability too much as the user can serialize strings manually pretty easily - e.g.,:

let x = "hello world".to_string();

target.write_u64(x.len() as u64);
target.write_bytes(x.as_bytes());

On the deserialization front, I think we can improve the ergonomics by adding read_string() method to ByteReader interface. It could look something like:

fn read_string(&mut self, len: usize) -> Result<String, DeserializationError> {
    let data = source.read_vec(len)?;
    String::from_utf8(data).map_err(|err| DeserializationError::InvalidValue(format!("{err}")))
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to remove these implementations as serializing strings without length is probably not a good idea, and serializing them with length is not consistent with the overall approach (e.g., different from how we serialize slices, vectors etc.).

The question is, does it make sense to serialize slices and vectors without length too? And if this is done for consistency sake, why can't the String be serialized without the length too?

I don't think this affects the usability too much as the user can serialize strings manually pretty easily - e.g.,:

Your call

On the deserialization front, I think we can improve the ergonomics by adding read_string() method to ByteReader interface. It could look something like:

This can be done in another PR.

Removed the default implementation

utils/core/src/serde/mod.rs Show resolved Hide resolved
Comment on lines 414 to 424
impl Deserializable for String {
fn read_from<R: ByteReader>(source: &mut R) -> Result<Self, DeserializationError> {
let length = source
.read_u64()?
.try_into()
.map_err(|err| DeserializationError::InvalidValue(format!("{err}",)))?;
let data = source.read_vec(length)?;

String::from_utf8(data).map_err(|err| DeserializationError::InvalidValue(format!("{err}")))
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment to one of the previous comments re strings: I'd rather not make opinionated choices about which data type to use for serializing string lengths.

Comment on lines +428 to +411
let data: Vec<T> = T::read_batch_from(source, C)?;

// SAFETY: the call above only returns a Vec if there are `C` elements, this conversion
// always succeeds
let res = data.try_into().unwrap_or_else(|v: Vec<T>| {
panic!("Expected a Vec of length {} but it was {}", C, v.len())
});

Ok(res)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit unfortunate that we have to allocate a vector here to deserialize the data - but I wonder if the compiler is smart enough to figure out that this is not needed.

Copy link
Collaborator

@irakliyk irakliyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you!

@irakliyk irakliyk merged commit 271c19d into facebook:main Dec 6, 2023
9 checks passed
@hackaugusto hackaugusto deleted the hacka-default-deserializers branch December 7, 2023 01:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants