Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistency between serialize and deserialize in Serde #138

Open
chmp opened this issue Oct 1, 2024 · 9 comments · May be fixed by #149
Open

Inconsistency between serialize and deserialize in Serde #138

chmp opened this issue Oct 1, 2024 · 9 comments · May be fixed by #149
Labels
question Further information is requested

Comments

@chmp
Copy link

chmp commented Oct 1, 2024

jiff uses inconsistent Serde types in serialization / deserialization. jiff types

  • generally serialize to string (serializer.collect_str()) (example)
  • but request bytes in deserialization (deserializer.deserialize_bytes(visitor)) (example)

It's not a huge issue, but I wanted to give a heads up: this inconsistency tripped me up in a couple of places, while adding support for jiff to my crate. I am now making my own impl more robust, by adding a deserialize_bytes impl to every deserialize_str impl :)

@BurntSushi
Copy link
Owner

Sorry, I'm not understanding the problem. Does this lead to an actual issue in your code? Like it makes sense that these would be different at a conceptual level. Serialization is guaranteed to produce valid UTF-8, so it serializes a string. But deserialization can be done on a &[u8], which may not be valid UTF-8.

@chmp
Copy link
Author

chmp commented Oct 1, 2024

Not really an issue: I simply need to add deserialize_bytes impls that hand the bytes of the string to to the visitor.

I was merely surprised: so far I only encountered types that use deserialize_str in their Deserialize impl, when using collect_str in their Serialize impl. I always interpreted the deserialize_* methods used to indicate the "natural" type of the value and I expected that to be the type serialized to (string in this case).

@BurntSushi
Copy link
Owner

Do you agree that it makes sense given that parsing works on &[u8] and therefore, deserialization accepting either &[u8] or &str is maximally flexible? Like, if the Deserialize impl didn't accept &[u8], then you couldn't deserialize data types from bytes. It would have to be string.

Now if you're layering stuff on top of this where only strings make sense, then it should be fine if you just use deserialize_str, right? It won't be maximally flexible, but maybe it doesn't need to be. It depends on your use case.

(It's possible I have a misunderstanding here.)

@chmp
Copy link
Author

chmp commented Oct 1, 2024

Oh sorry. I think I did not explain myself clearly. I completely agree that accepting both &[u8] and &str is a good idea.

The only thing that would change would be:

impl<'de> serde::Deserialize<'de> for Zoned {
    fn deserialize<D: serde::Deserializer<'de>>(deserializer: D) -> Result<Zoned, D::Error> {
        use serde::de;

        struct ZonedVisitor;
        // ...

-        deserializer.deserialize_bytes(ZonedVisitor)
+        deserializer.deserialize_str(ZonedVisitor)
    }
}

E.g., in serde_arrow I use the deserializer.deserialize_* call to detect the "natural" / "preferred" type of a value. serde_reflect is doing the same. The deserializer is still free to hand your visitor a &[u8].

As I said most types I encountered were symmetric in the Serde type they serialized to and the type they requested from the deserializer. Not sure whether this is a general good practice in the implementation of the Serde traits. Hence, I wanted to give you a quick heads up.

@BurntSushi
Copy link
Owner

You're saying that if I used deserialize_str instead of deserialize_bytes, then it could still accept bytes? Maybe I have a misunderstanding of Serde itself. What is the actual behavioral difference between deserialize_str and deserialize_bytes?

@chmp
Copy link
Author

chmp commented Oct 1, 2024

Puh. I have to admit the Serde API with its asymmetry between Serialize and Deserialize is still somewhat confusing. Re. this issue:

The deserialize_* methods are hints to the deserializer for how to interpret the next piece of input
[...]
Note that a Deserializer will not necessarily follow the type hint, so the call to deserialize_i32 does not necessarily mean the Deserializer will call I32Visitor::visit_i32

(source).

So in this case, the fact that you are calling deserialize_bytes does not all guarantee that you get &[u8], in principle the deserializer could also hand you back a i64.

I always interpreted the call to deserialize_* to indicate the "preferred" type.

@BurntSushi
Copy link
Owner

Interesting. I'll leave this open for now since it seems deserving of a deeper dive on my part. The visitors in this crate do also accept both strings and bytes.

I agree that a string is, I guess, the "preferred" type in this context.

@BurntSushi BurntSushi added the question Further information is requested label Oct 1, 2024
@chmp
Copy link
Author

chmp commented Oct 1, 2024

I guess, this change would also be a visible, hence breaking, one. So not sure it's worth the effort.

@BurntSushi
Copy link
Owner

Well I guess that gets to my question: what is the actual behavior difference?

In either case, my plan with Jiff is to do 1-2 breaking changes leading up to Summer 2025, and then publish Jiff 1.0. So if it's really a breaking change, that's fine, it can be included in jiff 0.2.

LeoniePhiline added a commit to LeoniePhiline/jiff that referenced this issue Oct 31, 2024
Deserializing `jiff` types from YAML is not
currently possible.

Neither `serde_yaml` nor its maintained fork
`serde_yml` support deserializing from bytes.

These tests serve in both

- proving the current incompatibility
- verifying the fix in the following commit,
  which migrates `Deserialize` implementations
  to `deserializer.deserialize_str`.

Related to BurntSushi#148, BurntSushi#138
LeoniePhiline added a commit to LeoniePhiline/jiff that referenced this issue Oct 31, 2024
Deserializing `jiff` types from YAML was
previously not possible:

Neither `serde_yaml` nor its maintained fork
`serde_yml` support deserializing from bytes.

This changset migrates `Deserialize`
implementations to `deserializer.deserialize_str`,
thus providing serde YAML compatibility.

Fixes BurntSushi#148, BurntSushi#138
LeoniePhiline added a commit to LeoniePhiline/jiff that referenced this issue Oct 31, 2024
Deserializing `jiff` types from YAML is not
currently possible.

Neither `serde_yaml` nor its maintained fork
`serde_yml` support deserializing from bytes.

These tests serve in both

- proving the current incompatibility
- verifying the fix in the following commit,
  which migrates `Deserialize` implementations
  to `deserializer.deserialize_str`.

Related to BurntSushi#148, BurntSushi#138
LeoniePhiline added a commit to LeoniePhiline/jiff that referenced this issue Oct 31, 2024
Deserializing `jiff` types from YAML was
previously not possible:

Neither `serde_yaml` nor its maintained fork
`serde_yml` support deserializing from bytes.

This changset migrates `Deserialize`
implementations to `deserializer.deserialize_str`,
thus providing serde YAML compatibility.

Fixes BurntSushi#148, BurntSushi#138
LeoniePhiline added a commit to LeoniePhiline/jiff that referenced this issue Oct 31, 2024
Deserializing `jiff` types from YAML is not
currently possible.

Neither `serde_yaml` nor its maintained fork
`serde_yml` support deserializing from bytes.

These tests serve in both

- proving the current incompatibility
- verifying the fix in the following commit,
  which migrates `Deserialize` implementations
  to `deserializer.deserialize_str`.

Related to BurntSushi#148, BurntSushi#138
LeoniePhiline added a commit to LeoniePhiline/jiff that referenced this issue Oct 31, 2024
Deserializing `jiff` types from YAML was
previously not possible:

Neither `serde_yaml` nor its maintained fork
`serde_yml` support deserializing from bytes.

This changset migrates `Deserialize`
implementations to `deserializer.deserialize_str`,
thus providing serde YAML compatibility.

Fixes BurntSushi#148, BurntSushi#138
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants