Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Direct parsing of custom types #1044

Open
sdebionne opened this issue Sep 19, 2024 · 11 comments
Open

Direct parsing of custom types #1044

sdebionne opened this issue Sep 19, 2024 · 11 comments

Comments

@sdebionne
Copy link

About direct parsing, the documentation states that:

The drawback of this approach is that fully custom type representations are not supported, only the library-provided conversions are.

I wonder where this limitation stands?

I have a described struct with an std::chrono::duration member. While I have the proper tag_invoke overloads, since there is no built-in conversion support for std::chrono, I can't use parse_into. Is that correct?

Would it be possible to add new customization points to allow direct parsing of custom types?

Any chance to add a conversion category for date /time / duration (if that make sense)?

@grisumbras
Copy link
Member

Direct parsing is implemented using a special event handler that is a tree of dedicated subhandlers for components of the target value, and works by descending to the first subhandlers that accepts a parsing event. The implementation relies on cooperation of parent and child subhandlers.

In order to allow users to add support for their types we would have to make this system a public API and document how all that works. I'm not sure it's a very good idea, because writing those subhandlers is very tricky.

I have an alternative idea, which I'm still not sure I could pull off, but we'll see. The idea is to have representation "proxies". You define a trait that your type is supposed to be represented as some other type, and converter/parser will create an object of that type and will do conversion/parsing into that object instead.

This would provide a significant degree of customisation opportunities, like representing a chrono::duration as a number of ticks, or a chrono::time_point as an ISO 8601 string.

@pdimov
Copy link
Member

pdimov commented Sep 21, 2024

Support for std::chrono::duration should be built-in. time_point is maybe trickier. Or maybe not.

@grisumbras
Copy link
Member

What should be the representation of duration in your opinion?

@sdebionne
Copy link
Author

I remembered you mentioned the idea of having "proxies" in your latest blog post but to be honest I did not get it. What would be the proxy for std::chrono::duration, for instance?

I support the idea of idea of having std::chrono::duration built-in. The representation could be either the number of ticks (int) and the period (int) to keep the full precision. Or the floating point representation in second.

@grisumbras
Copy link
Member

Representing them as a number of ticks has the drawback of all durations being represented in the same way (no way to tell 10 seconds and 10 minutes apart).

Representing them as floating point value of seconds opens up the possibility of rounding errors.

So, both approaches are not ideal.

@pdimov
Copy link
Member

pdimov commented Sep 23, 2024

We have the following options (for e.g. a duration of 42 minutes):

  1. 42
  2. "42m"
  3. [42, 60, 1]
  4. [42, "m"]
  5. {"count": 42, "period": [60, 1]}
  6. {"count": 42, "period": "m"}

(1) has the disadvantage as stated above. Even though it works (for a struct member, for instance) because the period is encoded in the source code as the duration type, it would cause issues if two or more durations are put into a variant.

(2) is unnecessarily hard to parse and kind of defeats the point of using a structured format such as JSON.

(5) and (6) are the most human-readable, but overly verbose. For a file with many durations in it, they'll cause a lot of unnecessary overhead.

(4) is more human-readable than (3), but harder to parse.

This leaves me with (3). Interestingly though, we can in principle make several of these work when parsing, because they can be disambiguated (but we do need to choose one when serializing.)

So we can in theory add support for one of these, e.g. (3), today, and then later add support for e.g. (4) if we decide we like that better.

Edit: there's also the option of [42, [60, 1]], which is more regular, but there's no significant difference with (3) based on the criteria above, so that would be something like (3a) in our classification.

@pdimov
Copy link
Member

pdimov commented Sep 23, 2024

On second thought, I like [42, [60, 1]] better than [42, 60, 1] because it has more structure and doesn't match the representation of e.g. std::vector<int> or something like math::point3<float>.

@sdebionne
Copy link
Author

I would think that, in most applications, both side of the serialization agrees on the period, so the ticks representation would work.

Is it not the responsability of the schema to provide the unit/period?

@grisumbras
Copy link
Member

As Peter mentioned, every ambiguous representation becomes a problem for variants. I.e. you have variant<seconds, hours>, you save hours(10). You load seconds(10). How common such cases would be is hard to say.

I would think that, in most applications, both side of the serialization agrees on the period, so the ticks representation would work.

I would think, for JSON you usually don't control the format, which is why I prefer representations that don't just technically work, but are actually likely to be used by people. The spec you linked recommends ISO 8601 strings. It's actually not that hard to implement, so maybe we should use that.

@grisumbras
Copy link
Member

Although, maybe not. The linked ABNF doesn't allow fractions of seconds.

@sdebionne
Copy link
Author

sdebionne commented Sep 23, 2024

Indeed ISO 8601 is limited in precision to ms.

Maybe it is not the responsibility of the JSON value itself to carry the unit/period. Instead this info could come from a schema. Something like:

uptime:
  type: integer
  unit: microseconds
  example: 50

as discussed in json-schema-org/json-schema-vocabularies#46.

Taking Boost.Units quantities for example, we would probably represent it as numeric (or whatever the underlying value type matches best) and leave the unit to the schema?

Side note: it is possible (tested to some extend) to generate a schema from a described struct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants