Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

serde support: TimeZone or Offset #89

Open
onkoe opened this issue Aug 10, 2024 · 26 comments
Open

serde support: TimeZone or Offset #89

onkoe opened this issue Aug 10, 2024 · 26 comments
Labels
question Further information is requested

Comments

@onkoe
Copy link

onkoe commented Aug 10, 2024

Hey there! Thanks for this incredible new crate!

I'm using it in a new project, but I'm having difficulties due to the TimeZone and Offset types lacking serde implementations. I can understand why a TimeZone might not have it (the design document was very convincing).

However, is it intentional that Offset doesn't have one? This seems like it'd just serialize that internal integer. (The define_ranged! machine does look mildly complicated here, of course).

Please let me know if I'm missing anything, or there's something else that might be better for serializing a user's timezone.

Once again, thanks for the new contribution to the ecosystem! :)

@BurntSushi
Copy link
Owner

Can you say more about why you're serializing these types separate from the datetime types?

@onkoe
Copy link
Author

onkoe commented Aug 10, 2024

I'm building a Discord bot for friends and want to keep things simple by tying them with their time zones.

Since I'm already using jiff for its more commonplace datetime-focused types, it works out nicely since I don't need to write a bunch of conversions from some other timezone type!

Please let me know if that helps enough. I'm happy to give code examples! :)

@BurntSushi
Copy link
Owner

Right, so you should be able to use Zoned for that. That is, it will include the time zone and offset when serializing it. Most datetime libraries don't do this, so you might be used to serializing the time zone separately.

But reading closer, it looks like you want to associate the time zone itself with the user outside the context of a datetime. Which I think makes sense. Can you say though why you want to serialize an offset instead of just a time zone?

In terms of technical implementation, there isn't any problem with adding a serde impl for Offset. Whether to add it or not is really about API design and being careful about avoiding encouraging folks to do the wrong thing. So for the Offset specifically, I would need to think about it.

But for TimeZone, that's harder, because there just simply may not be a compact serializable representation of it.

Instead, I would ask you to consider storing a IANA time zone identifier for each of your friends. It might be missing if you don't know it. Then you would just call TimeZone::get on the stored string when you need the actual time zone. Those calls are cached. And you can get the identifier from a time zone with TimeZone::iana_name if one exists.

Does that help?

@onkoe
Copy link
Author

onkoe commented Aug 10, 2024

Great, thank you!

My idea for using an Offset over TimeZone is that a TimeZone has attributes that change, but the bot is built around one central pillar: it must store the "availability" of people, then compare the availabilities of all users when a user sends a command to the bot.

Currently, I do this by storing a Vec<Span> alongside the user, but that list can sometimes be empty when the user is always available. Here's how that looks:

/// A representation of someone, with some of their details important to
/// knowing when to ping.
#[non_exhaustive]
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
pub struct Person {
    user_id: UserId,
    timezone: Offset,
    /// A list of timespans where they're unavailable (gone).
    gone: Vec<Unavailability>,
}

impl Person {
    /// Given a `UserId`, creates a new person with "default" availability
    /// and timezone (i.e. both blank).
    pub fn new(user_id: UserId) -> Self {
        Self {
            user_id,
            timezone: Offset::UTC,
            gone: Vec::new(),
        }
    }
}

/// A span of time where someone isn't able to play.
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
pub struct Unavailability {
    span: Span,
}

For this code to work, I need the user's timezone to be stored independently of their availability. Otherwise, I can't convert their spans if they change their timezone. Using Offset here isn't too important, but I know it's easier to serialize a simple offset than encoding the complexities of a timezone.

It's also less fallible, as I don't need to 'convert' the list of Spans to the timezone each time I try to use them. (If I didn't do that with a TimeZone, the spans could be wrong from moment-to-moment.)

Timezone Identifiers

I wouldn't mind storing a timezone identifier, so long as I can easily find them. Can I get it from a given Offset? A Zoned? Or even a TimeZone itself? If not, I'll likely have to make my own type, which is something I really want to avoid! 😄

Would it make sense to store a string? Would it be safe? Thank you!

@BurntSushi
Copy link
Owner

BurntSushi commented Aug 10, 2024

My take is that you should 100% be storing an IANA time zone identifier. And yes, it should be stored as a string. And no, you cannot get an IANA time zone identifier from an offset because there may be many IANA time zone identifiers for the same offset. Instead, it's the reverse: you get an offset from a time zone and either a Timestamp (instant in time) or a civil::DateTime (an end user's local time). How you get an IANA time zone identifier from an end user is I guess your job. How do you get their offset? You might need to ask them.

Using an IANA time zone identifier actually seems critical to your use case here, to the point that storing an offset would be completely wrong. For example, for where I live, my IANA time zone identifier is always America/New_York. The only way that would change is if I physically moved. But my offset changes twice per year: it goes from -05 to -04 in the late winter (this year it was March 10, 2024) and -04 to -05 in mid-autumn (this year it will be November 3, 2024). This is true in many locations in the world and is known as daylight saving time. If you only capture the offset for each user at the present time, all of your calculations will be wrong once they enter or leave daylight saving time. But if you use an IANA time zone identifier and the Zoned data type, then you literally don't need to worry about this at all. Jiff will handle everything and should actually make it impossible for you to get it wrong.

@onkoe
Copy link
Author

onkoe commented Aug 10, 2024

Storing an offset is definitely wrong to some degree! To address the issues you mentioned, I'll probably make a serde-impl'd type that holds a String and interfaces with jiff to check the Spans when they're needed.

Nonetheless, I'd love to see some form of serialization. So, assuming you'd want to serialize in IANA form, that leaves just two questions:

  • Can jiff currently get an IANA identifier from each 'variant' of TimeZone? (Or, in other words, should jiff guess a IANA ident for the POSIX/offset variants, and alternatively, do all variants have a consistent String representation?)
  • If not, can TimeZone expose its inner TimeZoneTzif when present?

Answering these questions would solve this issue. However, for my simple project, using String timezone identifiers and Timestamp in the Span list would be an adequate solution. Thank you for your assistance!

@BurntSushi
Copy link
Owner

Can jiff currently get an IANA identifier from each 'variant' of TimeZone?

No, it can't. That's why TimeZone::iana_name returns an option. It is fundamentally impossible to always have an IANA identifier. But that is the happy path. A missing TZ identifier is a generally a sub-optimal situation.

As for exposing the inner TZif, it is in theory possible, but it's probably a bad idea. I'm more likely to go the route outlined in #30. But making that "easy" via a serde impl is almost certainly the wrong thing to do.

@onkoe
Copy link
Author

onkoe commented Aug 10, 2024

I tend to agree on the serde part - deciding for users seems potentially harmful.

The plan in #30 sounds pretty good! Please let me know if you see any movement there. I'll watch the issue in the meantime.

@BurntSushi
Copy link
Owner

Going to close this out in favor of #30. (Although, from my perspective, it does seem like just an IANA tz identifier is sufficient here for the OP's use case.)

@BurntSushi BurntSushi closed this as not planned Won't fix, can't repro, duplicate, stale Aug 28, 2024
@maxcountryman
Copy link

@BurntSushi I'd like to be able to store TimeZone in a database (e.g. as a configuration option belonging to a user, etc). I understand your point re IANA names not being applicable to e.g. offsets and generally speaking, I'd expect most use cases to favor only supporting IANA names.

That said, from an application development perspective, there is no public type that says "this can only be IANA". So when going to implement a library that uses jiff, I'm not sure how to approach this. I can ask users to provide a TimeZone but only at runtime will they discover that non-IANA time zones fail when we e.g. call iana_name and it returns None--this isn't ideal.

If there's a better way of doing this already built into jiff, that's great. But so far, I haven't figured out a satisfactory solution.

@BurntSushi
Copy link
Owner

I think you probably want to accept a &str and then do the lookup yourself to get a TimeZone.

Also, if you're storing zone aware datetimes in a DB, you can use the Display and FromStr impls on Zoned. They will do the right thing, including attaching the IANA zone name if it exists.

If this doesn't help, can you suggest an API change in Jiff that does what you want? If only as a way to crystalize what it is you're looking for and prevent misunderstanding.

@maxcountryman
Copy link

I think you probably want to accept a &str and then do the lookup yourself to get a TimeZone.

I still wouldn't have a way to store the TimeZone itself which means I need to store the string and do lookups whenever I want a TimeZone. Internally, I want to use TimeZone wherever possible since jiff is ubiquitous.

I should also point out, if a user wants to use a non-IANA TimeZone that's also okay: I don't have a strong reason to limit that, it's just that if the only reliable string representation is IANA names then I'm also okay with that restriction provided I don't need to give up on strong typing altogether.

Because users of this library will already be using jiff (it's an intentional design decision to expose jiff types in its API), the more natural thing to do is provide strong typing (both as a library author and as a user of the library).

If this doesn't help, can you suggest an API change in Jiff that does what you want?

The elephant in the room is serde of course. 🙂 I see you've mentioned that compact representation is challenging, yet it seems we have a solution for Zoned so I'm not completely clear on why it's so difficult. What I'd want is just the isolation of TimeZone from that triple in a format that I can store (so binary, etc is also fine).

@BurntSushi
Copy link
Owner

Can you say more about your higher level use case? i.e., more detail than just that you want to store it in a database.

@maxcountryman
Copy link

I'm not sure what additional detail you're looking for. Can you be more specific?

@BurntSushi
Copy link
Owner

"storing in a database" is a means to solve a problem and not an end itself. So, what problem are trying to solve here?

When dealing with datetimes, and especially in contexts that motivate the API design of a datetime library, it is really important to understand the higher level use case.

@maxcountryman
Copy link

So, what problem are trying to solve here?

The design of a library that uses jiff in its public interface and also needs to persist e.g. configuration that contains said types.

Is it possible to convert from a string to a TimeZone back to a string and then do that in reverse? It is, but it's not nice and the public API becomes stringly as a result. This is doubly gross when we consider jiff isn't an implementation detail, it's part of the API end users will use; they'll be inclined toward jiff's types. So it's unnatural and error-prone to use &str where we mean TimeZone anyway. Likewise, manually converting a string from a storage medium is troublesome in the same way.

Taking a step back, while I agree that often IANA-names are "what you really want" I don't agree that it's correct to force that decision and the argument doesn't land as well when there's really no type for an IANA-only TimeZone; at least if there were I wouldn't have to resort to strings. Indeed, there are plenty of scenarios where a fixed offset which doesn't interact with DST is just as correct as an IANA name that will yield DST aware behavior. So to my mind, it's not right to force this decision on users unless I know that it's never correct to use an offset.

@BurntSushi
Copy link
Owner

I appreciate the response. I do think it's going to be difficult to make progress here without concrete use cases. I recognize you probably feel like you've given enough detail, but it is very difficult for me to think about this in the context of the details you've given so far.

I'll re-open this issue for now.

Let me try this again: can you suggest an API addition or change to Jiff that would accommodate your use case? I asked this before but I don't think I really got a precise answer to this.

@BurntSushi BurntSushi reopened this Oct 3, 2024
@BurntSushi BurntSushi added the question Further information is requested label Oct 3, 2024
@maxcountryman
Copy link

can you suggest an API addition or change to Jiff that would accommodate your use case?

I can think of a few possibilities:

  1. Provide implementations of Serialize and DerserializeOwned for TimeZone (I know I said this already, but it feels the most natural given that serde is already a feature of the crate; it's actually the most surprising thing thus far to see that virtually none of the important types actually support serde)
  2. Provide a complementary type that's only allowed to be derived from valid IANA names (so we can avoid using strings)
  3. Provide a representation of TimeZone that allows users to de/serialize on their own (for instance, as_bytes and from_bytes methods)

I'm happy to talk about more concrete use cases, but I do want to avoid reducing the conversation to "just use strings": that isn't the design path I'm hoping to pursue. It's also not technically correct for me to impose IANA names, for e.g. running schedules that don't care about DST awareness and instead are just as validly defined with an offset. So the most correct answer is the user should decide by providing a TimeZone of their choosing.

That said, I would accept TimeZone that are derived only from IANA names as a limitation, if and only if I could avoid using strings. This is where I'm going to draw a line in the sand in terms of philosophy: stringly typed interfaces should be avoidable. If you disagree, that's fair enough, but I hope you'll consider this specific case where a library is actively designing with jiff as part of its public API and its users are going to want to leverage jiff types as much as possible.

@BurntSushi
Copy link
Owner

it's actually the most surprising thing thus far to see that virtually none of the important types actually support serde

Wait what? The most important types in this crate are the datetime and duration types. All of them implement both Serialize and Deserialize when the serde feature is enabled.

Provide implementations of Serialize and DerserializeOwned for TimeZone

But how do they work? This is the key thing missing. A TimeZone might literally just be a sequence of transitions. It might not be a single offset, or have an IANA time zone name or even be a POSIX time zone string. It could be none of those. Zoned makes this work because it isn't just an independent serialization of both a datetime and a time zone. There is an interplay there where an offset can be computed for the given instant in time and that can be serialized. But if all you have is a TimeZone, then what is the serialization format when it is neither a simple offset nor does it have an IANA name?

This is exactly why I asked this question because I suspected there might be some confusion in the specific concrete details.

Provide a complementary type that's only allowed to be derived from valid IANA names (so we can avoid using strings)

How would this type be used in Jiff? The problem is that a Zoned uses a TimeZone internally. So if this new stricter type is used to create a Zoned, then it will the best you can have is a method on Zoned that returns a Option<ComplementaryType>.

Alternatively, Zoned becomes generic over a time zone type, but now we're talking a very expansive change.

Provide a representation of TimeZone that allows users to de/serialize on their own (for instance, as_bytes and from_bytes methods)

That's what #30 is intended to be. Or am I misunderstanding? Note that from_bytes already exists via TimeZone::tzif.

@BurntSushi
Copy link
Owner

The overall idea here is that a TimeZone is very expansive. It doesn't really have an independent compact serialization for all possible values. So if your library is using a TimeZone and also wants to serialize that time zone somewhere independent of an instant in time, then there are some fundamental limitations facing that. The only way to donit correctly for all possible TimeZone values is either with TZif or some other format (like zic) that describes the actual time zone transition rules.

This is why the use case is important. If you want to serialize the time zone in a compact offset-or-iana format, then it probably makes sense for your library to define a more limited type representing that choice and then maybe convert to TimeZone when you need to interact with Jiff. Although I'm not sure.

Because we haven't been able to talk about use cases, the thing I still don't understand is why a TimeZone needs to be stored independent of a datetime. I get that you might want to store things like "this user's configured time zone is x." But that's a fundamentally more limited expression than what TimeZone supports, because end users don't work with the full set of time zone transition rules. They work with IANA tzids (well, technical users I guess) or, at worst, offsets that ignore DST.

@maxcountryman
Copy link

Note that from_bytes already exists via TimeZone::tzif.

Then we badly need as_bytes.

I think it's also worth noting that chrono's version of TimeZone is compatible with serde. So we can see plainly here I'm not the first, and I'll wager I won't be the last person, to scratch their head in confusion when serde support is missing here. (I know you said the two most "important" types do support serde, but the distinction feels arbitrary as soon as you need one of the types that didn't make the cut...)

I still don't understand is why a TimeZone needs to be stored independent of a datetime

If you thought that the only useful representation was a timestamp, you probably wouldn't have made TimeZone public, right? Here's where you've lost me: always needing a date-time expression to also have time zone is not something I can comprehend. Sometimes you really just want a time zone and nothing else. Maybe TimeZone is overloaded with too many responsibilities right now? (Should it be broken up into smaller, more focused types perhaps?)

I get that you might want to store things like "this user's configured time zone is x."

Then I think it pretty clearly follows that the ability to store this and only this is useful. There's a fundamental contradiction between this and the previous sentence: a configured time zone has nothing to do with a timestamp per se and I certainly don't want to store some arbitrary timestamp just so I can capture the time zone.

They work with IANA tzids (well, technical users I guess) or, at worst, offsets that ignore DST.

I would accept a type that's either IANA names or offsets; a subset of the internal kind enum for example. Either are valid for many of the use cases where you might need to store time zone data in my view.

@BurntSushi
Copy link
Owner

I think it's also worth noting that chrono's version of TimeZone is compatible with serde

Not really haha. I spilled a fair bit of words on this topic in the Jiff design and comparison docs. Chrono's implementations of its TimeZone trait in chrono proper all support serde, but none of those correspond to IANA time zones. There is the chrono-tz and tzfile crates which provide their own TimeZone trait implementations, but neither of them support Serde.

Basically, you're running into a shortcoming of Chrono that let you easily do what is arguably the wrong thing in most cases (since you lose DST info). That gave the appearance of an additional capability, but in reality it's more complicated than that.

(I know you said the two most "important" types do support serde, but the distinction feels arbitrary as soon as you need one of the types that didn't make the cut...)

It's a lot more than two... It's not like it's an arbitrary choice or something. I think the only type that doesn't support serde at present but easily could is Offset. I exclude TimeZone from this because it isn't easy to provide a serde implementation for it.

If you thought that the only useful representation was a timestamp, you probably wouldn't have made TimeZone public, right? Here's where you've lost me: always needing a date-time expression to also have time zone is not something I can comprehend. Sometimes you really just want a time zone and nothing else. Maybe TimeZone is overloaded with too many responsibilities right now? (Should it be broken up into smaller, more focused types perhaps?)

I find this to be a somewhat frustrating response. I'm trying to tell you something I don't understand. I'll add my intent more explicitly: can you help me understand it better?

Here's where you've lost me: always needing a date-time expression to also have time zone is not something I can comprehend.

That's not what I said though. I tried to be very careful with my wording, but perhaps I failed. Let me be more precise: given any arbitrary TimeZone value, the only way to turn it into an offset is to give it a concrete instant in time.

Then I think it pretty clearly follows that the ability to store this and only this is useful. There's a fundamental contradiction between this and the previous sentence: a configured time zone has nothing to do with a timestamp per se and I certainly don't want to store some arbitrary timestamp just so I can capture the time zone.

Okay, I didn't say that... What I mean is that you might find it easier to define a more limited time zone type that is specific to user configuration. A Jiff TimeZone is more like a representation of a transformation than a simple user configuration setting. That is, all valid user configuration time zones should be mappable to a Jiff TimeZone, but not all Jiff TimeZone values are mappable to a simple user configuration setting.

I would accept a type that's either IANA names or offsets; a subset of the internal kind enum for example. Either are valid for many of the use cases where you might need to store time zone data in my view.

I understand this, but we're going in circles. My previous comments asked for more details about this option. My sense is that a type like this would be better defined closer to the user configuration than in Jiff proper.

@BurntSushi
Copy link
Owner

My previous comments also provided some very specific concrete technical problems with your suggestions. But the conversation is veeeing back towards abstractions. Can we try to keep it concrete so that we can drive toward a specific solution to your problem?

@maxcountryman
Copy link

Not really haha. I spilled a fair bit of words on this topic in the Jiff design and comparison docs. Chrono's implementations of its TimeZone trait in chrono proper all support serde, but none of those correspond to IANA time zones. There is the chrono-tz and tzfile crates which provide their own TimeZone trait implementations, but neither of them support Serde.

Something about this is not right: I have a bunch of IANA names serialized to Postgres using chrono and chrono_tz (serde_json::to_string does exactly what you'd hope with Tz). So, we're clearly not talking about the same thing or perhaps there's a gap in that analysis?

My suspicion is that the "TimeZone" type is overloaded and that jiff could benefit from some layer of abstraction between its underlying concept, perhaps named differently, and the things that compose it, which are clearly useful in their own right.

With that, I unfortunately don't have more to add here that I haven't already and I'd like to leave space for other folks to chime in.

@BurntSushi
Copy link
Owner

Something about this is not right: I have a bunch of IANA names serialized to Postgres using chrono and chrono_tz (serde_json::to_string does exactly what you'd hope with Tz). So, we're clearly not talking about the same thing or perhaps there's a gap in that analysis?

Ah okay, the Tz type itself supports Serde. But DateTime<Tz> does not last I checked. If just Tz was working fine for you before, then I would definitely recommend defining your own time zone type as I suggested above. A Tz is literally only a handle to an IANA time zone.

My suspicion is that the "TimeZone" type is overloaded and that jiff could benefit from some layer of abstraction between its underlying concept, perhaps named differently, and the things that compose it, which are clearly useful in their own right.

There are trade offs involved here. A Jiff time zone is in fact a single conceptual thing: it is a transformation mapping between civil times and instants via offset. Chrono exposes bits and bobs of this concept but provides no cohesive type. As a result, there is more flexibility in Chrono's design and it is easier to do incorrect things, or at least, harder to do the right thing. But not all abstractions are created equal. Jiff for example provides an abstraction over "time zone data embedded in binary" and "tome zone data read from zoneinfo on disk." Chrono doesn't give you that. You, the library user, have to assemble that logic yourself.

It is true that a Jiff time zone is one of many possible things, but each of those things corresponds to a time zone that is used in the same way. And its API is closed, meaning you can't extend the concept in your own code (unless you provide TZif data). Conversely, in Chrono, it is an open concept at the level of the type system. So each different kind of time zone is exposed and users can extend it with their own implementations of that trait.

I think it is a good idea to have a layer of abstraction between "end user configurable time zone" and "jiff time zone that is used to map between between instants and civil times via offsets." They just aren't the same thing. But I'm not quite seeing why the former should be in Jiff. It seems like something that should be defined specific to the application or use case. For example, you might not even ask an end user for a TZ id. Instead, you ask for a location and derive the IANA TZ id from it.

@BurntSushi
Copy link
Owner

I'm re-reading the discussion now, especially the beginning. I think there are more questions to ask here:

I still wouldn't have a way to store the TimeZone itself which means I need to store the string and do lookups whenever I want a TimeZone. Internally, I want to use TimeZone wherever possible since jiff is ubiquitous.

Can you say why, more specifically, accepting a &str and doing a lookup when you need a TimeZone doesn't work for you? Like is this a philosophical disagreement here? Or is there a specific technical problem preventing you from doing this?

I should also point out, if a user wants to use a non-IANA TimeZone that's also okay: I don't have a strong reason to limit that, it's just that if the only reliable string representation is IANA names then I'm also okay with that restriction provided I don't need to give up on strong typing altogether.

I should have caught this earlier, but I actually think this might be the crux of the issue. Because "I want to accept any kind of TimeZone value" and "I want to store the TimeZone in a database" are somewhat exclusionary goals. Or at least, if you specifically want to accept any TimeZone value and absolutely must store in a DB, then the only correct way to do that is by potentially storing a lot of data in the case where a time zone isn't just a simple offset or IANA identifier.


I would really like to help you drive toward a solution here. I am very interested in trying to understand use cases better so that Jiff becomes an obvious choice for you. And especially now before Jiff 1.0 where I have a bit more freedom to do breaking changes. If you'd like to work with me on this to help me better understand what you're trying to do, that would be great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants