Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistency: Are days 24 hours or not? #48

Open
FeldrinH opened this issue Jul 26, 2024 · 19 comments
Open

Inconsistency: Are days 24 hours or not? #48

FeldrinH opened this issue Jul 26, 2024 · 19 comments
Labels
breaking change Issues that require a breaking change for resolution. question Further information is requested

Comments

@FeldrinH
Copy link

FeldrinH commented Jul 26, 2024

There seems to be some inconsistency in the treatment of days in spans. In particular, TryFrom<Span> for Duration assumes that days are 24 hours, whereas Timestamp::saturating_add rejects days as units that can't be resolved without a specific timezone.

Personaly I think the assumption that days are 24 hours is dangerous and shouldn't be made (because it is readily violated by DST), but more than that I think all functions should at least agree on whether days are assumed to always be 24 hours or not.

@BurntSushi
Copy link
Owner

It is true that sometimes days are treated as always 24 hours and sometimes days are treated as more or less than 24 hours, depending on time zone transitions. In contexts where the meaning of a day is ambiguous, it's banned, as in the case of Timestamp::checked_add (and Timestamp::saturating_add). In cases where it could be either, like in the Span APIs, whether a day is 24 hours or not is determined by whether 1) a relative date is provided and 2) whether that relative date is a Zoned or not.

The shape of when days are allowed or not, and when they are 24 hours or otherwise, was taken straight from Temporal. For example, see its Instant.add API.

I think it would be better to be very concrete here. If you're working with a Zoned, for example, it is impossible to accidentally treat "1 day" as "always 24 hours." The APIs won't get it wrong precisely because a Zoned is involved. But when you're working with civil time, you want to be able to operate on days, yet civil time is unaware of time zones. So days are always 24 hours.

While I agree that this can all seem very inconsistent and arbitrary upon initial inspection, the thing I really care about is whether you're committing bugs in your code because of it. Do you have any code you've written where you got bit by days being 24 hours when you didn't want them to be?

Personaly I think the assumption that days are 24 hours is dangerous and shouldn't be made (because it is readily violated by DST)

You really can't blanket reject the assumption. Because sometimes you want it. And if you sometimes you want it, you really can't have all functions agree on whether days are uniform or not. Like, if all functions agreed that days were non-uniform, then what would adding 1 day to civil::DateTime do? It doesn't have a time zone, so it can't choose anything other than "1 day is 24 hours." So in that context, it is definitionally uniform. There is no other possible interpretation that I can see. And once you have that, it flows through to other APIs, like Span.

@BurntSushi BurntSushi added the question Further information is requested label Jul 26, 2024
@FeldrinH
Copy link
Author

FeldrinH commented Jul 26, 2024

I guess I could have been clearer. I specifically have a problem with TryFrom<Span> for Duration and Span::total making that assumption. I think the behavior of civil::DateTime is fine, because if you use it you more-or-less explicitly choose to ignore DST and other timezone oddities. However, the conversion to Duration and getting the total duration of a Span in absolute units are general APIs and I think making the assumption that days are 24 hours there lends itself to misuse when combined with zoned datetimes.

@BurntSushi
Copy link
Owner

I specifically have a problem with TryFrom<Span> for Duration making that assumption.

Ah I see. That follows from the behavior of Span::round, Span::total, Span::compare, Span::checked_add and Span::checked_sub. And that's all consistent with the corresponding Temporal.Duration APIs as well. Namely, days are always treated as 24 hours unless a relative time zone aware datetime is given.

An alternative design is that these APIs return an error when non-zero day units are given and no relative date is given. But if a relative date is given, then whether it is civil or time zone aware would determine the length of the day.

I think this Temporal issue sketches out the shape of why Temporal went with this design decision. There is even more background here and here, but it gets very dense very quickly.

I am not opposed to diverging from Temporal. I specifically did not set a goal that Jiff must match Temporal. Jiff is merely heavily inspired by Temporal. With that said, I've generally deferred to Temporal on these kinds of decisions, because they quite literally reflect the result of multiple experts investing person years of effort into the design. It doesn't mean they're always right, but there's got to be something compelling IMO to push me towards reversing one of their decisions.

and I think making the assumption that days are 24 hours there lends itself to misuse when combined with zoned datetimes.

Can you give a concrete example?

@FeldrinH
Copy link
Author

Currently not near my computer, but will give an example when I get back (probably tomorrow).

@FeldrinH
Copy link
Author

FeldrinH commented Jul 26, 2024

The first problematic example that I stumbled upon is this (I actually didn't realize this code was incorrect until you mentioned that Span::round assumes 24 hour days):

I wanted to enforce that some input spans should only contain absolute units and I figured that a good way to do this would be to normalize the span to contain seconds as the largest unit. My assumption was that any ambiguity about the absolute duration of the span would produce an error:

let normalized_span = input_span.round(SpanRound::new().largest(Unit::Second)).unwrap();

This works fine in most cases, but will unexpectedly change the effective duration of the span when the span contains days and DST is involved, whereas I expected it to produce an error:

let input_span = 2.days();
let normalized_span = input_span.round(SpanRound::new().largest(Unit::Second)).unwrap();

let datetime = date(2024, 11, 3).at(0, 0, 0, 0).intz("America/New_York").unwrap();
println!("{}", &datetime + input_span);     // 2024-11-05T00:00:00-05:00[America/New_York]
println!("{}", &datetime + normalized_span) // 2024-11-04T23:00:00-05:00[America/New_York]

@BurntSushi
Copy link
Owner

But where did you get the input Span from? Getting a Span from two zoned datetimes will already return a Span with units no bigger than 1 hour. You'll only get bigger units if you explicitly ask for them.

@FeldrinH
Copy link
Author

FeldrinH commented Jul 26, 2024

The span comes from external data (specifically some data dumps that store durations with the ISO 8601 format).

@BurntSushi
Copy link
Owner

The other problem with your suggestion is that, as of today, all of the Span APIs can be used correctly, without a relative date, using any Span returned by subtracting two datetimes in the default configuration. Your suggestion would make that not the case for spans from the difference between two DateTimes or two Dates.

@BurntSushi
Copy link
Owner

FWIW, I do find your suggestion compelling.

@FeldrinH
Copy link
Author

FeldrinH commented Jul 26, 2024

The other problem with your suggestion is that, as of today, all of the Span APIs can be used correctly, without a relative date, using any Span returned by subtracting two datetimes in the default configuration. Your suggestion would make that not the case for spans from the difference between two DateTimes or two Dates.

Based on my (admittedly relatively surface level) understanding, I would propose this:
The difference between two DateTimes could use hours as the largest unit, which would match Zoned.
The difference between two Dates could be made to work by adding a special case that allows adding or subtracting two spans as long as they both contain only one unit and it is the same unit.

This would almost certainly be an ergonomic hit in some cases that I haven't thought of, but my biased opinion is that the extra strictness and correctness is worth it.

@FeldrinH
Copy link
Author

FeldrinH commented Jul 26, 2024

PS: There's probably a good chance that this proposal breaks some part of the API that I haven't though of yet.

Also the proposed special case for spans is problematic in that it breaks the invariant that you can always add absolute units to a span. That tradeoff may or may not be worth it.

@BurntSushi
Copy link
Owner

Also the proposed special case for spans is problematic in that it breaks the invariant that you can always add absolute units to a span.

I believe this is not true today. If you're adding two spans and either of those spans have non-uniform units bigger than days, then you need a relative date. Because the lower units could add up and overflow into bigger units. I think the key thing today is that, under the default configuration for computing spans between any two datetimes, yes, you can always add absolute units to the span returned. But the change to avoiding implicitly assuming days are always 24 hours would indeed break that for DateTime and Date. The change to hours for DateTime would restore it there, but yes, I don't see a way to fix it for Date, which is unfortunate.

@FeldrinH
Copy link
Author

FeldrinH commented Jul 26, 2024

Honestly, I don't know. Like you said, the discussion gets very dense very quickly and the tradeoffs are hard to evaluate. I would keep the issue open for now, in case someone else wants to weigh in with more practical experience or proposals.

@BurntSushi
Copy link
Owner

@FeldrinH Can you say more about your higher level use case here? Like, what is the higher level problem you're trying to solve? This is what I understand so far:

  • You have some ISO 8601 durations in serialized form somewhere.
  • You want to parse them and convert them into durations with units no bigger than seconds. Or more generally, a duration with uniform units.
  • You want an error to appear if there is any ambiguity about what the units might mean.
  • You demonstrate the problem by comparing the duration before and after rounding with a time zone aware datetime.

I don't mean to ask for these details as a way to dismiss your concern, but I do think it's important to connect the threads a bit more here. In particular, I am wondering about the connection here between the time zone aware datetime after the fact instead of providing it to the rounding routine. Or perhaps that is part of the problem. I could see, for example, not realizing that you need to provide a relative datetime to round, trying it and then relying on it to produce an error if you're doing something wrong. Since it's silent about days, you assume it's okay and go on to do something with those durations and time zone aware datetimes. But had round given you an error, I think your remedy would have been to use a zoned relative datetime with round and thereby get correct spans. Is that right?

If so, yeah, I can see how the failure mode is effectively a way of pushing you towards using the APIs correctly and preventing misuse.

@FeldrinH
Copy link
Author

Basically I am parsing and processing some data dumps from some event logs. The documentation on the data format is limited, so I want to validate everything as strictly as possible. As part of that I want to enforce that there must be no ambiguity about the duration of spans. If there ever was ambiguity (e.g. if the event logs contained a span with days) then I would have to figure out what that actually means. The zoned datetimes only come into play because I later summarize the data in my local timezone (which has DST).

But had round given you an error, I think your remedy would have been to use a zoned relative datetime with round and thereby get correct spans. Is that right?

Not quite. If there is ambiguity about the duration of the spans then something is wrong with the input data and I will need to dig deeper into what that means. I am assuming that the durations will never contain ambiguous units like days or months, but I don't know for sure, so my goal with rounding without providing a zoned datetime was to enforce that if my assumption turns out to be wrong I will get an error.

@FeldrinH
Copy link
Author

I could see, for example, not realizing that you need to provide a relative datetime to round, trying it and then relying on it to produce an error if you're doing something wrong. Since it's silent about days, you assume it's okay and go on to do something with those durations and time zone aware datetimes.

That is definitely an issue that I could see happening, though my specific situation was a little different.

@BurntSushi
Copy link
Owner

One suggestion I got from the Temporal folks is to provide a "marker" that makes it okay to assume days are always 24 hours. So, for example, there would be a new SpanDaysAre24Hours marker type (or similarly named) that would have trait impls like impl From<SpanDaysAre24Hours> for SpanRelativeTo. But if that marker isn't given, or a relative date isn't given, then using units of days would result in an error. So in your case, your round call would have errored.

The advantage of a marker is that it absolves users of Jiff from providing a "dummy" relative date in the case that all days are 24 hours.

I'm still going to noodle on this. And I think this is probably something that is a breaking change as well. In particular, this will result in an ergonomic hit when using some of the Span APIs. With that said, I think this is somewhat mitigate by the fact that all of datetime APIs that produce spans, like Zoned::until, also expose the rounding interface. And they set the relative datetime for you.

@BurntSushi BurntSushi added the breaking change Issues that require a breaking change for resolution. label Jul 27, 2024
@johnpyp
Copy link

johnpyp commented Jul 28, 2024

I really appreciate the API design of Jiff making the correct things easy to do.

On the other hand, I feel like implementing this proposal as the default may violate the principal of least surprise and make me hesitant to just "throw stuff" at Jiff and expect it to work.

While I appreciate Jiff's strong error-driven api, I actually don't want it to be the case that I'm getting random errors due to "valid" data. This isn't in opposition to an error driven api, but rather in favor of somewhat flexible input behavior by default.

As mentioned, this gets very dense very fast, and I'd be concerned for general usability if this is the default and somewhat realistic to hit.

On the other hand, I like the idea of "strict modes" where you can layer additional assertions on input data. This wouldn't require that the internal modeling or types change - instead you can harden the consistency of the data at the edge when you receive it, maybe with some parse_unambiguous method or something similar.

@BurntSushi
Copy link
Owner

@johnpyp Sorry, I'm having a hard time following your comment. Could you say more concretely what you'd like? Are you saying the status quo is desirable? Because there are plenty of error cases in the status quo too. This issue is just talking about one of them for a few APIs on Span.

Also, does something like this help? Which basically lets you opt out of Span and lets you use something more like std::time::Duration.

BurntSushi added a commit to BurntSushi/cargo-smart-release that referenced this issue Aug 31, 2024
This commit revises the change in Byron#28 to be a bit more robust.

Originally, before Byron#28, the code was buggy because, I think, `Span` was
being used like a normal "absolute" duration. But a `Span` keeps track
of values for each individual unit. It isn't just a single number of
nanoseconds like, e.g., `std::time::Duration` is. It's "smarter" than
that. It deals with non-uniform units like days, months and years. But
to do it correctly, you need a reference date.

What this means is that when you get a `Span` by subtracting two `Zoned`
values, you can't just ask the `Span` for the total number of days via
`get_days()`. It has to be *computed*. In Byron#28, this was done for one
case but not the other via the `Span::total` API. While this works
today, the code was not providing a reference date, which means days are
silently treated as always being 24 hours long.
See BurntSushi/jiff#48 for more details where
it's likely that this sort of usage will return an error in `jiff 0.2`.

The main gotcha here is that since this is using `gix::date`, the
`Zoned` values that are created are just "fixed offset" datetimes. They
don't actually have a time zone. So in practice, such datetimes will
always have all days be 24 hours long. This is not correct, but it's not
clear to me that this is fixable inside the context of `git`. But at
least with this patch, if you do ever end up using true time zones, then
this code will be robust to that.
BurntSushi added a commit to BurntSushi/cargo-smart-release that referenced this issue Aug 31, 2024
This commit revises the change in Byron#28 to be a bit more robust.

Originally, before Byron#28, the code was buggy because, I think, `Span` was
being used like a normal "absolute" duration. But a `Span` keeps track
of values for each individual unit. It isn't just a single number of
nanoseconds like, e.g., `std::time::Duration` is. It's "smarter" than
that. It deals with non-uniform units like days, months and years. But
to do it correctly, you need a reference date.

What this means is that when you get a `Span` by subtracting two `Zoned`
values, you can't just ask the `Span` for the total number of days via
`get_days()`. It has to be *computed*. In Byron#28, this was done for one
case but not the other via the `Span::total` API. While this works
today, the code was not providing a reference date, which means days are
silently treated as always being 24 hours long.
See BurntSushi/jiff#48 for more details where
it's likely that this sort of usage will return an error in `jiff 0.2`.

The main gotcha here is that since this is using `gix::date`, the
`Zoned` values that are created are just "fixed offset" datetimes. They
don't actually have a time zone. So in practice, such datetimes will
always have all days be 24 hours long. This is not correct, but it's not
clear to me that this is fixable inside the context of `git`. But at
least with this patch, if you do ever end up using true time zones, then
this code will be robust to that.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change Issues that require a breaking change for resolution. question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants