Inconsistency: Are days 24 hours or not? #48
It is true that sometimes days are treated as always 24 hours and sometimes days are treated as more or less than 24 hours, depending on time zone transitions. In contexts where the meaning of a day is ambiguous, it's banned, as in the case of The shape of when days are allowed or not, and when they are 24 hours or otherwise, was taken straight from Temporal. For example, see its I think it would be better to be very concrete here. If you're working with a While I agree that this can all seem very inconsistent and arbitrary upon initial inspection, the thing I really care about is whether you're committing bugs in your code because of it. Do you have any code you've written where you got bit by days being 24 hours when you didn't want them to be?
You really can't blanket reject the assumption, because sometimes you want it. And if sometimes you want it, you really can't have all functions agree on whether days are uniform or not. Like, if all functions agreed that days were non-uniform, then what would adding 1 day to
I guess I could have been clearer. I specifically have a problem with
Ah I see. That follows from the behavior of An alternative design is that these APIs return an error when non-zero day units are given and no relative date is given. But if a relative date is given, then whether it is civil or time zone aware would determine the length of the day. I think this Temporal issue sketches out the shape of why Temporal went with this design decision. There is even more background here and here, but it gets very dense very quickly. I am not opposed to diverging from Temporal. I specifically did not set a goal that Jiff must match Temporal. Jiff is merely heavily inspired by Temporal. With that said, I've generally deferred to Temporal on these kinds of decisions, because they quite literally reflect the result of multiple experts investing person years of effort into the design. It doesn't mean they're always right, but there's got to be something compelling IMO to push me towards reversing one of their decisions.
Can you give a concrete example?
I'm currently not near my computer, but I will give an example when I get back (probably tomorrow).
The first problematic example I stumbled upon is this (I didn't actually realize this code was incorrect until you mentioned that `Span::round` assumes 24-hour days): I wanted to enforce that some input spans contain only absolute units, and I figured a good way to do this would be to normalize the span so that seconds are the largest unit. My assumption was that any ambiguity about the absolute duration of the span would produce an error:

```rust
let normalized_span = input_span.round(SpanRound::new().largest(Unit::Second)).unwrap();
```

This works fine in most cases, but it unexpectedly changes the effective duration of the span when the span contains days and DST is involved, whereas I expected it to produce an error:

```rust
use jiff::{civil::date, SpanRound, ToSpan, Unit};

let input_span = 2.days();
// Rounding without a relative date silently treats each day as 24 hours.
let normalized_span = input_span.round(SpanRound::new().largest(Unit::Second)).unwrap();
let datetime = date(2024, 11, 3).at(0, 0, 0, 0).intz("America/New_York").unwrap();
println!("{}", &datetime + input_span); // 2024-11-05T00:00:00-05:00[America/New_York]
println!("{}", &datetime + normalized_span); // 2024-11-04T23:00:00-05:00[America/New_York]
```
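For reference, the hour goes missing because 2 calendar days from local midnight on 2024-11-03 in America/New_York span 49 real hours: that day contains the fall-back transition from -04:00 to -05:00. A minimal std-only sketch of the arithmetic follows. It hardcodes the two offsets as an assumption instead of reading a time zone database, and the function names are made up for illustration rather than taken from Jiff.

```rust
/// Seconds in one hour.
const HOUR: i64 = 3_600;

/// Converts local wall-clock seconds (counted from midnight on the starting
/// date) to seconds since UTC midnight on that same date, given the UTC
/// offset in hours. Since local = utc + offset, utc = local - offset.
fn local_to_utc(local_secs: i64, offset_hours: i64) -> i64 {
    local_secs - offset_hours * HOUR
}

/// Real elapsed hours when adding `days` calendar days to local midnight,
/// i.e. landing on the same wall-clock time `days` dates later.
fn calendar_days_elapsed_hours(days: i64, start_offset: i64, end_offset: i64) -> i64 {
    let start_utc = local_to_utc(0, start_offset);
    let end_utc = local_to_utc(days * 24 * HOUR, end_offset);
    (end_utc - start_utc) / HOUR
}

/// Local wall-clock hours (counted from the starting local midnight) reached
/// after `hours` of real elapsed time, given the offset in effect at the end.
fn local_hours_after(hours: i64, start_offset: i64, end_offset: i64) -> i64 {
    let end_utc = local_to_utc(0, start_offset) + hours * HOUR;
    // utc -> local: local = utc + offset
    (end_utc + end_offset * HOUR) / HOUR
}
```

With the assumed offsets, `calendar_days_elapsed_hours(2, -4, -5)` is 49, while `local_hours_after(48, -4, -5)` is 47, i.e. 23:00 on 2024-11-04, which matches the two outputs above.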
But where did you get the input `Span` from? Getting a `Span` from two zoned datetimes already returns a `Span` with no unit larger than hours. You'll only get bigger units if you explicitly ask for them.
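As a rough illustration of what "no unit larger than hours" means, the sketch below balances the elapsed seconds between two instants into hours, minutes and seconds only. `HmsSpan` and `balance_to_hours` are hypothetical names. This is not how Jiff computes spans internally; it only models the shape of the result.

```rust
/// A span restricted to uniform units, with hours as the largest unit.
#[derive(Debug, PartialEq)]
struct HmsSpan {
    hours: i64,
    minutes: i64,
    seconds: i64,
}

/// Balances elapsed seconds (e.g. the difference between two instants)
/// into hours, minutes and seconds. Days are never produced, so no
/// assumption about day length is needed. Rust's `/` and `%` truncate
/// toward zero, so every field of a negative span is non-positive.
fn balance_to_hours(total_seconds: i64) -> HmsSpan {
    HmsSpan {
        hours: total_seconds / 3_600,
        minutes: (total_seconds % 3_600) / 60,
        seconds: total_seconds % 60,
    }
}
```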
The span comes from external data (specifically, some data dumps that store durations in ISO 8601 format).
The other problem with your suggestion is that, as of today, all of the `Span` APIs can be used correctly without a relative date on any `Span` returned by subtracting two datetimes in the default configuration. Your suggestion would make that no longer true for spans computed from the difference between two `DateTime`s or two `Date`s.
FWIW, I do find your suggestion compelling.
Based on my (admittedly fairly surface-level) understanding, I would propose this: This would almost certainly be an ergonomic hit in some cases that I haven't thought of, but my biased opinion is that the extra strictness and correctness is worth it.
PS: There's a good chance that this proposal breaks some part of the API that I haven't thought of yet. Also, the proposed special case for spans is problematic in that it breaks the invariant that you can always add absolute units to a span. That tradeoff may or may not be worth it.
I believe this is not true today. If you're adding two spans and either of them has non-uniform units bigger than days, then you need a relative date, because the lower units could add up and overflow into the bigger units. I think the key thing today is that, under the default configuration for computing spans between any two datetimes, yes, you can always add absolute units to the span returned. But the change to avoid implicitly assuming days are always 24 hours would indeed break that for
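To make the overflow point concrete, here is a small std-only sketch: the same 20 + 15 days balance into 1 month and 5 days if the sum starts in a 30-day month, but into 1 month and 4 days if it starts in a 31-day month, so without a relative date there is no single right answer. The function name is hypothetical, and the slice of month lengths is an assumed stand-in for a real relative date.

```rust
/// Adds the day components of two spans and balances any overflow into
/// months. `month_lens` stands in for a relative date: the lengths, in days,
/// of consecutive months starting from that date. Returns (months, days).
/// Assumes non-negative inputs; if the supplied month lengths run out, the
/// leftover stays in the days component.
fn add_days_balancing_months(
    a_days: i64,
    b_days: i64,
    month_lens: Option<&[i64]>,
) -> Result<(i64, i64), String> {
    let lens = month_lens
        .ok_or_else(|| "a relative date is required to balance days into months".to_string())?;
    let mut remaining = a_days + b_days;
    let mut months: i64 = 0;
    for &len in lens {
        if remaining < len {
            break;
        }
        remaining -= len;
        months += 1;
    }
    Ok((months, remaining))
}
```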
Honestly, I don't know. Like you said, the discussion gets very dense very quickly and the tradeoffs are hard to evaluate. I would keep the issue open for now, in case someone else wants to weigh in with more practical experience or proposals.
@FeldrinH Can you say more about your higher level use case here? Like, what is the higher level problem you're trying to solve? This is what I understand so far:
I don't mean to ask for these details as a way to dismiss your concern, but I do think it's important to connect the threads a bit more here. In particular, I am wondering about the connection here between the time zone aware datetime after the fact instead of providing it to the rounding routine. Or perhaps that is part of the problem. I could see, for example, not realizing that you need to provide a relative datetime to If so, yeah, I can see how the failure mode is effectively a way of pushing you towards using the APIs correctly and preventing misuse.
Basically I am parsing and processing some data dumps from some event logs. The documentation on the data format is limited, so I want to validate everything as strictly as possible. As part of that I want to enforce that there must be no ambiguity about the duration of spans. If there ever was ambiguity (e.g. if the event logs contained a span with days) then I would have to figure out what that actually means. The zoned datetimes only come into play because I later summarize the data in my local timezone (which has DST).
Not quite. If there is ambiguity about the duration of the spans, then something is wrong with the input data and I will need to dig deeper into what that means. I am assuming that the durations will never contain ambiguous units like days or months, but I don't know for sure, so my goal with rounding without providing a zoned datetime was to make sure that I get an error if my assumption turns out to be wrong.
That is definitely an issue that I could see happening, though my specific situation was a little different.
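A minimal sketch of the validation being described, using a hypothetical `SpanFields` struct instead of Jiff's `Span`: it rejects any span with calendar units (years, months, weeks or days) rather than silently converting them, and only then sums the uniform units into seconds.

```rust
/// A hypothetical stand-in for a parsed span, one field per unit.
#[derive(Debug, Default)]
struct SpanFields {
    years: i64,
    months: i64,
    weeks: i64,
    days: i64,
    hours: i64,
    minutes: i64,
    seconds: i64,
}

/// Returns the span's length in seconds, or an error if it contains any
/// calendar unit whose length depends on a reference date or time zone.
fn absolute_seconds(span: &SpanFields) -> Result<i64, String> {
    if span.years != 0 || span.months != 0 || span.weeks != 0 || span.days != 0 {
        return Err(format!("span contains calendar units: {span:?}"));
    }
    Ok(span.hours * 3_600 + span.minutes * 60 + span.seconds)
}
```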
One suggestion I got from the Temporal folks is to provide a "marker" that makes it okay to assume days are always 24 hours. So, for example, there would be a new The advantage of a marker is that it absolves users of Jiff from providing a "dummy" relative date in the case that all days are 24 hours. I'm still going to noodle on this. I also think this is probably a breaking change. In particular, this will result in an ergonomic hit when using some of the
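One way to model the "marker" idea, as a sketch only: the enum below distinguishes a missing reference (an error if days are present), an explicit opt-in that days are 24 hours, and a time zone aware reference point, reduced here to a caller-supplied list of actual day lengths. All names are hypothetical and do not describe Jiff's API.

```rust
/// What a span computation may assume about day length (hypothetical names).
enum Relative {
    /// No reference point: days cannot be resolved.
    Missing,
    /// Explicit opt-in marker: every day is exactly 24 hours.
    DaysAre24Hours,
    /// A time zone aware reference point, reduced here to the actual lengths
    /// (in hours) of consecutive days starting from it.
    Zoned(Vec<i64>),
}

/// Total hours in a span of `days` days plus `hours` hours.
fn total_hours(days: i64, hours: i64, relative: &Relative) -> Result<i64, String> {
    if days == 0 {
        // Uniform units only: no reference point is needed.
        return Ok(hours);
    }
    match relative {
        Relative::Missing => Err("span has days but no relative reference was given".to_string()),
        Relative::DaysAre24Hours => Ok(days * 24 + hours),
        Relative::Zoned(lengths) => {
            let n = usize::try_from(days)
                .map_err(|_| "negative days are not handled in this sketch".to_string())?;
            if lengths.len() < n {
                return Err("not enough day lengths supplied".to_string());
            }
            Ok(lengths[..n].iter().sum::<i64>() + hours)
        }
    }
}
```

With the marker, the caller states the 24-hour assumption explicitly instead of constructing a dummy reference point just to satisfy the API.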
I really appreciate that Jiff's API design makes the correct things easy to do. On the other hand, I feel like implementing this proposal as the default may violate the principle of least surprise and make me hesitant to just "throw stuff" at Jiff and expect it to work. While I appreciate Jiff's strong error-driven API, I don't want to be getting seemingly random errors due to "valid" data. This isn't an argument against an error-driven API, but rather one in favor of somewhat flexible input behavior by default. As mentioned, this gets very dense very fast, and I'd be concerned about general usability if this is the default and somewhat realistic to hit. On the other hand, I like the idea of "strict modes" where you can layer additional assertions on input data. This wouldn't require that the internal modeling or types change; instead, you can harden the consistency of the data at the edge when you receive it, maybe with some `parse_unambiguous` method or something similar.
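A rough sketch of what such an edge-of-the-system check could look like for the ISO 8601 input mentioned earlier in the thread: it accepts only time components (`PT…H…M…S`) and rejects anything with years, months, weeks or days. `parse_unambiguous` is the hypothetical name from the comment above. This sketch also assumes integer values with no sign or fractional part, both of which a real ISO 8601 parser would need to handle.

```rust
/// Parses a restricted ISO 8601 duration such as "PT1H30M" or "PT45S" into
/// seconds. Any date component (years, months, weeks or days) is rejected.
/// Signs and fractional values are not supported in this sketch.
fn parse_unambiguous(s: &str) -> Result<i64, String> {
    let rest = s
        .strip_prefix('P')
        .ok_or_else(|| format!("{s}: duration must start with 'P'"))?;
    // Only the time part (after 'T') is allowed; "P2D" or "P1DT2H" fail here.
    let time = rest
        .strip_prefix('T')
        .ok_or_else(|| format!("{s}: date components are ambiguous and not allowed"))?;
    if time.is_empty() {
        return Err(format!("{s}: no time components"));
    }
    let mut total: i64 = 0;
    let mut digits = String::new();
    let mut last_rank = 0; // enforces the order H, then M, then S
    for c in time.chars() {
        if c.is_ascii_digit() {
            digits.push(c);
            continue;
        }
        let (rank, unit_secs): (i32, i64) = match c {
            'H' => (1, 3_600),
            'M' => (2, 60),
            'S' => (3, 1),
            _ => return Err(format!("{s}: unexpected character '{c}'")),
        };
        if digits.is_empty() || rank <= last_rank {
            return Err(format!("{s}: malformed component before '{c}'"));
        }
        let value: i64 = digits.parse().map_err(|e| format!("{s}: {e}"))?;
        let part = value
            .checked_mul(unit_secs)
            .ok_or_else(|| format!("{s}: duration overflows i64 seconds"))?;
        total = total
            .checked_add(part)
            .ok_or_else(|| format!("{s}: duration overflows i64 seconds"))?;
        digits.clear();
        last_rank = rank;
    }
    if !digits.is_empty() {
        return Err(format!("{s}: trailing digits without a unit"));
    }
    Ok(total)
}
```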
@johnpyp Sorry, I'm having a hard time following your comment. Could you say more concretely what you'd like? Are you saying the status quo is desirable? Because there are plenty of error cases in the status quo too. This issue is just talking about one of them for a few APIs on Also, does something like this help? Which basically lets you opt out of
This commit revises the change in Byron#28 to be a bit more robust. Originally, before Byron#28, the code was buggy because, I think, `Span` was being used like a normal "absolute" duration. But a `Span` keeps track of values for each individual unit. It isn't just a single number of nanoseconds like, e.g., `std::time::Duration` is. It's "smarter" than that. It deals with non-uniform units like days, months and years. But to do it correctly, you need a reference date. What this means is that when you get a `Span` by subtracting two `Zoned` values, you can't just ask the `Span` for the total number of days via `get_days()`. It has to be *computed*. In Byron#28, this was done for one case but not the other via the `Span::total` API. While this works today, the code was not providing a reference date, which means days are silently treated as always being 24 hours long. See BurntSushi/jiff#48 for more details where it's likely that this sort of usage will return an error in `jiff 0.2`. The main gotcha here is that since this is using `gix::date`, the `Zoned` values that are created are just "fixed offset" datetimes. They don't actually have a time zone. So in practice, such datetimes will always have all days be 24 hours long. This is not correct, but it's not clear to me that this is fixable inside the context of `git`. But at least with this patch, if you do ever end up using true time zones, then this code will be robust to that.
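The distinction between reading a field and computing a total can be sketched without Jiff. In the hypothetical `DaysHours` type below, `get_days` just returns the stored day count, while `total_days` balances the whole span against the lengths of consecutive days starting at a reference date. Those lengths are supplied by the caller in hours, an assumption standing in for a real time zone lookup.

```rust
/// A hypothetical span that stores each unit separately.
struct DaysHours {
    days: i64,
    hours: i64,
}

impl DaysHours {
    /// Field accessor: returns the stored day count with no balancing.
    fn get_days(&self) -> i64 {
        self.days
    }

    /// Computes the total number of days, measured from a reference date.
    /// `day_lengths` holds the lengths in hours of consecutive days starting
    /// at that date. Returns `None` if not enough day lengths are supplied.
    /// Assumes non-negative fields.
    fn total_days(&self, day_lengths: &[i64]) -> Option<f64> {
        let n = usize::try_from(self.days).ok()?;
        // Resolve the stored days against the reference, then add the hours.
        let mut remaining: i64 = day_lengths.get(..n)?.iter().sum::<i64>() + self.hours;
        let mut whole_days: i64 = 0;
        for &len in day_lengths {
            if remaining < len {
                return Some(whole_days as f64 + remaining as f64 / len as f64);
            }
            remaining -= len;
            whole_days += 1;
        }
        // Every supplied day was consumed; only an exact fit is answerable.
        if remaining == 0 {
            Some(whole_days as f64)
        } else {
            None
        }
    }
}
```

For example, a span of 0 days and 47 hours has `get_days() == 0` either way, but its total is exactly 2 days when the second day is only 23 hours long, and slightly less than 2 when both days are 24 hours.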
There seems to be some inconsistency in the treatment of days in spans. In particular, `TryFrom<Span> for Duration` assumes that days are 24 hours, whereas `Timestamp::saturating_add` rejects days as units that can't be resolved without a specific time zone. Personally, I think the assumption that days are 24 hours is dangerous and shouldn't be made (because it is readily violated by DST), but more than that, I think all functions should at least agree on whether days are assumed to always be 24 hours or not.
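A std-only sketch of the two behaviors the issue contrasts: one conversion silently treats a day as 86,400 seconds, and the other refuses to resolve days without a time zone. The functions are hypothetical stand-ins written for illustration, not Jiff's actual implementations. For brevity, the second one uses checked rather than saturating arithmetic.

```rust
use std::time::Duration;

/// Policy A, as the issue describes `TryFrom<Span> for Duration`:
/// silently treat every day as 86,400 seconds. Returns `None` on overflow.
fn to_duration_assuming_24h(days: u64, seconds: u64) -> Option<Duration> {
    let secs = days.checked_mul(86_400)?.checked_add(seconds)?;
    Some(Duration::from_secs(secs))
}

/// Policy B, as the issue describes `Timestamp::saturating_add`:
/// refuse to resolve days without a time zone.
fn add_to_unix_seconds(unix_secs: i64, days: i64, seconds: i64) -> Result<i64, String> {
    if days != 0 {
        return Err("days cannot be resolved without a time zone".to_string());
    }
    unix_secs
        .checked_add(seconds)
        .ok_or_else(|| "timestamp overflow".to_string())
}
```

The same 2-day input succeeds under the first policy and fails under the second, which is the disagreement the issue points out.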