Skip to content

Commit

Permalink
fmt/temporal: fix parsing of spans like PT843517081h
Browse files Browse the repository at this point in the history
Because Jiff spans (like Temporal durations) represent each unit
individually, and because ISO 8601 require the use of fractional seconds
to represent sub-second units, there is an inherent loss of fidelity
when spans are serialized to the ISO 8601 duration format.

For example, a Jiff span like `1 second 2000 milliseconds` is distinct from
`3 seconds` even though they represent the same amount of elapsed time.
These types of spans are unbalanced.

When a span like `1 second 2000 milliseconds` is ISO 8601 serialized,
it's turned into `PT3s` because there simply is no way to represent
greater-than-second durations in smaller-than-second units in the ISO
8601 duration format. So when `PT3s` is deserialized, Jiff of course has
no idea that `1 second 2000 milliseconds` was actually intended, and
thus treats it as `3 seconds`.

On top of this, Jiff imposes fairly strict limits on the individual
units of a `Span`. Other than nanoseconds, every individual unit can
express the full range of time supported by Jiff (`-9999-01-01` through
`9999-12-31`) and nothing more. So if one serializes a span to the ISO
8601 format with large millisecond, microsecond or nanosecond
components, those have to be folded into the larger hour, minute or
second units. This in turn can create a ISO 8601 duration whose hour,
minute and second units exceed Jiff's unit limits. So in order to
preserve the ability to at least roundtrip all Jiff spans (even if the
individual unit values are lost), Jiff will automatically rebalance
these "larger" units into smaller units at parse time.

This is a big complicated mess and it turns out I got one part of this
wrong: I was only re-balancing units at parse time when we parsed a
fractional component. But in fact, we should be re-balancing spans even
when there isn't a fractional component. Namely, the milliseconds,
microseconds and nanosecond components can add up to a whole number of
seconds, resulting in a whole number of seconds in the corresponding ISO
8601 duration.

This bug was found by @addisoncrump's fuzzer. Nice work.

Fixes #59
  • Loading branch information
BurntSushi committed Aug 3, 2024
1 parent f8cb134 commit 1a913e7
Show file tree
Hide file tree
Showing 3 changed files with 75 additions and 12 deletions.
6 changes: 4 additions & 2 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
0.1.5 (TBD)
==================
This release includes some improvements and bug fixes for Jiff's `strtime`
APIs.
This release includes some improvements and bug fixes, particularly for Jiff's
`strtime` APIs.

Enhancements:

Expand All @@ -10,6 +10,8 @@ Add support for `%V` for formatting _and_ parsing IANA time zone identifiers.

Bug fixes:

* [#59](https://github.com/BurntSushi/jiff/issues/59):
Fixes a bug where some `Span`s could not be roundtripped through ISO 8601.
* [#71](https://github.com/BurntSushi/jiff/issues/71):
Tweak wording in documentation of "printf"-style API.
* [#73](https://github.com/BurntSushi/jiff/issues/73):
Expand Down
15 changes: 15 additions & 0 deletions src/fmt/temporal/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1207,6 +1207,10 @@ impl SpanPrinter {

#[cfg(test)]
mod tests {
use alloc::string::ToString;

use crate::Unit;

use super::*;

// This test ensures that strings like `2024-07-15+02` fail to parse.
Expand Down Expand Up @@ -1238,4 +1242,15 @@ mod tests {
@r###"failed to parse year in date "-000000-01-01": year zero must be written without a sign or a positive sign, but not a negative sign"###,
);
}

// Regression test for: https://github.com/BurntSushi/jiff/issues/59
#[test]
fn fractional_duration_roundtrip() {
let span1: Span = "Pt843517081,1H".parse().unwrap();
let span2: Span = span1.to_string().parse().unwrap();
assert_eq!(
span1.total(Unit::Hour).unwrap(),
span2.total(Unit::Hour).unwrap()
);
}
}
66 changes: 56 additions & 10 deletions src/fmt/temporal/parser.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1045,14 +1045,32 @@ impl SpanParser {
// lowest unit of time.
break;
} else {
span =
let result =
span.try_units_ranged(unit, value).with_context(|| {
err!(
"failed to set value {value:?} \
as {unit} unit on span",
unit = Unit::from(unit).singular(),
)
})?;
});
// This is annoying, but because we can write out a larger
// number of hours/minutes/seconds than what we actually
// support, we need to be prepared to parse an unbalanced span
// if our time units are too big here. This entire dance is
// because ISO 8601 requires fractional seconds to represent
// milli-, micro- and nano-seconds. This means that spans
// cannot retain their full fidelity when roundtripping through
// ISO 8601. However, it is guaranteed that their total elapsed
// time represented will never change.
span = match result {
Ok(span) => span,
Err(_) => span_fractional_time(
unit,
value,
t::SubsecNanosecond::N::<0>(),
span,
)?,
};
}
}
Ok(Parsed { value: (span, parsed_any), input })
Expand Down Expand Up @@ -1203,8 +1221,8 @@ impl SpanParser {
/// `500_000_000`. The span given would just be `1d`. The span returned would
/// be `P1dT1h30m`.
///
/// This can if the resulting units would be too large for the limits on a
/// `span`.
/// This can error if the resulting units would be too large for the limits on
/// a `span`.
///
/// # Panics
///
Expand All @@ -1219,12 +1237,12 @@ fn span_fractional_time(
// appropriate. In general, we always create a balanced span, but there
// are some cases where we can't. For example, if one serializes a span
// with both the maximum number of seconds and the maximum number of
// milliseconds, then it isn't technically balanced, but only because
// of the limits placed on each of the units. When this kind of span is
// serialized to a string, it results in a second value that is actually
// bigger than the maximum allowed number of seconds in a span. So here,
// we have to reverse that operation and spread the seconds over smaller
// units. This in turn creates an unbalanced span. Annoying.
// milliseconds, then this just can't be balanced due to the limits on
// each of the units. When this kind of span is serialized to a string,
// it results in a second value that is actually bigger than the maximum
// allowed number of seconds in a span. So here, we have to reverse that
// operation and spread the seconds over smaller units. This in turn
// creates an unbalanced span. Annoying.
//
// The above is why we have `if unit_value > MAX { <do adjustments> }` in
// the balancing code below. Basically, if we overshoot our limit, we back
Expand Down Expand Up @@ -1467,6 +1485,34 @@ mod tests {
"###);
}

#[test]
fn ok_temporal_duration_unbalanced() {
let p =
|input| SpanParser::new().parse_temporal_duration(input).unwrap();

insta::assert_debug_snapshot!(
p(b"PT175307616h10518456960m1774446656760s"), @r###"
Parsed {
value: PT175307616h10518456960m1774446656760s,
input: "",
}
"###);
insta::assert_debug_snapshot!(
p(b"Pt843517082H"), @r###"
Parsed {
value: PT175307616h10518456960m1774446660000s,
input: "",
}
"###);
insta::assert_debug_snapshot!(
p(b"Pt843517081H"), @r###"
Parsed {
value: PT175307616h10518456960m1774446656400s,
input: "",
}
"###);
}

#[test]
fn ok_temporal_datetime_basic() {
let p = |input| {
Expand Down

0 comments on commit 1a913e7

Please sign in to comment.