fmt: improve strptime so that it can parse adjacent values more easily #70

Merged
merged 3 commits into from
Aug 2, 2024


BurntSushi
Owner

Previously, `strptime` worked greedily. That is, specifiers like `%Y`
matched as many ASCII digits as possible, and then tried to parse all
of them into a year value.

But this meant that using `%Y%m%d` to parse something like `20240730`
couldn't work: `%Y` would consume all of `20240730` and then of course
fail since the value exceeds the year boundaries of Jiff.

Instead, we tweak how parsing works so that each specifier only consumes
the maximum number of digits its value can have given its boundaries
(for example, at most 4 digits for `%Y` and at most 2 for `%m`).
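
To make the change concrete, here is a minimal, self-contained sketch of bounded digit parsing. It is not Jiff's actual implementation: `parse_bounded` and `parse_ymd` are hypothetical names, and the limits of 4, 2 and 2 digits are assumptions standing in for the year, month and day boundaries.

```rust
/// Parse at most `max_digits` leading ASCII digits from `input`.
/// Returns the parsed value and the unconsumed remainder.
/// (Hypothetical helper, not part of Jiff's API.)
fn parse_bounded(input: &str, max_digits: usize) -> Option<(i64, &str)> {
    let len = input
        .bytes()
        .take(max_digits)
        .take_while(|b| b.is_ascii_digit())
        .count();
    if len == 0 {
        return None;
    }
    // The first `len` bytes are ASCII digits, so this split is on a
    // valid character boundary.
    let (digits, rest) = input.split_at(len);
    Some((digits.parse::<i64>().ok()?, rest))
}

/// Hypothetical stand-in for parsing with the format `%Y%m%d`.
fn parse_ymd(input: &str) -> Option<(i64, i64, i64)> {
    let (year, rest) = parse_bounded(input, 4)?; // assumed limit for %Y
    let (month, rest) = parse_bounded(rest, 2)?; // assumed limit for %m
    let (day, rest) = parse_bounded(rest, 2)?; // assumed limit for %d
    if !rest.is_empty() || !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    Some((year, month, day))
}
```

Under these assumptions, `parse_ymd("20240730")` yields `Some((2024, 7, 30))`, whereas a greedy `%Y` would have tried to read all eight digits as the year.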

Fixes #62

Previously, `strptime` just skipped over any flags and ASCII digits
wherever they might appear in a directive. But in the course of trying
to fix #62, it looks like we're going to want parsing to respect padding
settings. So this commit refactors parsing to actually extract the
padding settings in the same way that formatting does.
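
As an illustration of what "extracting the padding settings" could look like, the sketch below splits a single directive into an optional flag, an optional width and the specifier character. The `Directive` type, the `parse_directive` function and the flag set (`-`, `_`, `0`) are assumptions made for this example and are not taken from Jiff's source.

```rust
/// Padding settings and specifier extracted from one directive such as `%03m`.
/// (Hypothetical type for illustration only.)
#[derive(Debug, PartialEq)]
struct Directive {
    flag: Option<char>,
    width: Option<usize>,
    specifier: char,
}

/// Parse one directive from the start of `fmt`, returning it together
/// with the rest of the format string.
fn parse_directive(fmt: &str) -> Option<(Directive, &str)> {
    let mut rest = fmt.strip_prefix('%')?;

    // Optional flag. The set of accepted flags here is an assumption.
    let mut flag = None;
    if let Some(c) = rest.chars().next() {
        if matches!(c, '-' | '_' | '0') {
            flag = Some(c);
            rest = &rest[1..];
        }
    }

    // Optional width: a run of ASCII digits.
    let ndigits = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
    let width = if ndigits == 0 {
        None
    } else {
        Some(rest[..ndigits].parse::<usize>().ok()?)
    };
    rest = &rest[ndigits..];

    // The specifier itself, e.g. `Y`, `m` or `d`.
    let specifier = rest.chars().next()?;
    let rest = &rest[specifier.len_utf8()..];
    Some((Directive { flag, width, specifier }, rest))
}
```

With this sketch, `%3m` produces a width of 3 and no flag, while `%03m` produces the same width with a `0` flag. A parser that skipped these characters would discard exactly that information.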

In order to make #62 work, we need some way of "limiting" the parsing
of numbers. So for example, when parsing a year, we shouldn't try to
parse more than 4 digits. And when parsing a month, we shouldn't try to
parse more than 2 digits.

This isn't a big deal, except it ends up being inconsistent with using
the padding settings during formatting. Say one does `%3m` for example
to format the month number with up to two leading zeroes. Well, if
parsing limits itself to 2 digits, then it can't parse this month! And
indeed, this appears to be how C's `strptime` and `strftime` work. They
aren't consistent with one another! So we make parsing "aware" of the
padding setting such that `%3m` can happily parse `005`.

This also gets around the thorny issue of backcompat concerns if we
ended up expanding our years past -9999..=9999. Users could do, for
example, `%5Y` to use a 5 digit year for *both* parsing and formatting.
Boom.
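
A rough sketch of width-aware limiting follows. It assumes the rule "use the larger of the field's natural digit limit and the explicit width", which is one plausible reading of the description above and not necessarily the exact rule Jiff implements. `digit_limit` and `parse_field` are hypothetical names.

```rust
/// Digit limit for a numeric field: its natural maximum, widened when the
/// directive carries a larger explicit width. (Assumed rule, see lead-in.)
fn digit_limit(natural_max: usize, width: Option<usize>) -> usize {
    width.map_or(natural_max, |w| w.max(natural_max))
}

/// Parse a numeric field, consuming no more digits than the limit allows.
/// Returns the value and the unconsumed remainder.
fn parse_field(input: &str, natural_max: usize, width: Option<usize>) -> Option<(u32, &str)> {
    let limit = digit_limit(natural_max, width);
    let len = input
        .bytes()
        .take(limit)
        .take_while(|b| b.is_ascii_digit())
        .count();
    if len == 0 {
        return None;
    }
    let (digits, rest) = input.split_at(len);
    Some((digits.parse::<u32>().ok()?, rest))
}
```

Here a plain `%m` (natural limit 2, no width) reads only `00` from `005` and leaves `5` behind, while `%3m` (width 3) reads all of `005` as month 5. Likewise a `%5Y`-style field (natural limit 4, width 5) reads all five digits of `02024`.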

Fixes #62
@BurntSushi BurntSushi merged commit 167b112 into master Aug 2, 2024
14 checks passed
@BurntSushi BurntSushi deleted the ag/better-strptime-semantics branch August 2, 2024 00:32

Successfully merging this pull request may close these issues.

Parsing continuous date values