feat: Support use of Duration dtype in `to_string`, ergonomic/perf improvement, tz-aware Datetime bugfix #19697
nice one, just left some comments!
i also notice that this does about 2-3 things together - is it possible to do any of them separately (like the tz-aware datetime bugfix)?
The bug is fixed as a side-effect of explicitly/centrally declaring the right ISO strings, and everything else is connected, so there isn't really anything to separate out 😅
(Supersedes the now-closed #19663).
Closes #7174.
New feature
Adds support for `dt.to_string()` to Duration cols, transforming to an ISO8601 duration string (see: https://en.wikipedia.org/wiki/ISO_8601#Durations for details on this format).
The `to_string` expression (for all temporal dtypes) can now take the shortcut format string `"iso"`, which represents the dtype-specific ISO8601 format for the given column(s). If no explicit format is given, this shortcut string gets set, leading to some nice ergonomic improvements (detailed below).
(Note: the ISO duration string conversion was validated against an established package that does the same conversion; created an ad-hoc parametric test against many thousands of randomly-generated timedeltas, and everything tied out. Included various edge-cases in the unit tests).
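For readers unfamiliar with the ISO8601 duration format, here is a minimal standard-library sketch of the kind of conversion involved. It is not the Polars implementation (which lives in Rust), and the helper name `iso_duration` is hypothetical. It assumes microsecond resolution, a leading `-` for negative durations, and days as the largest unit; the exact strings Polars emits may differ in detail.

```python
from datetime import timedelta


def iso_duration(td: timedelta) -> str:
    """Format a timedelta as an ISO 8601 duration string (illustrative only)."""
    # Exact integer microsecond count; timedelta resolution is 1 microsecond.
    total_us = td // timedelta(microseconds=1)
    if total_us == 0:
        return "PT0S"

    # Assumption: negative durations are written with a leading minus sign.
    sign = "-" if total_us < 0 else ""
    total_us = abs(total_us)

    secs, micros = divmod(total_us, 1_000_000)
    mins, secs = divmod(secs, 60)
    hours, mins = divmod(mins, 60)
    days, hours = divmod(hours, 24)

    parts = [sign, "P"]
    if days:
        parts.append(f"{days}D")
    if hours or mins or secs or micros:
        parts.append("T")  # separates the date part from the time part
        if hours:
            parts.append(f"{hours}H")
        if mins:
            parts.append(f"{mins}M")
        if secs or micros:
            # Drop trailing zeros from the fractional seconds.
            frac = f".{micros:06d}".rstrip("0") if micros else ""
            parts.append(f"{secs}{frac}S")
    return "".join(parts)


print(iso_duration(timedelta(days=1, hours=2, minutes=3, seconds=4)))
print(iso_duration(timedelta(milliseconds=1500)))
```

A sketch like this can be checked against randomly generated timedeltas in the same spirit as the validation described in the note.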
Ergonomics
If you do not supply a format string (previously not possible) or you set it to `"iso"`, the associated dtype-specific ISO8601 format is used. This means that you can now make a single expression that will properly format all temporal columns with their dtype-appropriate ISO format, rather than having to separately associate an explicit format string with each dtype.
Can now write...
...instead of:
Performance
The `fmt_duration_string` function was optimised for better performance, using `itoa` conversions and `push`/`push_str` calls instead of `write!` (>= 2x faster in local testing).
Bugfix
String output for tz-aware Datetime columns (via `cast`) did not include the expected ISO timezone offset - spotted this with some new tests and fixed it.
Examples
Cast to dtype-specific ISO8601 string (the default, if no explicit format string given):
Durations do not support strftime; instead, `to_string` recognises only `"iso"` or `"polars"` as a formatting string for durations - the latter allows for string output in the same form that we display in the frame repr: