-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add Dates #41
base: main
Are you sure you want to change the base?
feat: add Dates #41
Conversation
Co-authored-by: Jerry Ling <[email protected]>
Co-authored-by: Jerry Ling <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, I'll need to add some tests for Datetime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a high-priority PR (as I said, "nice to have, because Awkward can handle dates"), so you can come back to it, but it's not currently ready.
In NumPy, and therefore Awkward Array, the datetime and timedelta primitives are parameterized by units. If unspecified, NumPy has a default.
Specifically, the primitive strings can be
_primitive_to_dtype_datetime = re.compile(
r"datetime64\[(\s*-?[0-9]*)?(Y|M|W|D|h|m|s|ms|us|\u03bc|ns|ps|fs|as)\]"
)
_primitive_to_dtype_timedelta = re.compile(
r"timedelta64\[(\s*-?[0-9]*)?(Y|M|W|D|h|m|s|ms|us|\u03bc|ns|ps|fs|as)\]"
)
To fully implement dates, the Julia code would have to parse the string and multiply the UInt64
values in the buffer by an appropriate amount.
NumPy's default units are not the same as Julia's units, so this PR will always give wrong results:
>>> import numpy as np
>>> [int(x) for x in np.array(["2023-11-15T13:10:00"], dtype="datetime64").tobytes()]
[40, 195, 84, 101, 0, 0, 0, 0]
>>> [int(x) for x in np.array([123], dtype="timedelta64").tobytes()]
[123, 0, 0, 0, 0, 0, 0, 0]
julia> reinterpret(Dates.DateTime, Vector{UInt8}([56, 194, 84, 101, 0, 0, 0, 0]))
1-element reinterpret(Dates.DateTime, ::Vector{UInt8}):
0001-01-19T16:14:13.560
julia> reinterpret(Dates.TimePeriod, Vector{UInt8}([123, 0, 0, 0, 0, 0, 0, 0]))
ERROR: ArgumentError: cannot reinterpret `UInt8` as `Dates.TimePeriod`, type `Dates.TimePeriod` is not a bits type
Stacktrace:
[1] (::Base.var"#throwbits#323")(S::Type, T::Type, U::Type)
@ Base ./reinterpretarray.jl:16
[2] reinterpret(#unused#::Type{Dates.TimePeriod}, a::Vector{UInt8})
@ Base ./reinterpretarray.jl:62
[3] top-level scope
@ REPL[4]:1
Furthermore, it doesn't look like reinterpret(TimePeriod, buffer)
is even compilable. The tests probably pass because this code isn't being compiled.
julia> dump(now())
DateTime
instant: Dates.UTInstant{Millisecond}
periods: Millisecond
value: Int64 63835823961576
help?> DateTime()
DateTime
DateTime wraps a UTInstant{Millisecond} and interprets it according to the proleptic Gregorian calendar. shouldn't be too hard to convert between things but yeah they don't work out of the box |
Even if Julia's DateTime does all of the non-linear time units (e.g. months), leap-seconds, and such, a proper implementation would at least require us to check to be sure that Julia and NumPy do the same set of things. Like, suppose that Julia does leap-seconds and NumPy does not—we'd have to adjust for NumPy's naivete in the conversion (or vice-versa). Incidentally, this is just as much of a problem going from NumPy to Pandas to Arrow to... (Dates are hard.) |
No description provided.