DataFrame(::PyPandasDataFrame) converts date & datetime to bytes #293
If you replace the |
Just looking at this now. It all comes down to the fact that numpy arrays containing So there's two things that could be fixed here:
|
(PS thanks for the report and the MWE) |
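Just stumbled across that problem and solved it this way:
using Dates, PythonCall
function pytimestamp_to_datetime(t::PyArray)
    # The bytes hold an Int64 count of nanoseconds since 1970-01-01.
    # Integer division to milliseconds (DateTime's resolution) avoids the
    # InexactError that Second(x / 1000000000) throws for sub-second values.
    DateTime(1970) + Millisecond(reinterpret(Int64, t)[1] ÷ 1_000_000)
end
function pytimestamp_to_datetime(v::AbstractVector{<:PyArray})
    pytimestamp_to_datetime.(v)
end
EDIT: This only holds for datetime64[ns]; for other units the factor is different. I just filed a tentative PR for a general solution. |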
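To illustrate the remark that the factor depends on the datetime64 unit, here is a minimal sketch in plain Julia. It uses only the Dates standard library and takes the raw Int64 tick count plus a unit string as inputs. The names MS_FACTOR and datetime64_to_datetime are hypothetical helpers made up for this example; they are not part of PythonCall or of the PR mentioned above.

```julia
using Dates

# Hypothetical lookup table: (numerator, denominator) that turns one tick of
# the given numpy datetime64 unit into milliseconds, DateTime's resolution.
const MS_FACTOR = Dict(
    "s"  => (1_000, 1),
    "ms" => (1, 1),
    "us" => (1, 1_000),
    "ns" => (1, 1_000_000),
)

# Convert a tick count since 1970-01-01 in the given unit to a DateTime.
# Sub-millisecond precision is dropped; fld rounds toward negative infinity,
# so timestamps before 1970 are floored in the same direction as later ones.
function datetime64_to_datetime(ticks::Integer, unit::AbstractString)
    num, den = MS_FACTOR[unit]
    ms = fld(Int64(ticks) * num, den)
    return DateTime(1970) + Millisecond(ms)
end
```

With this sketch, datetime64_to_datetime(1_500_000_000, "ns") and datetime64_to_datetime(1_500, "ms") both give the same DateTime, 1.5 seconds after the epoch. Units outside the table (for example days or hours) would need additional entries.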
@cjdoris UPDATE: That doesn't work, columns will be vectors of Py objects |
@tomdstone |
Just added support for timedelta and timedelta64 conversion to Dates.CompoundPeriod. |
@hhaensel sorry, I won't have time to implement it due to how work is right now (we are in the middle of a move). I might follow up in a few months, but I definitely wouldn't hold your breath waiting for me. I appreciate the thought though! |
Thanks for the reply. No need to hurry. I just filed the PR as I was in need of a solution for my use case. |
@cjdoris |
Just want to share what's working now with the PR:
julia> x = Py(Second(1))
Python: numpy.timedelta64(1,'s')
julia> pyconvert(Second, x)
1 second
julia> x = Py(Second(1) + Nanosecond(1))
Python: numpy.timedelta64(1000000001,'ns')
julia> y = pyconvert(Any, x)
1 second, 1 nanosecond
julia> typeof(y)
Dates.CompoundPeriod
julia> jdf = DataFrame(x = [now() + Second(rand(1:1000)) for _ in 1:100], y = [Second(n) for n in 1:100]);
julia> pdf = pytable(jdf)
Python:
x y
0 2023-07-04 11:29:27.781 0 days 00:00:01
1 2023-07-04 11:30:02.781 0 days 00:00:02
2 2023-07-04 11:40:17.781 0 days 00:00:03
3 2023-07-04 11:31:11.781 0 days 00:00:04
... 5 more lines ...
98 2023-07-04 11:37:21.781 0 days 00:01:39
99 2023-07-04 11:35:53.781 0 days 00:01:40
[100 rows x 2 columns]
julia> jdf2 = DataFrame(PyTable(pdf))
100×2 DataFrame
Row │ x y
│ DateTime Compound…
─────┼──────────────────────────────────────
1 │ 2023-07-04T11:29:27.781 1 second
2 │ 2023-07-04T11:30:02.781 2 seconds
3 │ 2023-07-04T11:40:17.781 3 seconds
⋮ │ ⋮ ⋮
99 │ 2023-07-04T11:37:21.781 99 seconds
100 │ 2023-07-04T11:35:53.781 100 seconds
95 rows omitted |
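For readers unfamiliar with Dates.CompoundPeriod, the following sketch shows in plain Julia how a raw nanosecond count, like the one inside numpy.timedelta64(1000000001,'ns') above, can be split into mixed units. It relies only on the Dates standard library. The helper name ns_to_compound is hypothetical and is not claimed to be how the PR implements the conversion.

```julia
using Dates

# Hypothetical helper: turn a nanosecond count into a canonical CompoundPeriod,
# i.e. one broken down into the coarsest fitting units (weeks, days, ..., ns).
function ns_to_compound(ns::Integer)
    return Dates.canonicalize(Dates.CompoundPeriod(Nanosecond(ns)))
end

println(ns_to_compound(1_000_000_001))  # prints "1 second, 1 nanosecond"
```

A CompoundPeriod can represent any such count without loss, whereas a single fixed type like Second cannot, which is presumably why the y column in the round trip above shows up as a CompoundPeriod column.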
This issue has been marked as stale because it has been open for 30 days with no activity. If the issue is still relevant then please leave a comment, or else it will be closed in 7 days. |
@cjdoris I think this is still relevant |
I agree. Just stumbled on this when using pandas functionality to convert (legacy) xls data to xlsx. The dates became PyArray{UInt8}. |
I'd be ready to adapt my PR to the latest changes of PythonCall. Should I continue to work on it? Is the refactoring complete or should I still wait a bit? |
Yep the refactor is done. Can you clarify what the PR will change? |
I think I summarised everything quite nicely above (#293 (comment)). The related PR is #334, where you commented that you are refactoring. |
When doing some work involving dataframes in Python via PythonCall, it seems like DataFrame(PyTable(p)), where p is a pandas data table, converts the date and datetime columns into byte vectors. Is this issue related to issue #265 (milliseconds vs. microseconds), or is it due to a missing part of the DataFrame(::PyPandasDataFrame) implementation?
Here are a few minimal examples, in a conda environment with pandas.
This results in:
The same thing happens when initially defining b as a pandas dataframe, so the microsecond issue in #265 does not seem to be the problem?