-
-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support timezones when reading and writing datetime columns #123
Comments
Hey @theroggy have you done any work on this? If not I might have a look if that's okay. |
@m-richards No I haven't... feel free! |
So having a look at this in a bit more detail there's more complexity than I originally realised (unless I'm missing something), numpy datetime64 doesn't support utc offsets- which seems to make numpy arrays as the intermediate between dataframes and ogr a little tricky to make work. Some thoughts:
(apologies if I'm missing something obvious, I thought this might be alright point to get familiar with some the pyogrio internals, but still verying much learning at this point) |
Hmm... indeed... when writing it seems the datetimes at the moment will be implicitly converted to UTC already during the conversion of pandas series to numpy arrays: |
At the time of adding write support I noticed that the conversion to python Not a super elegant solution, but it might be an option here as well... |
That could indeed be something to test and time. It seems that the OGRParseDate ignores milliseconds though, so if we do this it would have to be conditional on the presence of that, making it less attractive option .. |
I think getting back the datetimes in UTC would already be an improvement, and maybe the best we can do on the short term. |
I was looking into this a bit, some observations:
|
I've been looking at this again too and have opened a very rough proof of concept at #253, to try and get us moving forward on this (joris, you already do have detailed notes, I don't know if you have a prototype as well, more than happy for that to be superceded). Some things I've learnt as part of doing that:
at least as far as I can tell (I might be missing something). At this point I decided to park this and look at writing strings, I can't imagine this is very performant (but if we decide it's worth benchmarking explicitly it I can go back and have a look). |
Thanks Matt!
I think going through strings is a good alternative to explore, but in case we want to get back to using
I think it might be worth checking performance, but I don't necessarily expect it to be too bad. Currently, the code path for writing datetimes is also not great, because we need to convert each timestamp value (stored as a single integer number by numpy/pandas) to a python datetime.datetime object (which stores components, and get the year, month, day, hour etc from it). |
Timezone information is ignored at the moment, both for reading and for writing.
See https://github.com/Toblerity/Fiona/blob/791785b813fce97e5c3d60bd725fa2190645d2e3/fiona/ogrext.pyx#L401-L404 and https://github.com/Toblerity/Fiona/blob/791785b813fce97e5c3d60bd725fa2190645d2e3/fiona/ogrext.pyx#L431-L435 how fiona handles this (and https://gdal.org/development/rfc/rfc56_millisecond_precision.html?highlight=TZFlag#core-changes with very brief details)
The PR that introduced this in fiona: Toblerity/Fiona#915
The text was updated successfully, but these errors were encountered: