This repository has been archived by the owner on Jan 28, 2023. It is now read-only.

Support ZonedDateTime and LocalDateTime DataCol #87

Open
alphaho opened this issue Jun 30, 2020 · 3 comments

alphaho commented Jun 30, 2020

I've been using pandas for my data manipulation for quite some time and would very much like to switch to Kotlin with krangl, as it has much better type-system support.

When I tried to port one of my pandas scripts to krangl, I found that it lacks support for ZonedDateTime, LocalDateTime and Duration as a DataCol. I need a lot of mapping and casting to work around this, which is not straightforward.

For example, pandas has support for:

  • converting a String to a Python datetime using pd.to_datetime(df["start_time_in_str"])
  • subtracting one Series of datetime from another to get a Series of timedelta, e.g. df["end_time"] - df["start_time"]
  • adding a Series of timedelta to a Series of datetime to get another Series of datetime, e.g. df["start_time"] + df["duration"]
  • converting a Series of long / double to a Series of timedelta by multiplying by a timedelta constant, e.g. df["some_doubles"] * datetime.timedelta(hours = 1)

It would be much better if such capabilities were included in the library.
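
For illustration, the four pandas operations above could look roughly like this on plain Kotlin lists using java.time. This is only a sketch: the helper names (parseDateTimes, elapsed, shift, scale) are hypothetical and not part of krangl, and the input strings are assumed to be ISO-8601.

```kotlin
import java.time.Duration
import java.time.LocalDateTime
import kotlin.math.roundToLong

// Hypothetical helpers on plain Kotlin lists; none of these are krangl functions.

// Like pd.to_datetime(df["start_time_in_str"]), assuming ISO-8601 input such as "2020-06-30T08:00:00".
fun parseDateTimes(values: List<String>): List<LocalDateTime> =
    values.map { LocalDateTime.parse(it) }

// Like df["end_time"] - df["start_time"]: element-wise difference as a typed Duration.
fun elapsed(start: List<LocalDateTime>, end: List<LocalDateTime>): List<Duration> =
    start.zip(end) { s, e -> Duration.between(s, e) }

// Like df["start_time"] + df["duration"]: element-wise shift.
fun shift(start: List<LocalDateTime>, by: List<Duration>): List<LocalDateTime> =
    start.zip(by) { s, d -> s.plus(d) }

// Like df["some_doubles"] * datetime.timedelta(hours = 1).
// Duration.multipliedBy only accepts a Long, so the doubles are scaled via nanoseconds.
fun scale(factors: List<Double>, unit: Duration): List<Duration> =
    factors.map { Duration.ofNanos((it * unit.toNanos()).roundToLong()) }
```

Each of these currently needs a manual map over the column values, which is the mapping and casting overhead described above.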

@holgerbrandl
Owner

Great suggestion which is on the roadmap already.

The trickiest question would be which type to use internally in a future DateCol. To keep it simple, I'd think that only a single format should be supported (if possible). https://stackoverflow.com/questions/32437550/whats-the-difference-between-instant-and-localdatetime gives a great overview, but I'm still not sure which one would be generic enough to support all use cases.

To me, Instant seems the most versatile, and it can be mapped to a time zone as detailed in https://mkyong.com/java8/java-convert-instant-to-localdatetime/

Concerning your use cases: most of them make total sense to me. However, I struggle with multiplication (the last one in your list).

The difference of two DateCols could by convention be either a typed representation (such as Period/Duration) or simply an int/long (millisecond difference). I'm not sure which one is more intuitive.
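
As a rough sketch of that mapping, a list of Instant values can be viewed in any time zone with standard java.time calls. The helper names toZoned and toLocal are hypothetical and not part of krangl.

```kotlin
import java.time.Instant
import java.time.LocalDateTime
import java.time.ZoneId
import java.time.ZonedDateTime

// Hypothetical helpers, not krangl API: view stored instants in a given time zone.

// Keeps the zone attached to every value.
fun toZoned(instants: List<Instant>, zone: ZoneId): List<ZonedDateTime> =
    instants.map { it.atZone(zone) }

// Drops the zone after conversion, leaving the local wall-clock time.
fun toLocal(instants: List<Instant>, zone: ZoneId): List<LocalDateTime> =
    instants.map { LocalDateTime.ofInstant(it, zone) }
```

Storing Instant internally would keep the column unambiguous, with the zone chosen only at the point where values are displayed or exported.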
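
The two conventions can be compared side by side in a small sketch, assuming the values are held as Instant. The function names typedDifference and millisDifference are hypothetical and not part of krangl.

```kotlin
import java.time.Duration
import java.time.Instant

// Hypothetical helpers, not krangl API.

// Typed convention: the result carries its own unit and supports further date arithmetic.
fun typedDifference(start: List<Instant>, end: List<Instant>): List<Duration> =
    start.zip(end) { s, e -> Duration.between(s, e) }

// Plain convention: millisecond difference as a Long, with the unit known only by convention.
fun millisDifference(start: List<Instant>, end: List<Instant>): List<Long> =
    start.zip(end) { s, e -> e.toEpochMilli() - s.toEpochMilli() }
```

The typed result can always be reduced to the plain one with Duration.toMillis(), whereas going the other way requires knowing the unit of the number.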

What do you think?

@alphaho
Author

alphaho commented Jul 30, 2020

Great to know it's already on the roadmap!

I agree that Instant would be the best choice to use internally in a DataCol for representing time in general.
But we may also need a few more typed DataCols to support Duration, LocalTime, etc., so that we can provide a better out-of-the-box experience.

After using krangl for over a month, I've found that I often need to do quite a lot of type conversion myself before manipulating the data. So for the last use case (the multiplication), I think it would be much better to favor a typed representation and require less work from the user to get the job done.
But I do agree that it may not scale well, as we may need to support many types and many different operations on each type.

@holgerbrandl
Owner

I guess that with more types being added, the API could use an overhaul to support a more generic column type provider that implements all basic operations. This would, for example, allow users to register their own types for improved convenience. However, I'm not yet sure how to implement such a feature.

Regarding the type conversions: I agree it's not as straightforward as what I'm used to from R, for example. On the one hand, it should be typed to some degree to provide sensible completion, but on the other, too much typing requires casting in many situations. Feel welcome to share any ideas about how to solve this more elegantly.
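
One possible shape for such a provider, purely as a thought experiment: every name below (ColumnTypeProvider, ColumnTypes, InstantProvider) is hypothetical, nothing like it exists in krangl, and the set of "basic operations" is reduced to parsing and comparing for brevity.

```kotlin
import java.time.Instant
import kotlin.reflect.KClass

// Hypothetical interface, not part of krangl: the basic operations a column needs per element type.
interface ColumnTypeProvider<T : Any> {
    val type: KClass<T>
    fun parse(text: String): T
    fun compare(a: T, b: T): Int
}

// Hypothetical registry that lets users plug in their own element types.
object ColumnTypes {
    private val providers = mutableMapOf<KClass<*>, ColumnTypeProvider<*>>()

    fun <T : Any> register(provider: ColumnTypeProvider<T>) {
        providers[provider.type] = provider
    }

    // Returns null when no provider has been registered for the requested type.
    @Suppress("UNCHECKED_CAST")
    fun <T : Any> providerFor(type: KClass<T>): ColumnTypeProvider<T>? =
        providers[type] as ColumnTypeProvider<T>?
}

// Example provider a user could register for Instant values.
object InstantProvider : ColumnTypeProvider<Instant> {
    override val type = Instant::class
    override fun parse(text: String): Instant = Instant.parse(text)
    override fun compare(a: Instant, b: Instant): Int = a.compareTo(b)
}
```

A user would call ColumnTypes.register(InstantProvider) once and the library could then look up the provider by element type. Whether this fits krangl's existing DataCol hierarchy is an open question.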

@holgerbrandl holgerbrandl added this to the 0.16 milestone Nov 5, 2020
@holgerbrandl holgerbrandl self-assigned this Nov 5, 2020