DeltaTable.history(): return timestamp column as datetime (not epoch integer) #3109

Open
keen85 opened this issue Jan 9, 2025 · 3 comments
Labels
enhancement New feature or request

Comments

@keen85

keen85 commented Jan 9, 2025

Description

When accessing the history of a Delta table via delta-spark, the version timestamp is returned as a TimestampType column.
delta-rs is different here: it returns the timestamp as an epoch integer.

I suggest making the behavior of delta-rs consistent with delta-spark and returning a proper Python datetime value.

Use Case
Make it easier for users to interpret the result of DeltaTable.history().

Conversion in Python is fairly easy:

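from datetime import datetime, timezone
from deltalake import DeltaTable

dt = DeltaTable("../rust/tests/data/simple_table")
ts_epoch = dt.history(1)[0]["timestamp"]  # epoch milliseconds
ts_datetime = datetime.fromtimestamp(ts_epoch / 1000.0, tz=timezone.utc)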

However, this would be a breaking change 😬
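
Until the return type changes, the same conversion can be applied to every history entry on the caller's side. The following is only a sketch: `with_datetime_timestamps` is a hypothetical helper (not part of deltalake), and it assumes each entry is a dict whose "timestamp" is an integer number of epoch milliseconds, as in the snippet above. The sample data stands in for the output of dt.history() and is made up.

```python
from datetime import datetime, timezone


def with_datetime_timestamps(history):
    """Return copies of history entries with "timestamp" as an aware UTC datetime.

    Assumes every entry has an integer "timestamp" in epoch milliseconds.
    """
    converted = []
    for entry in history:
        entry = dict(entry)  # shallow copy so the caller's data is untouched
        entry["timestamp"] = datetime.fromtimestamp(
            entry["timestamp"] / 1000.0, tz=timezone.utc
        )
        converted.append(entry)
    return converted


# Stand-in for the output of dt.history(); the values are made up.
sample_history = [
    {"timestamp": 1736380800000, "operation": "WRITE", "version": 1},
    {"timestamp": 1736294400000, "operation": "CREATE TABLE", "version": 0},
]
converted_history = with_datetime_timestamps(sample_history)
```

Because the helper copies each entry, the original epoch integers stay available to any code that still depends on them.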

Related Issue(s)
I found nothing

@keen85 keen85 added the enhancement New feature or request label Jan 9, 2025
@ion-elgreco
Collaborator

That would actually require quite a few changes, since the CommitInfo struct holds the timestamp as an Option.

@roeap Any ideas on this?

@roeap
Collaborator

roeap commented Jan 12, 2025

I would not be too concerned about breaking that, as it removes a source of errors that might happen (e.g. millis vs. micros).

Commit infos are generally annoying though, as there are no real conventions or guarantees for them. delta-rs just does roughly what Spark does, but who knows what other writers are doing 😆.

So parsing all of that might be difficult, since the contents of a field are not really known to us unless we wrote it ourselves.
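
To illustrate what a tolerant reader might look like, here is a sketch that gives up instead of guessing when the field is missing or has an unexpected type. `parse_commit_timestamp` is a hypothetical name, and treating integers as epoch milliseconds is an assumption about the writer, not something the protocol guarantees.

```python
from datetime import datetime, timezone


def parse_commit_timestamp(commit_info):
    """Best-effort conversion of a commitInfo "timestamp" to a UTC datetime.

    Returns None when the field is absent or is not an integer, because
    nothing guarantees what another writer stored there. Integers are
    assumed to be epoch milliseconds (an assumption, not a protocol rule).
    """
    value = commit_info.get("timestamp")
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return None
    try:
        return datetime.fromtimestamp(value / 1000.0, tz=timezone.utc)
    except (OverflowError, OSError, ValueError):
        # The value is outside the range datetime can represent here.
        return None
```

Note that this does not detect a writer that stored microseconds: such a value either lands far in the future or falls out of range and yields None.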

@rtyler
Member

rtyler commented Jan 12, 2025

There is no guaranteed schema for commitInfo in the protocol. My vote here would be to do nothing rather than chase compatibility with undefined Spark behavior.
