Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!(ffi): remove visit_snapshot_schema, add logical_schema #709

Merged

Conversation

zachschuermann
Copy link
Collaborator

What changes are proposed in this pull request?

This PR removes the old visit_snapshot_schema introduced in #683 - we should just go ahead and do the 'right thing' with having a visit_schema (introduced in the other PR) and a logical_schema() function (added here) in order to facilitate visiting the snapshot schema. Additionally I've moved the schema-related things up from scan module to top-level in ffi crate. Exact changes listed below; this PR updates tests/examples to leverage the new changes.

This PR affects the following public APIs

  1. Remove visit_snapshot_schema() API
  2. Add a new logical_schema(snapshot) API so you can get the schema of a snapshot and use the visit_schema directly
  3. Renames free_global_read_schema to just free_schema
  4. Moves SharedSchema and free_schema up from mod scan into top-level ffi crate.

How was this change tested?

existing UT

Copy link

codecov bot commented Feb 21, 2025

Codecov Report

Attention: Patch coverage is 0% with 7 lines in your changes missing coverage. Please review.

Project coverage is 84.36%. Comparing base (baa3fc3) to head (9570e0d).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
ffi/src/lib.rs 0.00% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #709      +/-   ##
==========================================
+ Coverage   84.35%   84.36%   +0.01%     
==========================================
  Files          75       75              
  Lines       17657    17654       -3     
  Branches    17657    17654       -3     
==========================================
  Hits        14894    14894              
+ Misses       2053     2050       -3     
  Partials      710      710              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@github-actions github-actions bot added the breaking-change Change that will require a version bump label Feb 21, 2025
Copy link
Collaborator

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Couple side notes for tracking/follow-up.

@@ -98,6 +98,7 @@ impl Snapshot {
}

/// Table [`Schema`] at this `Snapshot`s version.
// TODO should this return SchemaRef?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It's already a SchemaRef under the hood, and schemas can be quite large (i.e. tables with 10k columns)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar story for the global read schema, I think

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

///
/// Caller is responsible for passing a valid snapshot handle.
#[no_mangle]
pub unsafe extern "C" fn logical_schema(snapshot: Handle<SharedSnapshot>) -> Handle<SharedSchema> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! So logical_schema extracts a schema from the snapshot, which can then be visited in the same way as the scan schema?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aside: Eventually the logical schema the engine should see has been filtered to remove internal Delta properties, hidden columns (not currently a thing), etc. But then we have three schema types, and logical/physical do make sense already. Maybe the engine should see a "clean" schema or a "public" schema or something?

Or maybe this is the right function for that, and we just need to eventually add the necessary filtering here, instead of returning the "raw" snapshot schema?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe this is the right function for that, and we just need to eventually add the necessary filtering here, instead of returning the "raw" snapshot schema?

yea was thinking to keep this as the 'public-facing' snapshot schema. not sure if you think we should keep the name logical_schema still or just snapshot_schema? WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a C way of doing snapshot.logical_schema() vs snapshot.snapshot_schema(). I prefer logical schema cuz snapshot_schema feels redundant.

Copy link
Collaborator

@OussamaSaoudi OussamaSaoudi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the nits, this LGTM

@zachschuermann zachschuermann merged commit ca18e7f into delta-io:main Feb 21, 2025
20 of 21 checks passed
@zachschuermann zachschuermann deleted the remove-visit-snapshot-schema branch February 21, 2025 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change Change that will require a version bump
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants