refactor: REST `Catalog` implementation #965

connortsui20 · 2025-02-12T17:19:34Z

Followup of #962

#962 introduced a bug where some of the methods allow both StatusCode::OK and StatusCode::NO_CONTENT as success cases, when in reality it should be one or the other (this was me, sorry about that).

This PR attempts to unify the 3 different types of response helpers that essentially all do the exact same thing slightly differently. The main addition here is a function query_catalog:

    // Queries the Iceberg REST catalog with the given `Request` and a provided handler.
    pub async fn query_catalog<R, H, Fut>(&self, mut request: Request, handler: H) -> Result<R>
    where
        R: DeserializeOwned,
        H: FnOnce(Response) -> Fut,
        Fut: Future<Output = Result<R>>,
    {
        self.authenticate(&mut request).await?;

        let response = self.client.execute(request).await?;

        handler(response).await
    }
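A minimal sketch of how a `Catalog` method might call such a helper, using synchronous stand-in types instead of the real `reqwest` and `serde` machinery so that it compiles on its own. `StatusCode`, `Request`, `Response`, `Client`, `Error`, and `list_namespaces` below are simplified assumptions for illustration, not the crate's actual definitions, and the comma-separated body stands in for real JSON deserialization.

```rust
// Stand-in types (assumptions): the real code uses reqwest's Request/Response
// and the crate's own Error type.
#[derive(Debug, PartialEq)]
pub enum StatusCode {
    Ok,
    NoContent,
    NotFound,
    Other(u16),
}

pub struct Request {
    pub path: String,
}

pub struct Response {
    pub status: StatusCode,
    pub body: String,
}

#[derive(Debug, PartialEq)]
pub struct Error(pub String);

pub struct Client;

impl Client {
    // Hypothetical transport: pretends that only the "prod" parent namespace exists.
    fn execute(&self, request: Request) -> Result<Response, Error> {
        if request.path.ends_with("parent=prod") {
            Ok(Response { status: StatusCode::Ok, body: "prod.sales,prod.ops".to_string() })
        } else {
            Ok(Response { status: StatusCode::NotFound, body: String::new() })
        }
    }

    // Synchronous analogue of `query_catalog`: send the request, then let the
    // caller-supplied handler decide what counts as success.
    pub fn query_catalog<R, H>(&self, request: Request, handler: H) -> Result<R, Error>
    where
        H: FnOnce(Response) -> Result<R, Error>,
    {
        let response = self.execute(request)?;
        handler(response)
    }
}

// Each API method spells out its own success and error cases.
pub fn list_namespaces(client: &Client, parent: &str) -> Result<Vec<String>, Error> {
    let request = Request { path: format!("namespaces?parent={}", parent) };
    client.query_catalog(request, |response| match response.status {
        StatusCode::Ok => Ok(response
            .body
            .split(',')
            .map(str::to_owned)
            .collect::<Vec<String>>()),
        StatusCode::NotFound => Err(Error("parent namespace does not exist".to_string())),
        other => Err(Error(format!("unexpected status {:?}", other))),
    })
}
```

The handler is the only place where status codes are interpreted, so a method that expects 204 or 304 can say so without a new helper.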

By allowing each Catalog method to specify how it wants to handle its response, we get much finer control over the success and error cases as well as the error messages. Previously, there were 3 functions that all did similar things:

    pub async fn query<R: DeserializeOwned, E: DeserializeOwned + Into<Error>>(
        &self,
        mut request: Request,
    ) -> Result<R> {

    pub async fn execute<E: DeserializeOwned + Into<Error>>(
        &self,
        mut request: Request,
    ) -> Result<()> {

    pub async fn do_execute<R, E: DeserializeOwned + Into<Error>>(
        &self,
        mut request: Request,
        handler: impl FnOnce(&Response) -> Option<R>,
    ) -> Result<R> {

I'm also somewhat using this as a chance to refactor some other parts of this crate, mainly documentation and examples.

@Xuanwo It would be great if I could get feedback on some of these proposed changes before I keep going!

liurenjie1024

Thanks @connortsui20 for this PR, LGTM! Let's wait a moment so that more people can review this.

Xuanwo · 2025-02-17T02:28:23Z

crates/catalog/rest/src/catalog.rs

+                    Ok(deserialize_catalog_response::<ListNamespaceResponse>(response).await?)
+                }
+                StatusCode::NOT_FOUND => Err(Error::new(
+                    ErrorKind::Unexpected,


This change made me think about how users can handle cases where the namespace doesn't exist. Maybe we should add an ErrorKind::NamespaceNotExists?

Maybe this should be done in another PR.

connortsui20 · 2025-02-17

I think that if we want an error case for NamespaceNotExists, there should also be error cases for every other type of failure (for creates, updates, deletes, plus the same for tables), though I think that should be a different PR.

Xuanwo · 2025-02-17T02:28:59Z

crates/catalog/rest/src/catalog.rs

+                    "Tried to create a table under a namespace that does not exist",
+                )),
+                StatusCode::CONFLICT => Err(Error::new(
+                    ErrorKind::Unexpected,


The same question: maybe we should have TableNotFound and TableAlreadyExists.

connortsui20 · 2025-02-17

Maybe this should be in a different PR? I'm willing to head a PR to add a sub-enum to ErrorKind that enumerates the things that can go wrong (though that would definitely be a breaking change).
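One possible shape for such a sub-enum, as a rough sketch only. `CatalogErrorKind`, the `Catalog` variant, `Operation`, and `classify` are hypothetical names for illustration; the crate's real `ErrorKind` has other variants, and status codes are plain `u16` values here to keep the example self-contained.

```rust
/// Hypothetical catalog-specific failure reasons (an assumption, not the crate's API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CatalogErrorKind {
    NamespaceNotFound,
    NamespaceAlreadyExists,
    TableNotFound,
    TableAlreadyExists,
}

/// Simplified stand-in for the crate's `ErrorKind`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    Unexpected,
    Catalog(CatalogErrorKind),
}

/// Which REST operation produced the response (assumption for this sketch).
#[derive(Debug, Clone, Copy)]
pub enum Operation {
    CreateNamespace,
    LoadTable,
    CreateTable,
}

/// Maps an HTTP status code to an error kind for a given operation.
pub fn classify(op: Operation, status: u16) -> ErrorKind {
    use CatalogErrorKind::*;
    match (op, status) {
        (Operation::CreateNamespace, 409) => ErrorKind::Catalog(NamespaceAlreadyExists),
        (Operation::LoadTable, 404) => ErrorKind::Catalog(TableNotFound),
        // Creating a table under a missing namespace.
        (Operation::CreateTable, 404) => ErrorKind::Catalog(NamespaceNotFound),
        (Operation::CreateTable, 409) => ErrorKind::Catalog(TableAlreadyExists),
        _ => ErrorKind::Unexpected,
    }
}

/// Callers could then branch on the specific failure instead of matching on
/// error message strings.
pub fn should_create_namespace_first(kind: ErrorKind) -> bool {
    matches!(kind, ErrorKind::Catalog(CatalogErrorKind::NamespaceNotFound))
}
```

Nesting the catalog failures under one variant keeps the top-level enum small, at the cost of the breaking change mentioned above.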

Xuanwo · 2025-02-17T02:38:11Z

crates/catalog/rest/src/catalog.rs

+            Err(Error::new(ErrorKind::Unexpected, error_message))
+        };
+
+        let response = context.client.query_catalog(request, handler).await?;


To be honest, I don't like the style of creating a handler and passing it into another function. It usually breaks the logical flow and makes it hard to understand what’s happening.

It's easy to understand that:

    let resp = self
        .context()
        .await?
        .client
        .query::<CommitTableResponse, ErrorResponse>(request)
        .await?;

Ok, the client will perform query and returns a response or error.

But it's hard to:

    let handler = |response: Response| async move { ... };
    let response = context.client.query_catalog(request, handler).await?;

Let's build a handler and it will do something to the response. Remember this function now. The client will perform query_catalog and use the previous handler to parse the response. Emmmm, wait a second, what does this handler do?

How about taking a step back and simply returning an HTTP response in those functions? That way, we have full control over handling the HTTP response here. We only have a limited number of APIs to implement, and they won’t grow quickly. I’m fine with just writing them directly instead of adding generics or callbacks.
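A rough sketch of that alternative: the client helper only authenticates and sends, and each API method matches on the status inline. The types are synchronous stand-ins, and `send`, `drop_namespace`, and the `token` field are hypothetical names chosen for this example rather than anything taken from the crate.

```rust
// Stand-in types (assumptions) replacing reqwest's Request/Response.
pub struct Request {
    pub method: &'static str,
    pub path: String,
}

pub struct Response {
    pub status: u16,
    pub body: String,
}

#[derive(Debug, PartialEq)]
pub struct Error(pub String);

pub struct Client {
    pub token: String,
}

impl Client {
    /// Authenticates and sends; returns the raw response without interpreting it.
    pub fn send(&self, request: Request) -> Result<Response, Error> {
        if self.token.is_empty() {
            return Err(Error("missing credentials".to_string()));
        }
        // Hypothetical server: only the namespace "ns1" exists.
        let status = if request.method == "DELETE" && request.path == "namespaces/ns1" {
            204
        } else {
            404
        };
        Ok(Response { status, body: String::new() })
    }
}

/// The API method owns all of the status handling, with no callback involved.
pub fn drop_namespace(client: &Client, name: &str) -> Result<(), Error> {
    let request = Request { method: "DELETE", path: format!("namespaces/{}", name) };
    let response = client.send(request)?;
    match response.status {
        204 => Ok(()),
        404 => Err(Error(format!("namespace {} does not exist", name))),
        other => Err(Error(format!("unexpected status {}: {}", other, response.body))),
    }
}
```

The control flow reads top to bottom, but every method repeats its own `match`, including the fallback arm.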

connortsui20 · 2025-02-17

> Ok, the client will perform query and returns a response or error.

I agree that this is clearer, but the whole point of this PR is that with a single query function you are unable to handle each API call's response specifically. There will be cases where only allowing OK or NO_CONTENT is not enough: for example, the loadTable response should be able to handle StatusCode::NOT_MODIFIED (304) as a correct case. Otherwise there would have to be a new kind of query or execute function for every case that needs to be handled.
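To illustrate the `loadTable` point, a per-call handler can treat 304 as a success that carries no body, which a helper with a fixed set of success codes cannot express. `LoadTableOutcome` and `handle_load_table` are hypothetical names in this sketch, status codes are plain `u16` values, and the body is kept as a string rather than deserialized.

```rust
/// Hypothetical result of a conditional table load (an assumption for this sketch).
#[derive(Debug, PartialEq)]
pub enum LoadTableOutcome {
    /// 200: the server returned table metadata (string stand-in).
    Loaded(String),
    /// 304: the caller's cached copy is still valid.
    NotModified,
}

/// Stand-in for the HTTP response type.
pub struct Response {
    pub status: u16,
    pub body: String,
}

#[derive(Debug, PartialEq)]
pub struct Error(pub String);

/// A handler specific to loading a table: two distinct success cases.
pub fn handle_load_table(response: Response) -> Result<LoadTableOutcome, Error> {
    match response.status {
        200 => Ok(LoadTableOutcome::Loaded(response.body)),
        304 => Ok(LoadTableOutcome::NotModified),
        404 => Err(Error("table does not exist".to_string())),
        other => Err(Error(format!("unexpected status {}", other))),
    }
}
```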

connortsui20 · 2025-02-17

> How about taking a step back and simply returning an HTTP response in those functions? That way, we have full control over handling the HTTP response here. We only have a limited number of APIs to implement, and they won’t grow quickly. I’m fine with just writing them directly instead of adding generics or callbacks.

I can try this, but I'm not sure it would make things cleaner. It just means moving the call to the deserialize function from inside the handler to outside it, and now you have to handle the error cases outside the helper every single time. Maybe as a best of both worlds I can specify the exact type of the response every time? Like:

    let response: ListNamespaceResponse = context.client.query_catalog(request, handler).await?;

I could also rename response to deserialized_response to make it super obvious what is happening.

Edit: After thinking a bit more about this, I'm trying to figure out what the signature of the custom handler should be. If it is not an asynchronous FnOnce(Response) -> Result<R>, then should it be a FnOnce(Response) -> Result<Response>? So the handler takes in the HTTP response and checks if there are any problems with it. If there are problems, then it returns Err. But if it can't detect any problems, it returns the entire Response even though it is still possible that the deserialization goes wrong? I think this could be more confusing, but I can see points on both sides.
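The two candidate handler shapes can be compared side by side. In this sketch (synchronous stand-ins; `parse_count`, `check_status`, `query_typed`, and `query_checked` are all hypothetical names), the first variant lets the handler validate and deserialize in one step, while the second only validates and leaves deserialization to the caller, so a parse failure can surface after the handler has already returned `Ok`.

```rust
/// Stand-in for the HTTP response type.
pub struct Response {
    pub status: u16,
    pub body: String,
}

#[derive(Debug, PartialEq)]
pub struct Error(pub String);

/// Stand-in "deserializer": parses the body as an integer.
pub fn parse_count(response: Response) -> Result<u32, Error> {
    response
        .body
        .trim()
        .parse::<u32>()
        .map_err(|e| Error(format!("bad body: {}", e)))
}

/// Status validation shared by both variants.
pub fn check_status(response: Response) -> Result<Response, Error> {
    if response.status == 200 {
        Ok(response)
    } else {
        Err(Error(format!("unexpected status {}", response.status)))
    }
}

/// Variant 1: the handler is `FnOnce(Response) -> Result<R>` and produces the
/// final typed value.
pub fn query_typed<R, H>(response: Response, handler: H) -> Result<R, Error>
where
    H: FnOnce(Response) -> Result<R, Error>,
{
    handler(response)
}

/// Variant 2: the handler is `FnOnce(Response) -> Result<Response>` and only
/// vets the response; the caller still has to deserialize it afterwards.
pub fn query_checked<H>(response: Response, handler: H) -> Result<Response, Error>
where
    H: FnOnce(Response) -> Result<Response, Error>,
{
    handler(response)
}
```

With the second variant, every call site ends in something like `query_checked(resp, check_status).and_then(parse_count)`, which is the extra step the comment above is weighing.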

I should also mention that deserialize_unexpected_catalog_error should really deserialize every response it gets into a proper error message instead of returning a blanket "Received unexpected response", but I felt that was out of scope for this PR.

connortsui20 · 2025-02-24T17:53:06Z

@Xuanwo could you take a look at the latest commit and see if it makes sense?

connortsui20 mentioned this pull request Feb 16, 2025

REST API responses with Spark return status code 200 instead of 204 apache/iceberg#12283

Open

3 tasks

connortsui20 force-pushed the rest-catalog-cleanup branch from 2781697 to 7af4f4a Compare February 16, 2025 03:09

connortsui20 added 11 commits February 15, 2025 22:10

first commit

09df7af

examples rewrite

4c4af42

add query_catalog and implement for list namespace

953a7bf

rewrite namespace catalog methods

74be23c

format

24fdd94

clean up examples

5620d46

complete migration to query_catalog

9d6cdad

fix string match tests

79b41ab

finalize examples

3d8cac2

clean up client module

5007da4

allow status code OK as well for all NO_CONTENT responses

0cd39f1

connortsui20 force-pushed the rest-catalog-cleanup branch from 7af4f4a to 0cd39f1 Compare February 16, 2025 03:10

format example

f867a15

connortsui20 marked this pull request as ready for review February 16, 2025 03:13

liurenjie1024 previously approved these changes Feb 17, 2025

Xuanwo reviewed Feb 17, 2025

connortsui20 dismissed liurenjie1024’s stale review via e8275d0 February 24, 2025 17:52

spell out serialized types

0f72f01

connortsui20 force-pushed the rest-catalog-cleanup branch from e8275d0 to 0f72f01 Compare February 24, 2025 17:53

Merge branch 'main' into rest-catalog-cleanup

56ebafb
