Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: REST Catalog implementation #965

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

connortsui20
Copy link
Contributor

@connortsui20 connortsui20 commented Feb 12, 2025

Followup of #962

#962 Introduced a bug where it is not some of the methods allow for both StatusCode::OK and StatusCode::NO_CONTENT as success cases, when in reality it should be one or the other (this was me, sorry about that).

This PR attempts to unify the 3 different types of response helpers that essentially all do the exact same thing slightly differently. The main addition here is a function query_catalog:

    // Queries the Iceberg REST catalog with the given `Request` and a provided handler.
    pub async fn query_catalog<R, H, Fut>(&self, mut request: Request, handler: H) -> Result<R>
    where
        R: DeserializeOwned,
        H: FnOnce(Response) -> Fut,
        Fut: Future<Output = Result<R>>,
    {
        self.authenticate(&mut request).await?;

        let response = self.client.execute(request).await?;

        handler(response).await
    }

By allowing each Catalog method to specify how they want to handle the responses, it gets much finer control on the success/error cases as well as the error messages. Previously, there were 3 functions that all did similar things:

    pub async fn query<R: DeserializeOwned, E: DeserializeOwned + Into<Error>>(
        &self,
        mut request: Request,
    ) -> Result<R> {

    pub async fn execute<E: DeserializeOwned + Into<Error>>(
        &self,
        mut request: Request,
    ) -> Result<()> {

    pub async fn do_execute<R, E: DeserializeOwned + Into<Error>>(
        &self,
        mut request: Request,
        handler: impl FnOnce(&Response) -> Option<R>,
    ) -> Result<R> {

I'm also somewhat using this as a chance to refactor some other parts of this crate, mainly documentation and examples.

@Xuanwo It would be great if I could get feedback on some of these proposed changes before I keep going!

@connortsui20 connortsui20 marked this pull request as ready for review February 16, 2025 03:13
liurenjie1024
liurenjie1024 previously approved these changes Feb 17, 2025
Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @connortsui20 for this pr, LGTM! Let's wait for a moment to have more people to take a review on this.

Ok(deserialize_catalog_response::<ListNamespaceResponse>(response).await?)
}
StatusCode::NOT_FOUND => Err(Error::new(
ErrorKind::Unexpected,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change made me think about how users can handle cases where the namespace doesn't exist. Maybe we should add an ErrorKind::NamespaceNotExists?

Maybe this should be done in another PR.

Copy link
Contributor Author

@connortsui20 connortsui20 Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that if we want an error case for NamespaceNotExists there should also be error cases for every other type of failure (for creates, updates, deletes, plus the same for tables). Though I think that should be a different PR

"Tried to create a table under a namespace that does not exist",
)),
StatusCode::CONFLICT => Err(Error::new(
ErrorKind::Unexpected,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same question, maybe we should have TableNotFound nad TableAlreadyExists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should be in a different PR? I'm willing to head a PR to add a sub enum into ErrorKind that states things that can go wrong (though that would definitely be a breaking change)

Err(Error::new(ErrorKind::Unexpected, error_message))
};

let response = context.client.query_catalog(request, handler).await?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I don't like the style of creating a handle and passing it into another function. It usually breaks the logical flow and makes it hard to understand what’s happening.

It's easy to understand that:

let resp = self
        .context()
        .await?
        .client
        .query::<CommitTableResponse, ErrorResponse>(request)
        .await?;

Ok, the client will perform query and returns a response or error.

But it's hard to:

let handler = |response: Response| async move { ... };
let response = context.client.query_catalog(request, handler).await?;

Let's build a handler and it will do something to the response. Remember this funcation now. The client will perform query catalog and use the previous handler to parse the response. Emmmm, wait a second, what does this handler do?


How about taking a step back and simply returning an HTTP response in those functions? That way, we have full control over handling the HTTP response here. We only have a limited number of APIs to implement, and they won’t grow quickly. I’m fine with just writing them directly instead of adding generics or callbacks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, the client will perform query and returns a response or error.

I agree that this is clearer, but the whole point of this PR is that by having a single query function you are unable to handle each API call's response specifically. There will be cases when only allowing OK or NO_CONTENT is not enough: for example in the loadTable response it should be able to handle StatusCode::NOT_MODIFIED (304) as a correct case. There should have to be a new kind of query or execute function for every case that needs to be handled.

Copy link
Contributor Author

@connortsui20 connortsui20 Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about taking a step back and simply returning an HTTP response in those functions? That way, we have full control over handling the HTTP response here. We only have a limited number of APIs to implement, and they won’t grow quickly. I’m fine with just writing them directly instead of adding generics or callbacks.

I can try this, but I'm not sure this would make things cleaner. It just means that you are moving where you call the serialize function from inside the handler to outside the handler, and now then now you have to handle the error cases outside the helper every single time. Maybe as a best of both worlds I can specify the exact type of response every time? Like:

let response: ListNamespaceResponse = context.client.query_catalog(request, handler).await?;

I could also rename response to deserialized_response to make it super obvious what is happening.


Edit: After thinking a bit more about this, I'm trying to figure out what the signature of the custom handler should be. If it is not an asynchronous FnOnce(Response) -> Result<R>, then should it be a FnOnce(Response) -> Result<Response>? So the handler takes in the HTTP response and checks if there are any problems with it. If there are problems, then it returns Err. But if it can't detect any problems, it returns the entire Response even though it is still possible that the deserialization goes wrong? I think this could be more confusing, but I can see points on both sides.

I should also mention that the deserialize_unexpected_catalog_error should also be deserializing every response it gets into a proper error message instead of a blanket "Received unexpected response", but I felt that was out of scope for this PR.

@connortsui20
Copy link
Contributor Author

connortsui20 commented Feb 24, 2025

@Xuanwo could you take a look at the latest commit and see if it makes sense?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants