Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement GetOptions extensions #7160

Closed
wants to merge 5 commits into from

Conversation

waynr
Copy link

@waynr waynr commented Feb 20, 2025

Which issue does this PR close?

Closes #7155

Rationale for this change

This is laid out in the associated issue.

What changes are included in this PR?

  • Introduces a new type, object_store::Extensions
    • Uses a HashMap<TypeId, Arc<dyn Any + Send + Sync + 'static>>
    • This is modeled after (ie copied from) and meant to be compatible with the extension mechanism in datafusion::execution::config::SessionConfig
    • It's probably worth considering using the same double hash avoidance optimization used there, but for simplicity I've left it out of this PR while it's in draft form -- I just wanted to end up with something I could test out and get early feedback on.
    • This is a public type for now because I am not sure what the preference of maintainers will be and it's simpler in my mind for now to make use of this at the datafusion level (draft PR for that coming shortly after this) if we can construct an Extensions instance once rather than on every ObjectStore method call where one is needed.
  • Adds a new extensions: Extensions field to GetOptions
    • Could easily be implemented for PutOptions as well, but I'm personally less interested in that so I'm leaving it out for now.

Are there any user-facing changes?

Yes, the new field in GetOptions and a new publicly-exposed type, object_store::Extensions.

@github-actions github-actions bot added the object-store Object Store Interface label Feb 20, 2025
@waynr
Copy link
Author

waynr commented Feb 20, 2025

By the way, it's worth mentioning that I based this PR on the tag for the 0.11.2 release because I ran into a bunch of breaking changes with respect to integer types in main branch when integrating this change into datafusion (namely this #6961). I think it would be best to rebase this onto main before if it gets approved by maintainers of course.

Also worth noting that this is a draft PR for now because I still need to test it out in https://github.com/influxdata/influxdb, write better docstrings, and run the approach by @crepererum and @tustvold before spending too much additional time on it.

@crepererum
Copy link
Contributor

I would prefer if we don't use Any directly for the extensions but a proper trait. This also allows you to properly implement Debug and even PartialEq:

trait Extension: std::fmt::Debug {
    fn as_any(&self) -> &dyn std::any::Any;
    fn partial_eq(&self, other: &dyn std::any::Any) -> bool;
}

#[derive(Debug, PartialEq)]
struct MyExt {
    x: u8,
}

impl Extension for MyExt {
    fn as_any(&self) -> &dyn std::any::Any {
        self
    }

    fn partial_eq(&self, other: &dyn std::any::Any) -> bool {
        other
            .downcast_ref::<Self>()
            .map(|other| other.partial_eq(self))
            .unwrap_or_default()
    }
}

@waynr waynr force-pushed the object_store/introduce-extensions branch from 41ca3a4 to ed3816a Compare February 21, 2025 03:10
// this is necessary because ExtWrapper is a generic impl of Extensions necessary for
// converting from external Arc<dyn Any ...> types where none of the trait bounts can
// implement PartialEq due to dyn-compatibility requirements.
true
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crepererum I don't know if this violates the spirit of your preference for making Extension trait implementations also implement PartialEq and Debug, but this was the best I could come up with that would allow me to build a Extensions instance from a HashMap<TypeId, Arc<dyn Any + Send + Sync + 'static>>. I should have a PR up shortly demonstrating the use of this in datafusion.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the PR I mentioned that demonstrates what I had in mind with respect to how to use this: https://github.com/apache/datafusion/pull/14805/files#diff-cd6fee4ac01d3e6f21ca25e40cbaf9a462882eca8ed80d7868079327a9ce31a9R576

I'll test this out in https://github.com/influxdata/influxdb tomorrow to see if I can get the tracing spans I wanted in our caching object store implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this wrapper is flawed. People should implement the trait themselves or -- if they cannot due to the orphan rule -- create a dedicated wrapper type for whatever they need.

@waynr waynr force-pushed the object_store/introduce-extensions branch from ed3816a to 3c4a843 Compare February 21, 2025 03:55
};

/// Trait that must be implemented by extensions.
pub trait Extension: std::fmt::Debug + Send + Sync {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I'm concerned about adding such an API in object_store, especially a model tailored for specific use cases like metrics and tracing.

Have we considered adding a TracingObjectStore that wraps another object store and adds traces for every API call? I think this approach is more extensible and can help avoid making changes for every action.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have discussed that possibility, but it's not straightforward to do without non-trivial refactors in InfluxDB. And even if we did that, the tracing spans would not be properly parented within the hierarchy of tracing spans -- the best we would be able to do is get them parented at the root-span of a given query (as opposed to under the most recently-created descendant of that root span).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, another use case aside from tracing spans that @crepererum mentioned to me was passing file size hints via GetOptions to a caching layer to avoid extraneous HEAD requests in certain use cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I'm concerned about adding such an API in object_store, especially a model tailored for specific use cases like metrics and tracing.

What exactly does concern you?

Have we considered adding a TracingObjectStore that wraps another object store and adds traces for every API call? I think this approach is more extensible and can help avoid making changes for every action.

That assumes that you have one single ObjectStore. Truth is, ObjectStore is an abstract interface that lends itself to dynamic composition, e.g. to add caching as wrappers.

@tustvold
Copy link
Contributor

I wonder if supporting PartialEq for extensions is all that important, my understanding is the PartialEq support is a utility for testing. As such I wonder if we just return None if either argument has extensions present?

IMO it would be nice if we didn't have to export an Extension trait publicly and could just support registering anything Send + Sync + Debug, with any necessary trait machinery kept private.

@crepererum
Copy link
Contributor

I wonder if supporting PartialEq for extensions is all that important, my understanding is the PartialEq support is a utility for testing.

Fair, I'm not dying on that hill. std::fmt::Debug would be super useful though.

@crepererum
Copy link
Contributor

IMO it would be nice if we didn't have to export an Extension trait publicly and could just support registering anything Send + Sync + Debug, with any necessary trait machinery kept private.

That can actually be done, but requires some more cleverness. I'll give it a shot.

@crepererum
Copy link
Contributor

I've hacked together an alternative implementation at #7170 which -- I think -- has a reasonable interface.

@tustvold tustvold closed this Feb 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Introduce Extensions concept to object_store::GetOptions and object_store::PutOptions
4 participants