-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make client singleton, add a sensible readAhead. #21
Conversation
@@ -24,11 +25,12 @@ public class S3SdkObjectClient implements ObjectClient, AutoCloseable { | |||
* the S3 CRT client is created. | |||
*/ | |||
public S3SdkObjectClient(S3AsyncClient s3AsyncClient) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have always found it weird that new S3SdkObjectClient(null)
is an expected use of this constructor. Would it be more natural to have two constructors, one that takes new S3SdkObjectClient(@NonNull client)
and a default one that just does what's below?
S3SdkObjectClient() {
this(S3AsyncClient.crtBuilder().maxConcurrency(300).build())
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, makes sense, will change
* Callers must close() torelease any shared resources. Closing {@link S3SeekableInputStream} will | ||
* only release underlying resources held by the stream. | ||
*/ | ||
public class S3SeekableInputStreamInitializer implements AutoCloseable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am probably very clumsy and just not understanding this. Where is this class used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we basically proposing a Factory pattern / Provider with these initialisers and builders?
*/ | ||
public class S3SeekableInputStreamFactory implements AutoCloseable { | ||
|
||
S3SdkObjectClient s3SdkObjectClient; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason this cannot be private final
?
|
||
/** Builder for S3SeekableInputStream. */ | ||
@Builder | ||
public class S3SeekableInputStreamBuilder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit confused by why we need this class. The two observations I have are:
- We call it a Builder but build() is never called on it
- It technically is a container for an S3AsyncClient -- which then we just propagate down into
S3SeekableInputStreamFactory
. I wonder if it is easier to just pass the client in the factory directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that this class will be used to pass in any shared state you want across instances of your stream. This can be things like block size, number of blocks, total max memory limit, num of connections etc etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we use it like new S3SeekableInputStreamFactory(S3SeekableInputStreamBuilder.builder().build());
I didn't understand what you meant by "build()" is never called on it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand what you meant by "build()" is never called on it
What I mean is that this class does not have a build()
method. The class that actually has a build method (and hence is the actual Builder class) is the class that is generated under the hood by Lombok -- this is, to my knowledge, a static class, something like S3SeekableInputStreamBuilder.Builder
.
I think the way you are using this today is a Configuration
(we are planning to have more fields in it over time, this is the idea, right?). So then S3SeekableInputStreamConfig
and S3SeekableInputStreamConfig.Builder
will make sense. You are using the Config
to share a configuration, whereas you can use the builder class S3SeekableStreamConfig.Builder
to share "configurations not fully built up yet" -- this is the one you could use to append more and more configurations as you learn more and more about the use case/environment, and when it is ready, you .build()
.
|
||
// Temporarily adding range of data requested as a Referrer header to allow for easy analysis | ||
// of access logs. This is similar to what the Auditing feature in S3A does. | ||
builder.overrideConfiguration( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this! Very curious what this will tell us about S3FileIO read patterns too!
private IOBlock createBlockStartingAt(long start, int len) throws IOException { | ||
|
||
long end; | ||
|
||
// When length is 0, and so not defined, use the default block size. | ||
if (len == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would again prefer to do overloading on a method rather than assigning special meaning to 0L
.
This will also simplify the nested if-then-else structure below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really liked this suggestion, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! #shipit!
This PR adds in a couple of optimisations:
read(buf[], 0, 1)
A length of 1 is requested a few thousand times. To prevent this hitting performance, we use 64KB as a sensible "readAhead". This is what S3A does too.We should probably look into where the
read(buf[], 0, 1)
is coming from though, I doubt that's intended and could be an opportunity for a quick performance win!