Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make client singleton, add a sensible readAhead. #21

Merged
merged 6 commits into from
May 9, 2024

Conversation

ahmarsuhail
Copy link
Collaborator

@ahmarsuhail ahmarsuhail commented May 8, 2024

This PR adds in a couple of optimisations:

  • Introduces a S3InputStreamFactory to allow for sharing resources such as the S3Client across instances of the input stream.
  • Introduces a readAhead. This is important as we have seen read patterns such as read(buf[], 0, 1) A length of 1 is requested a few thousand times. To prevent this hitting performance, we use 64KB as a sensible "readAhead". This is what S3A does too.

We should probably look into where the read(buf[], 0, 1) is coming from though, I doubt that's intended and could be an opportunity for a quick performance win!

@@ -24,11 +25,12 @@ public class S3SdkObjectClient implements ObjectClient, AutoCloseable {
* the S3 CRT client is created.
*/
public S3SdkObjectClient(S3AsyncClient s3AsyncClient) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have always found it weird that new S3SdkObjectClient(null) is an expected use of this constructor. Would it be more natural to have two constructors, one that takes new S3SdkObjectClient(@NonNull client) and a default one that just does what's below?

S3SdkObjectClient() {
   this(S3AsyncClient.crtBuilder().maxConcurrency(300).build())
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, makes sense, will change

* Callers must close() torelease any shared resources. Closing {@link S3SeekableInputStream} will
* only release underlying resources held by the stream.
*/
public class S3SeekableInputStreamInitializer implements AutoCloseable {
Copy link
Contributor

@CsengerG CsengerG May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am probably very clumsy and just not understanding this. Where is this class used?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we basically proposing a Factory pattern / Provider with these initialisers and builders?

@ahmarsuhail ahmarsuhail changed the title Make client singleton, add a sensible readAhead. [DRAFT] Make client singleton, add a sensible readAhead. May 8, 2024
*/
public class S3SeekableInputStreamFactory implements AutoCloseable {

S3SdkObjectClient s3SdkObjectClient;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason this cannot be private final?


/** Builder for S3SeekableInputStream. */
@Builder
public class S3SeekableInputStreamBuilder {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused by why we need this class. The two observations I have are:

  • We call it a Builder but build() is never called on it
  • It technically is a container for an S3AsyncClient -- which then we just propagate down into S3SeekableInputStreamFactory. I wonder if it is easier to just pass the client in the factory directly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that this class will be used to pass in any shared state you want across instances of your stream. This can be things like block size, number of blocks, total max memory limit, num of connections etc etc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we use it like new S3SeekableInputStreamFactory(S3SeekableInputStreamBuilder.builder().build());

I didn't understand what you meant by "build()" is never called on it

Copy link
Contributor

@CsengerG CsengerG May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand what you meant by "build()" is never called on it

What I mean is that this class does not have a build() method. The class that actually has a build method (and hence is the actual Builder class) is the class that is generated under the hood by Lombok -- this is, to my knowledge, a static class, something like S3SeekableInputStreamBuilder.Builder.


I think the way you are using this today is a Configuration (we are planning to have more fields in it over time, this is the idea, right?). So then S3SeekableInputStreamConfig and S3SeekableInputStreamConfig.Builder will make sense. You are using the Config to share a configuration, whereas you can use the builder class S3SeekableStreamConfig.Builder to share "configurations not fully built up yet" -- this is the one you could use to append more and more configurations as you learn more and more about the use case/environment, and when it is ready, you .build().


// Temporarily adding range of data requested as a Referrer header to allow for easy analysis
// of access logs. This is similar to what the Auditing feature in S3A does.
builder.overrideConfiguration(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this! Very curious what this will tell us about S3FileIO read patterns too!

Comment on lines 140 to 145
private IOBlock createBlockStartingAt(long start, int len) throws IOException {

long end;

// When length is 0, and so not defined, use the default block size.
if (len == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would again prefer to do overloading on a method rather than assigning special meaning to 0L.

This will also simplify the nested if-then-else structure below.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really liked this suggestion, thanks!

Copy link
Contributor

@CsengerG CsengerG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! #shipit!

@ahmarsuhail ahmarsuhail merged commit 952ee3c into awslabs:main May 9, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants