Implement first version of JMH microbenchmarks #18

Merged
merged 5 commits into from
May 3, 2024

@CsengerG CsengerG commented May 1, 2024

Description of changes:

Now that we have a functional (correct but slow) implementation of S3SeekableStream, it makes sense to start monitoring its performance.

This PR implements basic microbenchmarks that test a full sequential read, forward seeks, backward seeks, and a Parquet-like ("jumping around") pattern. For now, we only compare against the performance of a single standard (non-CRT) S3 async client. Using these benchmarks, we can start implementing optimisations and get relatively quick feedback on what (if anything) they improved.
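To make the four access patterns concrete, here is a minimal sketch that only computes the offsets each pattern would visit, without touching S3. The class name `AccessPatternSketch`, its method names, and all parameters are hypothetical and chosen for illustration; they are not taken from the benchmark code in this PR. The Parquet-like pattern assumes a reader that fetches the footer at the tail of the object first and then jumps back to individual column chunks.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of the access patterns; not the PR's benchmark code. */
class AccessPatternSketch {

    /** Full sequential read: consecutive chunks from offset 0 (chunk must be > 0). */
    static List<Long> sequential(long size, long chunk) {
        List<Long> offsets = new ArrayList<>();
        for (long pos = 0; pos < size; pos += chunk) {
            offsets.add(pos);
        }
        return offsets;
    }

    /** Forward seeks: read readLen bytes, then skip gap bytes ahead (readLen must be > 0). */
    static List<Long> forwardSeeks(long size, long readLen, long gap) {
        List<Long> offsets = new ArrayList<>();
        for (long pos = 0; pos + readLen <= size; pos += readLen + gap) {
            offsets.add(pos);
        }
        return offsets;
    }

    /** Backward seeks: the same read positions, visited from the end towards the start. */
    static List<Long> backwardSeeks(long size, long readLen, long gap) {
        List<Long> offsets = forwardSeeks(size, readLen, gap);
        Collections.reverse(offsets);
        return offsets;
    }

    /** Parquet-like: read the footer at the tail first, then jump to each column chunk. */
    static List<Long> parquetLike(long size, long footerLen, List<Long> columnChunkOffsets) {
        List<Long> offsets = new ArrayList<>();
        offsets.add(size - footerLen);
        offsets.addAll(columnChunkOffsets);
        return offsets;
    }
}
```

Separating the offset schedule from the I/O in this way would let the same pattern drive both the seekable stream and the baseline client, so both sides of the comparison read identical ranges.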

To run the microbenchmarks, one has to assume AWS credentials and specify two environment variables (a BUCKET and a PREFIX). We include a generator utility so that setup is easy and this can later be open-sourced. The README is updated with instructions for running these benchmarks.
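As a rough sketch of how the two environment variables might be read and combined into object keys: the class `BenchmarkEnv`, its methods, and the literal variable names `BUCKET` and `PREFIX` are assumptions for illustration. The description only states that a bucket and a prefix must be supplied, so the actual names and lookup logic in the PR may differ.

```java
import java.util.Map;

/** Hypothetical helper; the PR's actual configuration code may look different. */
class BenchmarkEnv {
    final String bucket;
    final String prefix;

    BenchmarkEnv(String bucket, String prefix) {
        this.bucket = bucket;
        this.prefix = prefix;
    }

    /** Builds the configuration from an environment map, failing fast if a variable is missing. */
    static BenchmarkEnv from(Map<String, String> env) {
        return new BenchmarkEnv(require(env, "BUCKET"), require(env, "PREFIX"));
    }

    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Environment variable " + name + " must be set");
        }
        return value;
    }

    /** Joins the prefix and a key name, avoiding a double slash. */
    String objectKey(String keyName) {
        return prefix.endsWith("/") ? prefix + keyName : prefix + "/" + keyName;
    }
}
```

In a benchmark setup method this could be used as `BenchmarkEnv.from(System.getenv())`. Taking the map as a parameter keeps the helper testable without mutating the process environment.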

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.


An example output of a run looks like this:

Benchmark                                                                      (key)  Mode  Cnt   Score   Error  Units
SeekingReadBenchmarks.testBackwardSeeks__withSeekableStream           random-1mb.txt    ss   15   0.886 ± 0.033   s/op
SeekingReadBenchmarks.testBackwardSeeks__withSeekableStream           random-4mb.txt    ss   15   3.024 ± 0.045   s/op
SeekingReadBenchmarks.testBackwardSeeks__withSeekableStream          random-16mb.txt    ss   15  10.806 ± 0.037   s/op
SeekingReadBenchmarks.testBackwardSeeks__withStandardAsyncClient      random-1mb.txt    ss   15   0.107 ± 0.006   s/op
SeekingReadBenchmarks.testBackwardSeeks__withStandardAsyncClient      random-4mb.txt    ss   15   0.146 ± 0.012   s/op
SeekingReadBenchmarks.testBackwardSeeks__withStandardAsyncClient     random-16mb.txt    ss   15   0.292 ± 0.020   s/op
SeekingReadBenchmarks.testForwardSeeks__withSeekableStream            random-1mb.txt    ss   15   0.802 ± 0.047   s/op
SeekingReadBenchmarks.testForwardSeeks__withSeekableStream            random-4mb.txt    ss   15   2.885 ± 0.029   s/op
SeekingReadBenchmarks.testForwardSeeks__withSeekableStream           random-16mb.txt    ss   15  11.296 ± 0.043   s/op
SeekingReadBenchmarks.testForwardSeeks__withStandardAsyncClient       random-1mb.txt    ss   15   0.108 ± 0.011   s/op
SeekingReadBenchmarks.testForwardSeeks__withStandardAsyncClient       random-4mb.txt    ss   15   0.147 ± 0.016   s/op
SeekingReadBenchmarks.testForwardSeeks__withStandardAsyncClient      random-16mb.txt    ss   15   0.292 ± 0.013   s/op
SeekingReadBenchmarks.testParquetLikeRead__withSeekableStream         random-1mb.txt    ss   15   0.880 ± 0.040   s/op
SeekingReadBenchmarks.testParquetLikeRead__withSeekableStream         random-4mb.txt    ss   15   2.992 ± 0.038   s/op
SeekingReadBenchmarks.testParquetLikeRead__withSeekableStream        random-16mb.txt    ss   15  10.897 ± 0.107   s/op
SeekingReadBenchmarks.testParquetLikeRead__withStandardAsyncClient    random-1mb.txt    ss   15   0.102 ± 0.022   s/op
SeekingReadBenchmarks.testParquetLikeRead__withStandardAsyncClient    random-4mb.txt    ss   15   0.133 ± 0.013   s/op
SeekingReadBenchmarks.testParquetLikeRead__withStandardAsyncClient   random-16mb.txt    ss   15   0.280 ± 0.016   s/op
SequentialReadBenchmark.testSequentialRead__withSeekableStream        random-1mb.txt    ss   15   1.427 ± 0.040   s/op
SequentialReadBenchmark.testSequentialRead__withSeekableStream        random-4mb.txt    ss   15   5.276 ± 0.057   s/op
SequentialReadBenchmark.testSequentialRead__withSeekableStream       random-16mb.txt    ss   15  21.818 ± 0.173   s/op
SequentialReadBenchmark.testSequentialRead__withStandardAsyncClient   random-1mb.txt    ss   15   0.086 ± 0.024   s/op
SequentialReadBenchmark.testSequentialRead__withStandardAsyncClient   random-4mb.txt    ss   15   0.147 ± 0.015   s/op
SequentialReadBenchmark.testSequentialRead__withStandardAsyncClient  random-16mb.txt    ss   15   0.430 ± 0.028   s/op

public static final List<BenchmarkObject> BENCHMARK_OBJECTS =
    ImmutableList.of(
        BenchmarkObject.builder()
            .keyName("random-1mb.txt")
Contributor

NIT: can we have these key names defined as constants? They seem to be accessed from multiple files

Contributor Author

Yep, I don't know how I didn't notice this :( I will address this in a follow-up PR.
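One way the suggestion above could be addressed is a small constants holder shared by the generator and the benchmarks. The class name `BenchmarkKeys` and its field names are hypothetical; the key names themselves are the ones that appear in the diff and in the example benchmark output.

```java
import java.util.List;

/** Hypothetical constants holder for the benchmark object keys. */
final class BenchmarkKeys {
    static final String RANDOM_1MB = "random-1mb.txt";
    static final String RANDOM_4MB = "random-4mb.txt";
    static final String RANDOM_16MB = "random-16mb.txt";

    /** All keys, in ascending object size, for code that iterates over them. */
    static final List<String> ALL = List.of(RANDOM_1MB, RANDOM_4MB, RANDOM_16MB);

    private BenchmarkKeys() {
        // Not instantiable: this class only groups constants.
    }
}
```

With this in place, a builder call such as `.keyName("random-1mb.txt")` would become `.keyName(BenchmarkKeys.RANDOM_1MB)`, and a renamed key would only need to change in one file.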

Just run `./gradlew jmh --rerun`. (The reason for `--rerun` is a Gradle quirk: you may want to re-run benchmarks even when
you did not actually change the source of your project, and `--rerun` turns off the Gradle optimisation that skips
build steps when nothing changed.)

Contributor

Can we have a script that takes the bucket name and prefix as command-line arguments, creates the bucket and generates the data if they do not exist, and then runs the benchmarks as well?

Contributor Author

That's a great suggestion. It will be very useful for new contributors too once this gets open-sourced. For now, I will not block on it, so I created a backlog item for this.

@CsengerG CsengerG merged commit 56c7470 into awslabs:main May 3, 2024
1 check passed