Performance improvements to the name equality check when finding a repository index. #638

Closed
wants to merge 22 commits into from

Conversation

jonathan-aotearoa
Contributor

Check List:

  • You have run ./mvnw verify and the project builds successfully
  • Tests pass (./test.sh <username> shows no differences between expected and actual outputs)
  • All formatting changes by the build are committed
  • Your launch script is named calculate_average_<username>.sh (make sure to match casing of your GH user name) and is executable
  • Output matches that of calculate_average_baseline.sh
  • For new entries, or after substantial changes: When implementing custom hash structures, please point to where you deal with hash collisions (line number)
  • Execution time:
  • Execution time of reference implementation:

@jonathan-aotearoa
Contributor Author

See line 544 for relevant changes

@jonathan-aotearoa jonathan-aotearoa changed the title Performance improvements to the name equlity check when finding a repository index. Performance improvements to the name equality check when finding a repository index. Jan 29, 2024
* To account for this, we also need to check if the station names themselves are equal.
* However, checking all the bytes in both names is costly.
* We therefore set a threshold for the maximum number of bytes, at the start and end of the name, to check.
* If two names with the same size and hash have the same first N and last N bytes, we're happy that they are equal.
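
As a rough illustration of the check this comment describes, here is a minimal sketch. The class name, the `edgesMatch` and `utf8` helpers, and the threshold value `EDGE_BYTES = 4` are all hypothetical and are not taken from the PR. The sketch also assumes the caller has already confirmed that both names have the same hash.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class EdgeBytesCheck {

    // Hypothetical threshold: how many bytes to compare at each end of a name.
    static final int EDGE_BYTES = 4;

    // Heuristic equality: same length, same first EDGE_BYTES bytes, same last EDGE_BYTES bytes.
    // Assumes the caller has already confirmed that both names have the same hash.
    static boolean edgesMatch(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return false;
        }
        int len = a.length;
        if (len <= 2 * EDGE_BYTES) {
            // Short names: the two edge windows would cover every byte, so compare everything.
            return Arrays.equals(a, b);
        }
        return Arrays.equals(a, 0, EDGE_BYTES, b, 0, EDGE_BYTES)
                && Arrays.equals(a, len - EDGE_BYTES, len, b, len - EDGE_BYTES, len);
    }

    // Hypothetical helper for building test inputs.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(edgesMatch(utf8("Wellington"), utf8("Wellington")));
        System.out.println(edgesMatch(utf8("Wellington"), utf8("Washington")));
    }
}
```

The bytes between the two edge windows are never read, which is where the saving comes from and also why the result is only a heuristic.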
Owner

I'm afraid that this is not valid, as it isn't guaranteed to resolve all potential conflicts.
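
To make the concern concrete, the following sketch constructs two different names that an edge-only comparison would treat as equal. Everything here is an assumption for illustration: the station names are made up, the threshold of 4 bytes is hypothetical, and the hash is a simple 31-based polynomial hash (the same scheme as `String.hashCode`), not the hash used in the PR.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class EdgeCheckCounterexample {

    // Hypothetical threshold: bytes compared at each end of a name.
    static final int EDGE_BYTES = 4;

    // Illustrative polynomial hash; an assumption, not the PR's hash function.
    static int hash(byte[] name) {
        int h = 0;
        for (byte b : name) {
            h = 31 * h + b;
        }
        return h;
    }

    // Edge-only comparison: same length, same first and last EDGE_BYTES bytes.
    static boolean edgesMatch(byte[] a, byte[] b) {
        int len = a.length;
        if (len != b.length) {
            return false;
        }
        if (len <= 2 * EDGE_BYTES) {
            return Arrays.equals(a, b);
        }
        return Arrays.equals(a, 0, EDGE_BYTES, b, 0, EDGE_BYTES)
                && Arrays.equals(a, len - EDGE_BYTES, len, b, len - EDGE_BYTES, len);
    }

    public static void main(String[] args) {
        // Made-up names that differ only in the middle two bytes ("Aa" vs "BB").
        byte[] first = "CityAaEast".getBytes(StandardCharsets.US_ASCII);
        byte[] second = "CityBBEast".getBytes(StandardCharsets.US_ASCII);

        System.out.println(hash(first) == hash(second)); // true: the hashes collide
        System.out.println(edgesMatch(first, second));   // true: the edges agree
        System.out.println(Arrays.equals(first, second)); // false: the names differ
    }
}
```

With this hash, "Aa" and "BB" contribute the same value (65 * 31 + 97 = 66 * 31 + 66 = 2112), so the two names collide while sharing their length and both edge windows. A lookup relying on the edge check alone would merge the two stations into one entry.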

@gunnarmorling
Owner

A tad slower actually than before:

Benchmark 1: timeout -v 300 ./calculate_average_jonathan-aotearoa.sh 2>&1
Time (mean ± σ): 5.103 s ± 0.015 s [User: 32.934 s, System: 0.715 s]
Range (min … max): 5.091 s … 5.128 s 5 runs

Summary
jonathan-aotearoa: trimmed mean 5.098993353046667, raw times 5.09063945838,5.09602143738,5.10543017038,5.09552845138,5.128256689380001

Leaderboard
grep: ./src/main/java*/dev/morling/onebrc/CalculateAverage_jonathan-aotearoa.java: No such file or directory

| # | Result (m:s.ms) | Implementation | JDK | Submitter | Notes |
|---|-----------------|----------------|-----|-----------|-------|
|   | 00:05.098 | link | 21.0.2-graal | Jonathan Wright | GraalVM native binary |

@jonathan-aotearoa
Contributor Author

I did some benchmarking on that method and the results weren't conclusive, so I'm not entirely surprised to see it ran fractionally slower. The issue I'm finding with micro-benchmarking methods in isolation is that the results don't always apply when the method is run in the context of the whole application. It's a great learning experience though. I haven't had an excuse to dive into Linux tools like perf before :)

@gunnarmorling
Owner

Nice to hear :) Wanna close this one then?

@gunnarmorling
Owner

Hey @jonathan-aotearoa, I am gonna close this one, as we're after the cut-off time, and this one didn't yield an improvement. Thanks a lot for participating in 1BRC!

@jonathan-aotearoa
Contributor Author

> Hey @jonathan-aotearoa, I am gonna close this one, as we're after the cut-off time, and this one didn't yield an improvement. Thanks a lot for participating in 1BRC!

Hi @gunnarmorling, thanks for closing the issue, and thanks again for taking the time to set up and administer this challenge. Very much appreciated.
