Performance improvements to the name equality check when finding a repository index. #638

jonathan-aotearoa · 2024-01-29T10:01:23Z

Check List:

You have run ./mvnw verify and the project builds successfully
Tests pass (./test.sh <username> shows no differences between expected and actual outputs)
All formatting changes by the build are committed
Your launch script is named calculate_average_<username>.sh (make sure to match casing of your GH user name) and is executable
Output matches that of calculate_average_baseline.sh
For new entries, or after substantial changes: When implementing custom hash structures, please point to where you deal with hash collisions (line number)

Execution time:
Execution time of reference implementation:


jonathan-aotearoa · 2024-01-29T10:02:30Z

See line 544 for relevant changes

gunnarmorling · 2024-01-29T19:58:04Z

src/main/java/dev/morling/onebrc/CalculateAverage_jonathanaotearoa.java

+         * To account for this, we also need to check if the station names themselves are equal.
+         * However, checking all the bytes in both names is costly.
+         * We therefore set a threshold for the maximum number of bytes, at the start and end of the name, to check.
+         * If two names with the same size and hash have the same first N and last N bytes, we're happy they are equal.


I'm afraid that this is not valid, as it isn't guaranteed to resolve all potential conflicts.
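To make the objection concrete, here is a minimal sketch of a thresholded check like the one described in the quoted comment, together with a pair of names it would wrongly treat as equal. The class name, the method name, the value of `N`, and the station names are all hypothetical; the sketch also assumes the two names have already collided on hash, which the real code checks separately.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ThresholdEqualityDemo {

    // Hypothetical threshold: how many bytes are compared at each end of the name.
    static final int N = 4;

    // Compares only the first N and last N bytes of two names of equal length.
    // This is an illustration of the idea in the quoted comment, not the PR's code.
    static boolean thresholdEquals(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return false;
        }
        int n = Math.min(N, a.length);
        for (int i = 0; i < n; i++) {
            boolean headDiffers = a[i] != b[i];
            boolean tailDiffers = a[a.length - 1 - i] != b[b.length - 1 - i];
            if (headDiffers || tailDiffers) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Same length, same first four and last four bytes, different middle.
        byte[] a = "Port Alpha City".getBytes(StandardCharsets.UTF_8);
        byte[] b = "Port Bravo City".getBytes(StandardCharsets.UTF_8);

        System.out.println("threshold check: " + thresholdEquals(a, b));
        System.out.println("full check:      " + Arrays.equals(a, b));
    }
}
```

If two such names ever land on the same hash, the thresholded check merges their measurements into one station, which is why only a full byte comparison settles the question.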

gunnarmorling · 2024-01-31T18:58:42Z

A tad slower actually than before:

Benchmark 1: timeout -v 300 ./calculate_average_jonathan-aotearoa.sh 2>&1
Time (mean ± σ): 5.103 s ± 0.015 s [User: 32.934 s, System: 0.715 s]
Range (min … max): 5.091 s … 5.128 s 5 runs

Summary
jonathan-aotearoa: trimmed mean 5.098993353046667, raw times 5.09063945838,5.09602143738,5.10543017038,5.09552845138,5.128256689380001

Leaderboard
grep: ./src/main/java*/dev/morling/onebrc/CalculateAverage_jonathan-aotearoa.java: No such file or directory

| # | Result (m:s.ms) | Implementation | JDK | Submitter | Notes |
|---|---|---|---|---|---|
|   | 00:05.098 | link | 21.0.2-graal | Jonathan Wright | GraalVM native binary |
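The trimmed mean in the summary is consistent with dropping the fastest and the slowest of the five raw times and averaging the remaining three. The sketch below reproduces that arithmetic; the class and method names are hypothetical, and the evaluation script's actual implementation is not shown in this thread.

```java
import java.util.Arrays;

public class TrimmedMean {

    // Drops the single smallest and single largest value, then averages the rest.
    // Assumes at least three samples.
    static double trimmedMean(double[] times) {
        double[] sorted = times.clone();
        Arrays.sort(sorted);
        double sum = 0.0;
        for (int i = 1; i < sorted.length - 1; i++) {
            sum += sorted[i];
        }
        return sum / (sorted.length - 2);
    }

    public static void main(String[] args) {
        // Raw times copied from the benchmark summary above.
        double[] raw = {
            5.09063945838, 5.09602143738, 5.10543017038,
            5.09552845138, 5.128256689380001
        };
        System.out.println(trimmedMean(raw));
    }
}
```

With these inputs the result agrees with the reported 5.098993353046667 to within floating-point rounding.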

jonathan-aotearoa · 2024-01-31T19:37:46Z

I did some benchmarking on that method and the results weren't conclusive, so I'm not entirely surprised to see it ran fractionally slower. The issue I'm finding with micro-benchmarking methods in isolation is that the results don't always apply when the method is run in the context of the whole application. It's a great learning experience though. I haven't had an excuse to dive into Linux tools like perf before :)

gunnarmorling · 2024-01-31T20:43:25Z

Nice to hear :) Wanna close this one then?

gunnarmorling · 2024-02-01T11:08:03Z

Hey @jonathan-aotearoa, I am gonna close this one, as we're after the cut-off time, and this one didn't yield an improvement. Thanks a lot for participating in 1BRC!

jonathan-aotearoa · 2024-02-01T19:55:15Z

> Hey @jonathan-aotearoa, I am gonna close this one, as we're after the cut-off time, and this one didn't yield an improvement. Thanks a lot for participating in 1BRC!

Hi @gunnarmorling, thanks for closing the issue, and thanks again for taking the time to set up and administer this challenge. Very much appreciated.

jonathan and others added 21 commits January 25, 2024 20:32

Initial submission for jonathan_aotearoa

95fb5fc

Fixing typos

c4af210

Adding hyphens to prepare and calculate shell scripts so that they're…

e016935

… aligned with my GitHub username.

Merge branch 'gunnarmorling:main' into main

ffcdeff

Making chunk processing more robust in attempt to fix the cause of th…

6906de7

…e build error.

Fixing typo.

26cf777

Fixed the handling of files less than 8 bytes in length.

f723a5d

Additional assertion, comment improvements.

afc8738

Refactoring to improve testability. Additional assertion and comments.

37e98cf

Updating collision checking to include checking if the station name i…

a75b828

…s equal.

Minor refactoring to make param ordering consistent.

2c0b182

Adding a custom toString method for the results map.

8675767

Merge branch 'gunnarmorling:main' into main

9863e6c

Fixing collision checking bug

2b64762

Fixing rounding bug.

c1d870f

Fixing collision bug.

4e18383

Removing compareTo methods from station data classes

1591cc9

Improved index finding performance.

2ea0172

Merge branch 'gunnarmorling:main' into main

d196f82

Merge branch 'feature/collision-bytes-limit'

5ae4026

Simplification of name equality threshold.

885b563

jonathan-aotearoa changed the title ~~Performance improvements to the name equlity check when finding a repository index.~~ Performance improvements to the name equality check when finding a repository index. Jan 29, 2024

gunnarmorling reviewed Jan 29, 2024

gunnarmorling added the Potential hash collisions label Jan 29, 2024

Reverting to full equality check, but reading a long at a time for a …

f78803f

…fractional performance gain.
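A full equality check that reads a long at a time could look roughly like the sketch below. It is not the code from this commit: the class and method names are hypothetical, and it compares heap byte arrays through `ByteBuffer` for simplicity, whereas the submission may read names from a different kind of memory.

```java
import java.nio.ByteBuffer;

public class LongAtATimeEquals {

    // Compares every byte of both names, eight bytes per step where possible.
    static boolean namesEqual(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return false;
        }
        ByteBuffer bufA = ByteBuffer.wrap(a);
        ByteBuffer bufB = ByteBuffer.wrap(b);
        int i = 0;
        // Whole 8-byte words first; absolute getLong does not move the buffer position.
        for (; i + Long.BYTES <= a.length; i += Long.BYTES) {
            if (bufA.getLong(i) != bufB.getLong(i)) {
                return false;
            }
        }
        // Remaining tail of fewer than 8 bytes, one byte at a time.
        for (; i < a.length; i++) {
            if (a[i] != b[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Unlike a thresholded comparison, this inspects every byte, so it resolves all collisions; the saving comes only from doing fewer, wider comparisons.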

gunnarmorling closed this Feb 1, 2024
