Second attempt with various improvements #510

JamalMulla · 2024-01-20T16:34:03Z

Check List:

Tests pass (./test.sh <username> shows no differences between expected and actual outputs)
All formatting changes by the build are committed
Your launch script is named calculate_average_<username>.sh (make sure to match casing of your GH user name) and is executable
Output matches that of calculate_average_baseline.sh
For new entries, or after substantial changes: When implementing custom hash structures, please point to where you deal with hash collisions (line number)

Execution time: 3s
Execution time of reference implementation: 4m

Shaved off around 1.5s from my initial submission. Also trying with graal native binary as it seems to give slightly better results and all the top entries are also using it.

Most of the changes are just reducing function calls, inlining the hashing, etc

gunnarmorling · 2024-01-20T19:27:16Z

Seeing a test failure now:

+ timeout -v 300 ./test.sh JamalMulla

Using java version 21.0.1-graal in this shell.
Validating calculate_average_JamalMulla.sh -- src/test/resources/samples/measurements-10000-unique-keys.txt
Validating calculate_average_JamalMulla.sh -- src/test/resources/samples/measurements-10.txt
Validating calculate_average_JamalMulla.sh -- src/test/resources/samples/measurements-1.txt
Validating calculate_average_JamalMulla.sh -- src/test/resources/samples/measurements-20.txt
Validating calculate_average_JamalMulla.sh -- src/test/resources/samples/measurements-2.txt
Validating calculate_average_JamalMulla.sh -- src/test/resources/samples/measurements-3.txt
Validating calculate_average_JamalMulla.sh -- src/test/resources/samples/measurements-boundaries.txt
Validating calculate_average_JamalMulla.sh -- src/test/resources/samples/measurements-complex-utf8.txt
1c1
< ;118.9;118.9;118.9
---
> B;8.9;8.9;8.9

FAILURE: ./test.sh JamalMulla failed

JamalMulla · 2024-01-20T22:20:57Z

@gunnarmorling Been fixed now.

gunnarmorling

This is not passing tests now:

Validating calculate_average_JamalMulla.sh -- src/test/resources/samples/measurements-10000-unique-keys.txt
2325a2326,2331
> id3090;1.0;1.0;1.0
> id3091;1.0;1.0;1.0
> id3092;1.0;1.0;1.0
> id3093;1.0;1.0;1.0
> id3094;1.0;1.0;1.0
> id3095;1.0;1.0;1.0
3430a3437,3442
> id4090;1.0;1.0;1.0
> id4091;1.0;1.0;1.0
> id4092;1.0;1.0;1.0
> id4093;1.0;1.0;1.0
> id4094;1.0;1.0;1.0
> id4095;1.0;1.0;1.0
7868a7881,7886
> id8090;1.0;1.0;1.0
> id8091;1.0;1.0;1.0
> id8092;1.0;1.0;1.0
> id8093;1.0;1.0;1.0
> id8094;1.0;1.0;1.0
> id8095;1.0;1.0;1.0
8973a8992,8993
> id9090;1.0;1.0;1.0
> id9091;1.0;1.0;1.0

FAILURE: ./test.sh JamalMulla failed

Also, where are you dealing with hash collisions?

JamalMulla · 2024-01-21T12:25:00Z

This is not passing tests now:

Validating calculate_average_JamalMulla.sh -- src/test/resources/samples/measurements-10000-unique-keys.txt
2325a2326,2331
> id3090;1.0;1.0;1.0
> id3091;1.0;1.0;1.0
> id3092;1.0;1.0;1.0
> id3093;1.0;1.0;1.0
> id3094;1.0;1.0;1.0
> id3095;1.0;1.0;1.0
3430a3437,3442
> id4090;1.0;1.0;1.0
> id4091;1.0;1.0;1.0
> id4092;1.0;1.0;1.0
> id4093;1.0;1.0;1.0
> id4094;1.0;1.0;1.0
> id4095;1.0;1.0;1.0
7868a7881,7886
> id8090;1.0;1.0;1.0
> id8091;1.0;1.0;1.0
> id8092;1.0;1.0;1.0
> id8093;1.0;1.0;1.0
> id8094;1.0;1.0;1.0
> id8095;1.0;1.0;1.0
8973a8992,8993
> id9090;1.0;1.0;1.0
> id9091;1.0;1.0;1.0

FAILURE: ./test.sh JamalMulla failed

Also, where are you dealing with hash collisions?

Very confusing cause it passes locally. What does that output mean? 8973a8992?

And here is where I'm handling collisions

JamalMulla · 2024-01-21T12:25:51Z

I also don't understand why the Github build doesn't report test failures? seems like an obvious build step

gunnarmorling · 2024-01-21T13:13:20Z

I also don't understand why the Github build doesn't report test failures? seems like an obvious build step

It's just that we don't have a GH Actions job set up which would run the tests. The reason being that the required SDK would have to be installed. @filiphr had been looking into it, but it is not completed yet.

gunnarmorling · 2024-01-21T13:14:35Z

Here's the original PR: #106. Any help for driving it over the finishing line will be more than welcome :)

gunnarmorling · 2024-01-23T18:52:23Z

Tests look good now, but it takes unduly long for the 10K keyset (see create_measurements_3.sh). Could you take a look, I am running the top 15 entries against that keyset, and as yours is up there, it would be nice if it could handle it. Thx.

JamalMulla · 2024-01-23T20:37:05Z

Tests look good now, but it takes unduly long for the 10K keyset (see create_measurements_3.sh). Could you take a look, I am running the top 15 entries against that keyset, and as yours is up there, it would be nice if it could handle it. Thx.

Hmm I can't reproduce this behaviour at all. i ran create_measurements3.sh with 1billion and I get this

 Performance counter stats for './calculate_average_JamalMulla.sh':

         34,723.19 msec task-clock                #    6.696 CPUs utilized          
             6,147      context-switches          #  177.029 /sec                   
                 7      cpu-migrations            #    0.202 /sec                   
           270,316      page-faults               #    7.785 K/sec                  
   <not supported>      cycles                                                      
   <not supported>      instructions                                                
   <not supported>      branches                                                    
   <not supported>      branch-misses                                               

       5.185930430 seconds time elapsed

      33.162106000 seconds user
       1.400924000 seconds sys

Is there something else you did for the test?

JamalMulla · 2024-01-26T21:07:02Z

@gunnarmorling any ideas? i would really like to get this evaluated. If you can provide more details i'll try to fix but i can't even reproduce it as is

tivrfoa · 2024-01-26T23:18:06Z

@gunnarmorling any ideas? i would really like to get this evaluated. If you can provide more details i'll try to fix but i can't even reproduce it as is

Try to create other datasets from the 10k (each time it generates a new one), and compare the output with some valid solution.

Maybe deadlock or can't find a key?

JamalMulla · 2024-01-26T23:57:06Z

@gunnarmorling any ideas? i would really like to get this evaluated. If you can provide more details i'll try to fix but i can't even reproduce it as is

Try to create other datasets from the 10k (each time it generates a new one), and compare the output with some valid solution.

Maybe deadlock or can't find a key?

yeah been trying this. no luck so far. also annoying cause it takes a while to generate 1bil each time

tivrfoa · 2024-01-27T00:03:59Z

@gunnarmorling any ideas? i would really like to get this evaluated. If you can provide more details i'll try to fix but i can't even reproduce it as is

Try to create other datasets from the 10k (each time it generates a new one), and compare the output with some valid solution.
Maybe deadlock or can't find a key?

yeah been trying this. no luck so far. also annoying cause it takes a while to generate 1bil each time

Try 100 and 200 million rows. It's faster to generate

tivrfoa · 2024-01-27T00:05:55Z

@gunnarmorling any ideas? i would really like to get this evaluated. If you can provide more details i'll try to fix but i can't even reproduce it as is

Try to create other datasets from the 10k (each time it generates a new one), and compare the output with some valid solution.
Maybe deadlock or can't find a key?

yeah been trying this. no luck so far. also annoying cause it takes a while to generate 1bil each time

Try these. Ask permission for these files:

https://drive.google.com/file/d/1sVA0yUuo5oZoeAQFoPJgaF0cQnq4fPCr/view?usp=sharing

https://drive.google.com/file/d/1hB72HaFC87Pjykv1mu0CIF4kXO5Qjefs/view?usp=sharing

gunnarmorling · 2024-01-27T14:14:52Z

Getting this error:

Fatal error: Failed to leave the current IsolateThread context and to detach the current thread. (code 12)

I have seen it with another native image entry, but I don't know what the cause is. @thomaswue might have an idea.

JamalMulla · 2024-01-27T22:37:47Z

@gunnarmorling I've removed one of the native build options which may have played a part in the error (not sure though). Could you re-test please. No idea what else to try and would like to see the time for the normal 1brc at least. Thanks!

tivrfoa · 2024-01-28T00:37:39Z

Getting this error:
Fatal error: Failed to leave the current IsolateThread context and to detach the current thread. (code 12)
I have seen it with another native image entry, but I don't know what the cause is. @thomaswue might have an idea.

Do you remember the other code that had the same error?

Maybe there's something similar in their solutions.

Another thing that could have been tried was to use same native flags of one solution that is working.
Also use newest graal version: sdk use java 21.0.2-graal 1>&2

And what makes the bug more interesting, is that @JamalMulla said it was working in his computer. =)

gunnarmorling · 2024-01-28T09:36:23Z

So the isolate error seems to be intermittent, can't reproduce it any longer (also without your latest change). So the remainging issue is that it still doesn't produce the correct output for the 10K key set test:

+ timeout -v 300 ./test.sh JamalMulla measurements_10K_1B.txt

Using java version 21.0.1-graal in this shell.
Validating calculate_average_JamalMulla.sh -- measurements_10K_1B.txt
6482c6482
< noA;-19.7;8.2;39.3
---
> noA;-23.2;8.2;39.3
6492c6492
< noWuhuNanningHar;-23.2;10.4;41.6
---
> noWuhuNanningHar;-19.0;11.6;41.6

FAILURE: ./test.sh JamalMulla measurements_10K_1B.txt failed

JamalMulla · 2024-01-28T11:09:55Z

@gunnarmorling fixed 🤞!

gunnarmorling · 2024-01-28T11:20:39Z

Same diff for the 10K key set test as before :(

JamalMulla · 2024-01-29T21:47:06Z

Same diff for the 10K key set test as before :(

@gunnarmorling last attempt before i give up. please recheck. slower than the other attempts but hopefully correct at least

gunnarmorling · 2024-01-31T21:09:10Z

Yepp, looking good now.

Benchmark 1: timeout -v 300 ./calculate_average_JamalMulla.sh 2>&1
  Time (mean ± σ):      3.124 s ±  0.073 s    [User: 21.819 s, System: 0.702 s]
  Range (min … max):    3.079 s …  3.252 s    5 runs

Summary
  JamalMulla: trimmed mean 3.0958672756066665, raw times 3.07938657994,3.09427596894,3.1120593309399998,3.08126652694,3.25184166094

Leaderboard

| # | Result (m:s.ms) | Implementation     | JDK | Submitter     | Notes     |
|---|-----------------|--------------------|-----|---------------|-----------|
|   | 00:03.095 | [link](https://github.com/gunnarmorling/1brc/blob/main/src/main/java/dev/morling/onebrc/CalculateAverage_JamalMulla.java)| 21.0.2-graal | [Jamal Mulla](https://github.com/JamalMulla) | GraalVM native binary, uses Unsafe |

JamalMulla · 2024-01-31T21:12:36Z

Yepp, looking good now.

Benchmark 1: timeout -v 300 ./calculate_average_JamalMulla.sh 2>&1
  Time (mean ± σ):      3.124 s ±  0.073 s    [User: 21.819 s, System: 0.702 s]
  Range (min … max):    3.079 s …  3.252 s    5 runs

Summary
  JamalMulla: trimmed mean 3.0958672756066665, raw times 3.07938657994,3.09427596894,3.1120593309399998,3.08126652694,3.25184166094

Leaderboard

| # | Result (m:s.ms) | Implementation     | JDK | Submitter     | Notes     |
|---|-----------------|--------------------|-----|---------------|-----------|
|   | 00:03.095 | [link](https://github.com/gunnarmorling/1brc/blob/main/src/main/java/dev/morling/onebrc/CalculateAverage_JamalMulla.java)| 21.0.2-graal | [Jamal Mulla](https://github.com/JamalMulla) | GraalVM native binary, uses Unsafe |

Finally 🎉

gunnarmorling · 2024-02-11T12:34:46Z

Hey @JamalMulla!

Congrats again on being in the Top 20 of the One Billion Row Challenge!

To celebrate this amazing achievement, I would like to send you a 1BRC t-shirt and coffee mug. To claim your prize, fill out this form by Feb 18. After submitting the form, please provide a comment with the random value you've specified in the form, so that I know it is you who submitted it.

All data entered will solely be used in relation to processing this shipment. Shipments can be sent to any country listed here or here (I'll use whichever one is cheaper for me to ship to your location). A big thank you to Decodable for sponsoring these prizes!

Thanks a lot for participating in 1BRC,

--Gunnar

JamalMulla · 2024-02-13T20:07:13Z

Hey @JamalMulla!

Congrats again on being in the Top 20 of the One Billion Row Challenge!

To celebrate this amazing achievement, I would like to send you a 1BRC t-shirt and coffee mug. To claim your prize, fill out this form by Feb 18. After submitting the form, please provide a comment with the random value you've specified in the form, so that I know it is you who submitted it.

All data entered will solely be used in relation to processing this shipment. Shipments can be sent to any country listed here or here (I'll use whichever one is cheaper for me to ship to your location). A big thank you to Decodable for sponsoring these prizes!

Thanks a lot for participating in 1BRC,

--Gunnar

Hey, I've filled in the form. My numbers were 21975 11916. Thanks

JamalMulla and others added 17 commits January 14, 2024 17:56

Initial chunked impl

d3ba8a7

Bytes instead of chars

e564255

Improved number parsing

81802e0

Custom hashmap

78e3c3b

Graal and some tuning

7e40d06

Fix segmenting

31de444

Fix casing

42b34d4

Unsafe

c3b5966

Inlining hash calc

6ab6882

Improved loop

c7ad374

Cleanup

4eb2eda

Speeding up equals

58c33ff

Simplifying hash

756120a

Replace concurrenthashmap with lock

ee2b765

Small changes

db62fc3

Script reorg

3cf1ce1

Native

b52ce52

JamalMulla force-pushed the main branch from f072521 to be40dc5 Compare January 20, 2024 16:36

gunnarmorling added the test failure label Jan 20, 2024

Lots of inlining and improvements

577fc3a

JamalMulla force-pushed the main branch from be40dc5 to 577fc3a Compare January 20, 2024 22:19

JamalMulla and others added 2 commits January 20, 2024 22:21

Merge branch 'gunnarmorling:main' into main

4e3109d

Add back length check

fc897b6

gunnarmorling reviewed Jan 21, 2024

View reviewed changes

gunnarmorling added output differs and removed test failure labels Jan 23, 2024

gunnarmorling added Fails 10K and removed output differs labels Jan 23, 2024

gunnarmorling mentioned this pull request Jan 27, 2024

Some fine tuning for thomaswue #606

Merged

5 tasks

JamalMulla force-pushed the main branch 3 times, most recently from a5676bb to 0524f9d Compare January 27, 2024 23:49

JamalMulla force-pushed the main branch from 0524f9d to 39a3082 Compare January 28, 2024 11:08

Small changes

2d13d02

JamalMulla force-pushed the main branch from 39a3082 to 2d13d02 Compare January 29, 2024 21:46

gunnarmorling merged commit e639e2a into gunnarmorling:main Jan 31, 2024
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Second attempt with various improvements #510

Second attempt with various improvements #510

JamalMulla commented Jan 20, 2024

gunnarmorling commented Jan 20, 2024

JamalMulla commented Jan 20, 2024

gunnarmorling left a comment

JamalMulla commented Jan 21, 2024

JamalMulla commented Jan 21, 2024

gunnarmorling commented Jan 21, 2024

gunnarmorling commented Jan 21, 2024

gunnarmorling commented Jan 23, 2024

JamalMulla commented Jan 23, 2024

JamalMulla commented Jan 26, 2024

tivrfoa commented Jan 26, 2024

JamalMulla commented Jan 26, 2024

tivrfoa commented Jan 27, 2024

tivrfoa commented Jan 27, 2024

gunnarmorling commented Jan 27, 2024

JamalMulla commented Jan 27, 2024

tivrfoa commented Jan 28, 2024

gunnarmorling commented Jan 28, 2024

JamalMulla commented Jan 28, 2024

gunnarmorling commented Jan 28, 2024

JamalMulla commented Jan 29, 2024 •

edited

Loading

gunnarmorling commented Jan 31, 2024

JamalMulla commented Jan 31, 2024

gunnarmorling commented Feb 11, 2024

JamalMulla commented Feb 13, 2024

Second attempt with various improvements #510

Second attempt with various improvements #510

Conversation

JamalMulla commented Jan 20, 2024

Check List:

gunnarmorling commented Jan 20, 2024

JamalMulla commented Jan 20, 2024

gunnarmorling left a comment

Choose a reason for hiding this comment

JamalMulla commented Jan 21, 2024

JamalMulla commented Jan 21, 2024

gunnarmorling commented Jan 21, 2024

gunnarmorling commented Jan 21, 2024

gunnarmorling commented Jan 23, 2024

JamalMulla commented Jan 23, 2024

JamalMulla commented Jan 26, 2024

tivrfoa commented Jan 26, 2024

JamalMulla commented Jan 26, 2024

tivrfoa commented Jan 27, 2024

tivrfoa commented Jan 27, 2024

gunnarmorling commented Jan 27, 2024

JamalMulla commented Jan 27, 2024

tivrfoa commented Jan 28, 2024

gunnarmorling commented Jan 28, 2024

JamalMulla commented Jan 28, 2024

gunnarmorling commented Jan 28, 2024

JamalMulla commented Jan 29, 2024 • edited Loading

gunnarmorling commented Jan 31, 2024

JamalMulla commented Jan 31, 2024

gunnarmorling commented Feb 11, 2024

JamalMulla commented Feb 13, 2024

JamalMulla commented Jan 29, 2024 •

edited

Loading