-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Second attempt with various improvements #510
Conversation
Seeing a test failure now:
|
@gunnarmorling Been fixed now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not passing tests now:
Validating calculate_average_JamalMulla.sh -- src/test/resources/samples/measurements-10000-unique-keys.txt
2325a2326,2331
> id3090;1.0;1.0;1.0
> id3091;1.0;1.0;1.0
> id3092;1.0;1.0;1.0
> id3093;1.0;1.0;1.0
> id3094;1.0;1.0;1.0
> id3095;1.0;1.0;1.0
3430a3437,3442
> id4090;1.0;1.0;1.0
> id4091;1.0;1.0;1.0
> id4092;1.0;1.0;1.0
> id4093;1.0;1.0;1.0
> id4094;1.0;1.0;1.0
> id4095;1.0;1.0;1.0
7868a7881,7886
> id8090;1.0;1.0;1.0
> id8091;1.0;1.0;1.0
> id8092;1.0;1.0;1.0
> id8093;1.0;1.0;1.0
> id8094;1.0;1.0;1.0
> id8095;1.0;1.0;1.0
8973a8992,8993
> id9090;1.0;1.0;1.0
> id9091;1.0;1.0;1.0
FAILURE: ./test.sh JamalMulla failed
Also, where are you dealing with hash collisions?
Very confusing cause it passes locally. What does that output mean? 8973a8992? And here is where I'm handling collisions |
I also don't understand why the Github build doesn't report test failures? seems like an obvious build step |
It's just that we don't have a GH Actions job set up which would run the tests. The reason being that the required SDK would have to be installed. @filiphr had been looking into it, but it is not completed yet. |
Here's the original PR: #106. Any help for driving it over the finishing line will be more than welcome :) |
Tests look good now, but it takes unduly long for the 10K keyset (see create_measurements_3.sh). Could you take a look, I am running the top 15 entries against that keyset, and as yours is up there, it would be nice if it could handle it. Thx. |
Hmm I can't reproduce this behaviour at all. i ran create_measurements3.sh with 1billion and I get this
Is there something else you did for the test? |
@gunnarmorling any ideas? i would really like to get this evaluated. If you can provide more details i'll try to fix but i can't even reproduce it as is |
Try to create other datasets from the 10k (each time it generates a new one), and compare the output with some valid solution. Maybe deadlock or can't find a key? |
yeah been trying this. no luck so far. also annoying cause it takes a while to generate 1bil each time |
Try 100 and 200 million rows. It's faster to generate |
Try these. Ask permission for these files: https://drive.google.com/file/d/1sVA0yUuo5oZoeAQFoPJgaF0cQnq4fPCr/view?usp=sharing https://drive.google.com/file/d/1hB72HaFC87Pjykv1mu0CIF4kXO5Qjefs/view?usp=sharing |
Getting this error:
I have seen it with another native image entry, but I don't know what the cause is. @thomaswue might have an idea. |
@gunnarmorling I've removed one of the native build options which may have played a part in the error (not sure though). Could you re-test please. No idea what else to try and would like to see the time for the normal 1brc at least. Thanks! |
a5676bb
to
0524f9d
Compare
Do you remember the other code that had the same error? Maybe there's something similar in their solutions. Another thing that could have been tried was to use same native flags of one solution that is working. And what makes the bug more interesting, is that @JamalMulla said it was working in his computer. =) |
So the isolate error seems to be intermittent, can't reproduce it any longer (also without your latest change). So the remainging issue is that it still doesn't produce the correct output for the 10K key set test:
|
@gunnarmorling fixed 🤞! |
Same diff for the 10K key set test as before :( |
@gunnarmorling last attempt before i give up. please recheck. slower than the other attempts but hopefully correct at least |
Yepp, looking good now.
|
Finally 🎉 |
Hey @JamalMulla! Congrats again on being in the Top 20 of the One Billion Row Challenge! To celebrate this amazing achievement, I would like to send you a 1BRC t-shirt and coffee mug. To claim your prize, fill out this form by Feb 18. After submitting the form, please provide a comment with the random value you've specified in the form, so that I know it is you who submitted it. All data entered will solely be used in relation to processing this shipment. Shipments can be sent to any country listed here or here (I'll use whichever one is cheaper for me to ship to your location). A big thank you to Decodable for sponsoring these prizes! Thanks a lot for participating in 1BRC, --Gunnar |
Hey, I've filled in the form. My numbers were 21975 11916. Thanks |
Check List:
./test.sh <username>
shows no differences between expected and actual outputs)calculate_average_<username>.sh
(make sure to match casing of your GH user name) and is executablecalculate_average_baseline.sh
Shaved off around 1.5s from my initial submission. Also trying with graal native binary as it seems to give slightly better results and all the top entries are also using it.
Most of the changes are just reducing function calls, inlining the hashing, etc