perf: compare dict by value #2085
base: develop
Benchmarks: random_access
fn compare_by_value(
    lhs: &DictArray,
    rhs: Scalar,
    operator: Operator,
Can you drop this argument and name the function `compare_eq_by_code`?
// Couldn't find a value match, so the result is all false
let Some(code) = bool.boolean_buffer().set_indices().next() else {
    return Ok(Some(
        ConstantArray::new(false, lhs.codes().len()).into_array(),
I think you need to ensure the correct nullability here.
I am thinking of adding a PrimitiveArray with nullability as a sentinel value, any thoughts?
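One possible reading of the nullability concern, sketched outside of Vortex: if the codes can be null, the no-match branch above should not collapse null rows into `false`. The function name and the use of `Option<u32>` for nullable codes are assumptions made for illustration only; they are not the Vortex API, and the reviewer may instead have meant the nullability of the result dtype.

```rust
/// Hypothetical stand-in for nullable dictionary codes: `None` marks a null row.
/// When the scalar matches no dictionary value, every valid row compares to
/// `false`, but null rows stay null instead of becoming `false`.
fn all_false_preserving_nulls(codes: &[Option<u32>]) -> Vec<Option<bool>> {
    codes
        .iter()
        // `map` on the Option keeps `None` as `None` and turns `Some(_)` into `Some(false)`.
        .map(|code| code.map(|_| false))
        .collect()
}
```

With non-nullable codes this reduces to the constant `false` array already returned in the diff.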
# Conflicts:
#	vortex-sampling-compressor/src/downscale.rs
no change
Timer precision: 41 ns
dict_compare fastest │ slowest │ median │ mean │ samples │ iters
├─ bench_compare_primitive │ │ │ │ │
│ ├─ (1000000, 0.05) 607.2 µs │ 1.392 ms │ 708.8 µs │ 728.4 µs │ 100 │ 100
│ ├─ (1000000, 5e-5) 355 µs │ 585.5 µs │ 372.5 µs │ 402.1 µs │ 100 │ 100
│ ├─ (10000000, 0.05) 5.818 ms │ 7.566 ms │ 6.06 ms │ 6.221 ms │ 100 │ 100
│ ├─ (10000000, 5e-5) 3.307 ms │ 4.672 ms │ 3.414 ms │ 3.485 ms │ 100 │ 100
│ ├─ (100000000, 0.05) 81.83 ms │ 93.47 ms │ 83.88 ms │ 84.24 ms │ 100 │ 100
│ ╰─ (100000000, 5e-5) 59.24 ms │ 71.39 ms │ 61.45 ms │ 61.97 ms │ 100 │ 100
├─ bench_compare_varbin │ │ │ │ │
│ ├─ (1000000, 0.05) 680.7 µs │ 874.6 µs │ 697.7 µs │ 708.3 µs │ 100 │ 100
│ ├─ (1000000, 5e-5) 355.9 µs │ 547.4 µs │ 384.9 µs │ 402.3 µs │ 100 │ 100
│ ├─ (10000000, 0.05) 6.753 ms │ 7.791 ms │ 6.948 ms │ 7.023 ms │ 100 │ 100
│ ├─ (10000000, 5e-5) 3.305 ms │ 4.293 ms │ 3.354 ms │ 3.38 ms │ 100 │ 100
│ ├─ (100000000, 0.05) 90.67 ms │ 103.8 ms │ 92.96 ms │ 93.36 ms │ 100 │ 100
│ ╰─ (100000000, 5e-5) 59.13 ms │ 70.96 ms │ 61.96 ms │ 62.69 ms │ 100 │ 100
╰─ bench_compare_varbinview │ │ │ │ │
  ├─ (1000000, 0.05) 682.8 µs │ 888.1 µs │ 779.4 µs │ 767.5 µs │ 100 │ 100
  ├─ (1000000, 5e-5) 382.1 µs │ 514.6 µs │ 421.4 µs │ 425 µs │ 100 │ 100
  ├─ (10000000, 0.05) 6.777 ms │ 8.491 ms │ 7.172 ms │ 7.294 ms │ 100 │ 100
  ├─ (10000000, 5e-5) 3.308 ms │ 4.666 ms │ 3.363 ms │ 3.42 ms │ 100 │ 100
  ├─ (100000000, 0.05) 90.66 ms │ 103.3 ms │ 92.39 ms │ 93.23 ms │ 100 │ 100
  ╰─ (100000000, 5e-5) 59.1 ms │ 67.62 ms │ 60.43 ms │ 61.03 ms │ 100 │ 100
change
Timer precision: 41 ns
dict_compare fastest │ slowest │ median │ mean │ samples │ iters
├─ bench_compare_primitive │ │ │ │ │
│ ├─ (1000000, 0.05) 115.4 µs │ 619.6 µs │ 121 µs │ 127 µs │ 100 │ 100
│ ├─ (1000000, 5e-5) 108.5 µs │ 201.1 µs │ 111.7 µs │ 116.3 µs │ 100 │ 100
│ ├─ (10000000, 0.05) 1.18 ms │ 1.609 ms │ 1.376 ms │ 1.369 ms │ 100 │ 100
│ ├─ (10000000, 5e-5) 1.109 ms │ 1.439 ms │ 1.161 ms │ 1.186 ms │ 100 │ 100
│ ├─ (100000000, 0.05) 12.13 ms │ 16.94 ms │ 12.5 ms │ 12.86 ms │ 100 │ 100
│ ╰─ (100000000, 5e-5) 11.58 ms │ 15 ms │ 11.91 ms │ 12.13 ms │ 100 │ 100
├─ bench_compare_varbin │ │ │ │ │
│ ├─ (1000000, 0.05) 204.8 µs │ 321.1 µs │ 208.5 µs │ 216.2 µs │ 100 │ 100
│ ├─ (1000000, 5e-5) 109.2 µs │ 142.6 µs │ 110.4 µs │ 112.2 µs │ 100 │ 100
│ ├─ (10000000, 0.05) 2.058 ms │ 2.577 ms │ 2.094 ms │ 2.105 ms │ 100 │ 100
│ ├─ (10000000, 5e-5) 1.107 ms │ 1.351 ms │ 1.13 ms │ 1.138 ms │ 100 │ 100
│ ├─ (100000000, 0.05) 21.12 ms │ 25.08 ms │ 21.52 ms │ 21.85 ms │ 100 │ 100
│ ╰─ (100000000, 5e-5) 11.46 ms │ 15.49 ms │ 11.75 ms │ 12.12 ms │ 100 │ 100
╰─ bench_compare_varbinview │ │ │ │ │
  ├─ (1000000, 0.05) 205.4 µs │ 259.4 µs │ 206.7 µs │ 211.5 µs │ 100 │ 100
  ├─ (1000000, 5e-5) 109.5 µs │ 144.8 µs │ 111.6 µs │ 112.8 µs │ 100 │ 100
  ├─ (10000000, 0.05) 2.062 ms │ 3.051 ms │ 2.155 ms │ 2.235 ms │ 100 │ 100
  ├─ (10000000, 5e-5) 1.095 ms │ 1.404 ms │ 1.158 ms │ 1.167 ms │ 100 │ 100
  ├─ (100000000, 0.05) 20.98 ms │ 23.36 ms │ 21.37 ms │ 21.45 ms │ 100 │ 100
  ╰─ (100000000, 5e-5) 11.41 ms │ 12 ms │ 11.5 ms │ 11.52 ms │ 100 │ 100
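To put the two runs side by side, here is a small sketch that computes speedups from a few of the median values reported above. The helper names are made up for illustration, and only three rows are included; the remaining rows can be added the same way.

```rust
/// Ratio of the baseline median to the changed median.
/// Both arguments must be expressed in the same unit.
fn speedup(baseline: f64, changed: f64) -> f64 {
    baseline / changed
}

/// Median speedups for a few rows, copied from the "no change" and "change" tables.
fn median_speedups() -> Vec<(&'static str, f64)> {
    vec![
        // Medians in µs.
        ("bench_compare_primitive (1000000, 0.05)", speedup(708.8, 121.0)),
        ("bench_compare_varbin (1000000, 0.05)", speedup(697.7, 208.5)),
        // Medians in ms.
        ("bench_compare_varbinview (100000000, 5e-5)", speedup(60.43, 11.5)),
    ]
}
```

On these rows the medians improve by roughly 3x to 6x.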
@@ -45,7 +45,7 @@ Use :func:`~vortex.compress` to compress the Vortex array and check the relative

 >>> cvtx = vx.compress(vtx)
 >>> cvtx.nbytes
-15768
+15793
This is suspect
So it looks like dict encoding is no longer chosen for days, and instead it uses run-end, which honestly is probably better
Benchmarks: TPC-H
Benchmarks: Clickbench
Implement compare(eq) on dict using search on values + compare on codes.
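The idea in the description can be sketched in a few lines of plain Rust. This is a minimal illustration, not the code in this PR: `DictColumn` is a hypothetical, simplified stand-in for a dictionary-encoded array, the function name follows the rename suggested in review, and the sketch assumes the dictionary values are unique and the codes are non-nullable.

```rust
/// Hypothetical, simplified dictionary-encoded column:
/// row `i` has the logical value `values[codes[i] as usize]`.
struct DictColumn<T> {
    values: Vec<T>,
    codes: Vec<u32>,
}

/// Equality comparison against a scalar, done in two steps:
/// search the (usually small) values array once, then compare codes.
fn compare_eq_by_code<T: PartialEq>(col: &DictColumn<T>, needle: &T) -> Vec<bool> {
    // Step 1: find the code whose dictionary value equals the needle.
    // Assumes values are unique, so the first match is the only match.
    let Some(pos) = col.values.iter().position(|v| v == needle) else {
        // No dictionary value matches, so no row can match.
        return vec![false; col.codes.len()];
    };
    let code = pos as u32;

    // Step 2: a row matches exactly when its code equals the matching code.
    // This compares small integers instead of the (possibly large) values.
    col.codes.iter().map(|&c| c == code).collect()
}
```

For example, with values `["a", "b", "c"]` and codes `[0, 2, 1, 2, 0]`, comparing against `"c"` only checks which codes equal 2. The work on values is proportional to the dictionary size rather than the row count, which is consistent with the larger gains seen for the string benchmarks above.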