perf: compare dict by value #2085

Open: wants to merge 31 commits into base: develop

joseph-isaacs
Member

@joseph-isaacs joseph-isaacs commented Jan 27, 2025

Implement compare(eq) on dict using search on values + compare on codes.
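
The idea can be sketched independently of Vortex's real types. The following is a minimal illustration, assuming a hypothetical `Dict` struct with a `values` vector of distinct entries and a `codes` vector of indices into it; these names are made up for the example and are not the Vortex API. Equality against a scalar first searches the (usually small) values for the scalar, then compares every code with the matching index, so the per-row work is an integer comparison.

```rust
/// Hypothetical stand-in for a dictionary-encoded array (not the Vortex `DictArray`).
/// `values` holds the distinct entries; `codes[i]` is the index of row `i`'s value.
struct Dict<T> {
    values: Vec<T>,
    codes: Vec<u32>,
}

/// Evaluate `row == needle` for every row without decoding the rows.
/// Assumes `values` contains no duplicates, as in a dictionary.
fn compare_eq<T: PartialEq>(dict: &Dict<T>, needle: &T) -> Vec<bool> {
    // Step 1: search the values for the scalar.
    let Some(code) = dict.values.iter().position(|v| v == needle) else {
        // No value matches, so no row can match.
        return vec![false; dict.codes.len()];
    };
    // Step 2: compare on codes only.
    let code = code as u32;
    dict.codes.iter().map(|&c| c == code).collect()
}
```

For example, with values `["a", "b", "c"]` and codes `[0, 2, 1, 2]`, comparing against `"c"` marks the second and fourth rows as equal.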

@joseph-isaacs joseph-isaacs marked this pull request as ready for review January 27, 2025 18:06
@joseph-isaacs joseph-isaacs added the wire-break (Includes a break to the serialized IPC or file format) and benchmark (Run benchmarks on this branch) labels and removed the wire-break label Jan 27, 2025
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Jan 27, 2025
Contributor

github-actions bot commented Jan 27, 2025

Benchmarks: random_access

Table of Results
name PR 2fef9d6 base f97dfc5 ratio (PR/base) unit
random-access/vortex-tokio-local-disk 1.42022e+06 2.67526e+06 0.530871 ns
random-access/vortex-local-fs 2.12229e+06 3.43754e+06 0.617386 ns
random-access/parquet-tokio-local-disk 2.2508e+08 2.2749e+08 0.989405 ns

fn compare_by_value(
lhs: &DictArray,
rhs: Scalar,
operator: Operator,
Contributor

Could you drop this argument and rename the function to compare_eq_by_code?

// Couldn't find a value match, so the result is all false
let Some(code) = bool.boolean_buffer().set_indices().next() else {
return Ok(Some(
ConstantArray::new(false, lhs.codes().len()).into_array(),
Contributor

I think you need to ensure the correct nullability here.
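
One possible reading of this comment: when the codes are nullable, the all-false shortcut should still produce a null for every row whose code is null, and the result should carry a nullable type. The sketch below illustrates that reading only. It uses `Option<u32>` codes and `Option<bool>` results as hypothetical stand-ins for Vortex validity, and `compare_codes_nullable` is a made-up name, not the change that was made in this PR.

```rust
/// Equality over nullable codes: a null code produces a null result,
/// even when the scalar matches no dictionary value (`matching_code` is `None`).
fn compare_codes_nullable(
    codes: &[Option<u32>],
    matching_code: Option<u32>,
) -> Vec<Option<bool>> {
    codes
        .iter()
        .map(|code| match (code, matching_code) {
            // Null row: the comparison result is unknown.
            (None, _) => None,
            // Valid row, but the scalar is absent from the values: false.
            (Some(_), None) => Some(false),
            // Valid row: compare the code with the matching code.
            (Some(c), Some(m)) => Some(*c == m),
        })
        .collect()
}
```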

encodings/dict/src/compute/compare.rs (outdated, resolved)
@joseph-isaacs
Member Author

I am thinking of adding a PrimitiveArray with nullability as a sentinel value, any thoughts?

@joseph-isaacs joseph-isaacs requested a review from gatesn January 28, 2025 11:44
@joseph-isaacs joseph-isaacs added the benchmark Run benchmarks on this branch label Jan 28, 2025
# Conflicts:
#	vortex-sampling-compressor/src/downscale.rs
@joseph-isaacs joseph-isaacs added benchmark Run benchmarks on this branch and removed benchmark Run benchmarks on this branch labels Jan 28, 2025
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Jan 28, 2025
@joseph-isaacs joseph-isaacs changed the title from "Compare dict by value" to "perf: compare dict by value" Jan 28, 2025
@joseph-isaacs joseph-isaacs added the benchmark Run benchmarks on this branch label Jan 28, 2025
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Jan 28, 2025
@joseph-isaacs joseph-isaacs added the benchmark Run benchmarks on this branch label Jan 28, 2025
@joseph-isaacs joseph-isaacs added the benchmark Run benchmarks on this branch label Feb 5, 2025
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Feb 5, 2025
@joseph-isaacs joseph-isaacs added the benchmark Run benchmarks on this branch label Feb 5, 2025
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Feb 5, 2025
encodings/dict/src/array.rs (outdated, resolved)
vortex-array/src/array/bool/compute/cast.rs (outdated, resolved)
vortex-array/src/array/bool/compute/cast.rs (outdated, resolved)
vortex-array/src/array/bool/compute/cast.rs (resolved)
@joseph-isaacs joseph-isaacs requested a review from gatesn February 5, 2025 15:59
@joseph-isaacs joseph-isaacs enabled auto-merge (squash) February 5, 2025 15:59
encodings/dict/src/array.rs (outdated, resolved)
@spiraldb spiraldb deleted a comment from github-actions bot Feb 5, 2025
@spiraldb spiraldb deleted a comment from github-actions bot Feb 5, 2025
@spiraldb spiraldb deleted a comment from github-actions bot Feb 5, 2025
@joseph-isaacs joseph-isaacs requested a review from gatesn February 5, 2025 16:41
@joseph-isaacs
Member Author

Without this change:

Timer precision: 41 ns
dict_compare                 fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ bench_compare_primitive                 │               │               │               │         │
│  ├─ (1000000, 0.05)        607.2 µs      │ 1.392 ms      │ 708.8 µs      │ 728.4 µs      │ 100     │ 100
│  ├─ (1000000, 5e-5)        355 µs        │ 585.5 µs      │ 372.5 µs      │ 402.1 µs      │ 100     │ 100
│  ├─ (10000000, 0.05)       5.818 ms      │ 7.566 ms      │ 6.06 ms       │ 6.221 ms      │ 100     │ 100
│  ├─ (10000000, 5e-5)       3.307 ms      │ 4.672 ms      │ 3.414 ms      │ 3.485 ms      │ 100     │ 100
│  ├─ (100000000, 0.05)      81.83 ms      │ 93.47 ms      │ 83.88 ms      │ 84.24 ms      │ 100     │ 100
│  ╰─ (100000000, 5e-5)      59.24 ms      │ 71.39 ms      │ 61.45 ms      │ 61.97 ms      │ 100     │ 100
├─ bench_compare_varbin                    │               │               │               │         │
│  ├─ (1000000, 0.05)        680.7 µs      │ 874.6 µs      │ 697.7 µs      │ 708.3 µs      │ 100     │ 100
│  ├─ (1000000, 5e-5)        355.9 µs      │ 547.4 µs      │ 384.9 µs      │ 402.3 µs      │ 100     │ 100
│  ├─ (10000000, 0.05)       6.753 ms      │ 7.791 ms      │ 6.948 ms      │ 7.023 ms      │ 100     │ 100
│  ├─ (10000000, 5e-5)       3.305 ms      │ 4.293 ms      │ 3.354 ms      │ 3.38 ms       │ 100     │ 100
│  ├─ (100000000, 0.05)      90.67 ms      │ 103.8 ms      │ 92.96 ms      │ 93.36 ms      │ 100     │ 100
│  ╰─ (100000000, 5e-5)      59.13 ms      │ 70.96 ms      │ 61.96 ms      │ 62.69 ms      │ 100     │ 100
╰─ bench_compare_varbinview                │               │               │               │         │
   ├─ (1000000, 0.05)        682.8 µs      │ 888.1 µs      │ 779.4 µs      │ 767.5 µs      │ 100     │ 100
   ├─ (1000000, 5e-5)        382.1 µs      │ 514.6 µs      │ 421.4 µs      │ 425 µs        │ 100     │ 100
   ├─ (10000000, 0.05)       6.777 ms      │ 8.491 ms      │ 7.172 ms      │ 7.294 ms      │ 100     │ 100
   ├─ (10000000, 5e-5)       3.308 ms      │ 4.666 ms      │ 3.363 ms      │ 3.42 ms       │ 100     │ 100
   ├─ (100000000, 0.05)      90.66 ms      │ 103.3 ms      │ 92.39 ms      │ 93.23 ms      │ 100     │ 100
   ╰─ (100000000, 5e-5)      59.1 ms       │ 67.62 ms      │ 60.43 ms      │ 61.03 ms      │ 100     │ 100

@joseph-isaacs
Member Author

joseph-isaacs commented Feb 5, 2025

With this change:

Timer precision: 41 ns
dict_compare                 fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ bench_compare_primitive                 │               │               │               │         │
│  ├─ (1000000, 0.05)        115.4 µs      │ 619.6 µs      │ 121 µs        │ 127 µs        │ 100     │ 100
│  ├─ (1000000, 5e-5)        108.5 µs      │ 201.1 µs      │ 111.7 µs      │ 116.3 µs      │ 100     │ 100
│  ├─ (10000000, 0.05)       1.18 ms       │ 1.609 ms      │ 1.376 ms      │ 1.369 ms      │ 100     │ 100
│  ├─ (10000000, 5e-5)       1.109 ms      │ 1.439 ms      │ 1.161 ms      │ 1.186 ms      │ 100     │ 100
│  ├─ (100000000, 0.05)      12.13 ms      │ 16.94 ms      │ 12.5 ms       │ 12.86 ms      │ 100     │ 100
│  ╰─ (100000000, 5e-5)      11.58 ms      │ 15 ms         │ 11.91 ms      │ 12.13 ms      │ 100     │ 100
├─ bench_compare_varbin                    │               │               │               │         │
│  ├─ (1000000, 0.05)        204.8 µs      │ 321.1 µs      │ 208.5 µs      │ 216.2 µs      │ 100     │ 100
│  ├─ (1000000, 5e-5)        109.2 µs      │ 142.6 µs      │ 110.4 µs      │ 112.2 µs      │ 100     │ 100
│  ├─ (10000000, 0.05)       2.058 ms      │ 2.577 ms      │ 2.094 ms      │ 2.105 ms      │ 100     │ 100
│  ├─ (10000000, 5e-5)       1.107 ms      │ 1.351 ms      │ 1.13 ms       │ 1.138 ms      │ 100     │ 100
│  ├─ (100000000, 0.05)      21.12 ms      │ 25.08 ms      │ 21.52 ms      │ 21.85 ms      │ 100     │ 100
│  ╰─ (100000000, 5e-5)      11.46 ms      │ 15.49 ms      │ 11.75 ms      │ 12.12 ms      │ 100     │ 100
╰─ bench_compare_varbinview                │               │               │               │         │
   ├─ (1000000, 0.05)        205.4 µs      │ 259.4 µs      │ 206.7 µs      │ 211.5 µs      │ 100     │ 100
   ├─ (1000000, 5e-5)        109.5 µs      │ 144.8 µs      │ 111.6 µs      │ 112.8 µs      │ 100     │ 100
   ├─ (10000000, 0.05)       2.062 ms      │ 3.051 ms      │ 2.155 ms      │ 2.235 ms      │ 100     │ 100
   ├─ (10000000, 5e-5)       1.095 ms      │ 1.404 ms      │ 1.158 ms      │ 1.167 ms      │ 100     │ 100
   ├─ (100000000, 0.05)      20.98 ms      │ 23.36 ms      │ 21.37 ms      │ 21.45 ms      │ 100     │ 100
   ╰─ (100000000, 5e-5)      11.41 ms      │ 12 ms         │ 11.5 ms       │ 11.52 ms      │ 100     │ 100

@joseph-isaacs joseph-isaacs added the benchmark Run benchmarks on this branch label Feb 5, 2025
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Feb 5, 2025
@@ -45,7 +45,7 @@ Use :func:`~vortex.compress` to compress the Vortex array and check the relative

 >>> cvtx = vx.compress(vtx)
 >>> cvtx.nbytes
-15768
+15793
Contributor

This is suspect

Contributor

So it looks like dict encoding is no longer chosen for days, and instead it uses run-end, which honestly is probably better

Contributor

github-actions bot commented Feb 5, 2025

Benchmarks: TPC-H

Table of Results
name PR 2fef9d6 base f97dfc5 ratio (PR/base) unit
tpch_q01/arrow 553969331 5.80851e+08 0.95372 ns
tpch_q01/parquet 769032734 7.76791e+08 0.990012 ns
tpch_q01/vortex-file-compressed 531497017 5.56518e+08 0.955041 ns
tpch_q02/arrow 140615199 1.5061e+08 0.933638 ns
tpch_q02/parquet 176115419 1.85233e+08 0.95078 ns
tpch_q02/vortex-file-compressed 147786843 1.54315e+08 0.957698 ns
tpch_q03/arrow 177010245 1.78783e+08 0.990082 ns
tpch_q03/parquet 375809156 3.70867e+08 1.01332 ns
tpch_q03/vortex-file-compressed 247911221 2.49047e+08 0.995438 ns
tpch_q04/arrow 178607853 1.7979e+08 0.993423 ns
tpch_q04/parquet 216644562 2.22349e+08 0.974347 ns
tpch_q04/vortex-file-compressed 197398140 2.0618e+08 0.957409 ns
tpch_q05/arrow 328222959 3.35543e+08 0.978183 ns
tpch_q05/parquet 502904996 5.32179e+08 0.944993 ns
tpch_q05/vortex-file-compressed 388037096 3.94651e+08 0.983242 ns
tpch_q06/arrow 28249788 2.73173e+07 1.03414 ns
tpch_q06/parquet 148743637 1.51535e+08 0.98158 ns
tpch_q06/vortex-file-compressed 82746634 8.71497e+07 0.949477 ns
tpch_q07/arrow 628241830 6.45129e+08 0.973823 ns
tpch_q07/parquet 768318999 7.68489e+08 0.999779 ns
tpch_q07/vortex-file-compressed 648449934 6.63273e+08 0.977651 ns
tpch_q08/arrow 270574609 2.68524e+08 1.00764 ns
tpch_q08/parquet 540898552 5.61331e+08 0.9636 ns
tpch_q08/vortex-file-compressed 362758479 3.84466e+08 0.943537 ns
tpch_q09/arrow 485743682 5.18156e+08 0.937447 ns
tpch_q09/parquet 781591968 8.06889e+08 0.968649 ns
tpch_q09/vortex-file-compressed 597139872 5.88137e+08 1.01531 ns
tpch_q10/arrow 265772807 2.68644e+08 0.989312 ns
tpch_q10/parquet 506664350 5.11138e+08 0.991248 ns
tpch_q10/vortex-file-compressed 313719261 2.96583e+08 1.05778 ns
tpch_q11/arrow 138915098 1.41699e+08 0.980351 ns
tpch_q11/parquet 147264840 1.59506e+08 0.923256 ns
tpch_q11/vortex-file-compressed 131642179 1.35031e+08 0.974903 ns
tpch_q12/arrow 184153203 1.84637e+08 0.997382 ns
tpch_q12/parquet 322119231 3.29267e+08 0.978291 ns
tpch_q12/vortex-file-compressed 303217569 2.74842e+08 1.10324 ns
tpch_q13/arrow 171779467 1.79395e+08 0.957548 ns
tpch_q13/parquet 310957241 3.20784e+08 0.969366 ns
tpch_q13/vortex-file-compressed 183462395 1.89867e+08 0.966268 ns
tpch_q14/arrow 37351822 4.08559e+07 0.914232 ns
tpch_q14/parquet 230564253 2.29078e+08 1.00649 ns
tpch_q14/vortex-file-compressed 94293153 9.54114e+07 0.98828 ns
tpch_q15/arrow 70225769 7.21224e+07 0.973703 ns
tpch_q15/parquet 319276333 3.27493e+08 0.97491 ns
tpch_q15/vortex-file-compressed 168928325 1.71126e+08 0.987158 ns
tpch_q16/arrow 99090933 1.02034e+08 0.971157 ns
tpch_q16/parquet 114347342 1.1792e+08 0.969702 ns
tpch_q16/vortex-file-compressed 105185240 1.08307e+08 0.971178 ns
tpch_q17/arrow 629545505 6.58428e+08 0.956134 ns
tpch_q17/parquet 670898647 6.80567e+08 0.985793 ns
tpch_q17/vortex-file-compressed 604335590 6.22228e+08 0.971245 ns
tpch_q18/arrow 1125237509 1.11091e+09 1.0129 ns
tpch_q18/parquet 1347063961 1.41186e+09 0.954107 ns
tpch_q18/vortex-file-compressed 1167706493 1.2102e+09 0.964885 ns
tpch_q19/arrow 151277021 1.53242e+08 0.987179 ns
tpch_q19/parquet 413235714 4.17409e+08 0.990001 ns
tpch_q19/vortex-file-compressed 169382365 1.44965e+08 1.16843 ns
tpch_q20/arrow 174735321 1.76499e+08 0.990009 ns
tpch_q20/parquet 314287798 3.18654e+08 0.986298 ns
tpch_q20/vortex-file-compressed 228144802 2.39939e+08 0.950846 ns
tpch_q21/arrow 989472622 1.01758e+09 0.972382 ns
tpch_q21/parquet 1124319025 1.18451e+09 0.949185 ns
tpch_q21/vortex-file-compressed 1026630386 1.01832e+09 1.00816 ns
tpch_q22/arrow 79385767 7.77639e+07 1.02086 ns
tpch_q22/parquet 108555681 1.08201e+08 1.00328 ns
tpch_q22/vortex-file-compressed 83707559 8.45822e+07 0.98966 ns

Contributor

github-actions bot commented Feb 5, 2025

Benchmarks: Clickbench

Table of Results
name PR 2fef9d6 base f97dfc5 ratio (PR/base) unit
clickbench_q00/parquet 1851550 2.04049e+06 0.907405 ns
clickbench_q01/parquet 58009728 6.27891e+07 0.923882 ns
clickbench_q02/parquet 115798327 1.19929e+08 0.965558 ns
clickbench_q03/parquet 83072762 9.01462e+07 0.921534 ns
clickbench_q04/parquet 625842213 7.39092e+08 0.846772 ns
clickbench_q05/parquet 811137192 8.91283e+08 0.910078 ns
clickbench_q06/parquet 1948236 2.14572e+06 0.907962 ns
clickbench_q07/parquet 61104574 6.5555e+07 0.932112 ns
clickbench_q08/parquet 721582754 8.14486e+08 0.885937 ns
clickbench_q09/parquet 1022836086 1.16218e+09 0.880102 ns
clickbench_q10/parquet 248877115 2.72301e+08 0.913979 ns
clickbench_q11/parquet 294071758 3.22221e+08 0.91264 ns
clickbench_q12/parquet 827967554 9.09498e+08 0.910356 ns
clickbench_q13/parquet 1063900727 1.13813e+09 0.934779 ns
clickbench_q14/parquet 816103243 9.01811e+08 0.90496 ns
clickbench_q15/parquet 748522019 8.49497e+08 0.881136 ns
clickbench_q16/parquet 1601721827 1.77137e+09 0.904228 ns
clickbench_q17/parquet 1419175980 1.5674e+09 0.905433 ns
clickbench_q18/parquet 2936275757 3.27993e+09 0.895226 ns
clickbench_q19/parquet 64158355 7.00666e+07 0.915677 ns
clickbench_q20/parquet 1181100533 1.2643e+09 0.934191 ns
clickbench_q21/parquet 1402690663 1.50206e+09 0.933847 ns
clickbench_q22/parquet 2423499212 2.47914e+09 0.977557 ns
clickbench_q23/parquet 8191505061 8.81895e+09 0.928853 ns
clickbench_q24/parquet 518288414 5.57257e+08 0.930071 ns
clickbench_q25/parquet 501153130 5.38243e+08 0.931092 ns
clickbench_q26/parquet 585332419 6.1787e+08 0.947339 ns
clickbench_q27/parquet 1576553659 1.73686e+09 0.907706 ns
clickbench_q28/parquet 11677208441 1.14568e+10 1.01924 ns
clickbench_q29/parquet 427995394 4.48961e+08 0.953302 ns
clickbench_q30/parquet 763618643 8.19379e+08 0.931948 ns
clickbench_q31/parquet 799381270 8.61044e+08 0.928386 ns
clickbench_q32/parquet 2615914848 2.97533e+09 0.879201 ns
clickbench_q33/parquet 2756697207 3.13756e+09 0.878612 ns
clickbench_q34/parquet 2742320820 3.04469e+09 0.900689 ns
clickbench_q35/parquet 824825723 9.42326e+08 0.875308 ns
clickbench_q36/parquet 168339337 1.96398e+08 0.857136 ns
clickbench_q37/parquet 84758516 9.44646e+07 0.897252 ns
clickbench_q38/parquet 113454203 1.21158e+08 0.936412 ns
clickbench_q39/parquet 317218059 3.57024e+08 0.888507 ns
clickbench_q40/parquet 49229642 5.60252e+07 0.878705 ns
clickbench_q41/parquet 49144868 5.15085e+07 0.954113 ns
clickbench_q42/parquet 65818781 7.06452e+07 0.931681 ns
clickbench_q00/vortex-file-compressed 1959604 2.26655e+06 0.864575 ns
clickbench_q01/vortex-file-compressed 36221408 4.26137e+07 0.849994 ns
clickbench_q02/vortex-file-compressed 107240028 1.17691e+08 0.911203 ns
clickbench_q03/vortex-file-compressed 88209490 9.70525e+07 0.908884 ns
clickbench_q04/vortex-file-compressed 606981177 7.35574e+08 0.82518 ns
clickbench_q05/vortex-file-compressed 653525960 7.13059e+08 0.91651 ns
clickbench_q06/vortex-file-compressed 2054938 2.46207e+06 0.834637 ns
clickbench_q07/vortex-file-compressed 52558594 5.66582e+07 0.927643 ns
clickbench_q08/vortex-file-compressed 765324204 8.79608e+08 0.870074 ns
clickbench_q09/vortex-file-compressed 949591163 1.04239e+09 0.910973 ns
clickbench_q10/vortex-file-compressed 228497274 2.44147e+08 0.935901 ns
clickbench_q11/vortex-file-compressed 252491998 2.80802e+08 0.89918 ns
clickbench_q12/vortex-file-compressed 529201470 5.9191e+08 0.894058 ns
clickbench_q13/vortex-file-compressed 795604713 8.8821e+08 0.895739 ns
clickbench_q14/vortex-file-compressed 525343787 5.77639e+08 0.909468 ns
clickbench_q15/vortex-file-compressed 751190163 8.7258e+08 0.860884 ns
clickbench_q16/vortex-file-compressed 1413821779 1.61249e+09 0.876795 ns
clickbench_q17/vortex-file-compressed 1328203040 1.49837e+09 0.886435 ns
clickbench_q18/vortex-file-compressed 2923632097 3.10642e+09 0.941158 ns
clickbench_q19/vortex-file-compressed 50109747 5.89826e+07 0.849569 ns
clickbench_q20/vortex-file-compressed 517347988 5.29249e+08 0.977513 ns
clickbench_q21/vortex-file-compressed 724327612 7.50393e+08 0.965265 ns
clickbench_q22/vortex-file-compressed 1789124882 1.85764e+09 0.963119 ns
clickbench_q23/vortex-file-compressed 3729954092 4.37659e+09 0.85225 ns
clickbench_q24/vortex-file-compressed 281912521 2.99023e+08 0.942778 ns
clickbench_q25/vortex-file-compressed 254245805 2.61213e+08 0.973326 ns
clickbench_q26/vortex-file-compressed 354148583 3.83974e+08 0.922325 ns
clickbench_q27/vortex-file-compressed 1302468880 1.33732e+09 0.973937 ns
clickbench_q28/vortex-file-compressed 10911187852 1.06926e+10 1.02044 ns
clickbench_q29/vortex-file-compressed 713210426 7.36697e+08 0.968119 ns
clickbench_q30/vortex-file-compressed 487778870 5.42029e+08 0.899912 ns
clickbench_q31/vortex-file-compressed 538825278 5.89652e+08 0.913802 ns
clickbench_q32/vortex-file-compressed 2740906343 2.99013e+09 0.91665 ns
clickbench_q33/vortex-file-compressed 2243721752 2.4735e+09 0.907105 ns
clickbench_q34/vortex-file-compressed 2256619763 2.45281e+09 0.920016 ns
clickbench_q35/vortex-file-compressed 936692688 1.06877e+09 0.876419 ns
clickbench_q36/vortex-file-compressed 76506377 6.78871e+07 1.12696 ns
clickbench_q37/vortex-file-compressed 65663827 5.97792e+07 1.09844 ns
clickbench_q38/vortex-file-compressed 61851682 5.6531e+07 1.09412 ns
clickbench_q39/vortex-file-compressed 128110903 9.43361e+07 1.35803 ns
clickbench_q40/vortex-file-compressed 41403029 4.2434e+07 0.975704 ns
clickbench_q41/vortex-file-compressed 41144134 4.39132e+07 0.936941 ns
clickbench_q42/vortex-file-compressed 52480454 5.09804e+07 1.02942 ns

Labels
wire-break Includes a break to the serialized IPC or file format
3 participants