Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize distinct inner join to use set find instead of retrieve #17278

Draft
wants to merge 5 commits into
base: branch-24.12
Choose a base branch
from

Conversation

PointKernel
Copy link
Member

Description

This PR introduces a minor optimization for distinct inner joins by using the find results to selectively copy matches to the output. This approach eliminates the need for the costly retrieve operation, which relies on expensive atomic operations.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@PointKernel PointKernel added libcudf Affects libcudf (C++/CUDA) code. Performance Performance related issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Nov 8, 2024
@PointKernel PointKernel self-assigned this Nov 8, 2024
@PointKernel
Copy link
Member Author

Performance comparison with RTX8000

# distinct_inner_join

## [0] Quadro RTX 8000

|  Key  |  Nullable  |  left_size  |  right_size  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |         Diff |   %Diff |  Status  |
|-------|------------|-------------|--------------|------------|-------------|------------|-------------|--------------|---------|----------|
|  I32  |     0      |    1000     |     1000     |  54.068 us |      10.72% |  56.935 us |       8.24% |     2.867 us |   5.30% |   PASS   |
|  I32  |     0      |   100000    |     1000     |  63.800 us |       1.40% |  62.924 us |       2.12% |    -0.875 us |  -1.37% |   PASS   |
|  I32  |     0      |  10000000   |     1000     |   1.264 ms |       1.15% |   1.021 ms |       0.84% |  -242.908 us | -19.22% |   FAIL   |
|  I32  |     0      |   100000    |    100000    |  72.797 us |       1.68% |  72.148 us |       1.23% |    -0.649 us |  -0.89% |   PASS   |
|  I32  |     0      |  10000000   |    100000    |   1.177 ms |       0.94% | 988.159 us |       1.04% |  -188.749 us | -16.04% |   FAIL   |
|  I32  |     0      |  10000000   |   10000000   |  13.695 ms |       0.09% |  12.629 ms |       0.09% | -1066.347 us |  -7.79% |   FAIL   |
|  I32  |     1      |    1000     |     1000     |  57.981 us |       1.93% |  60.755 us |       1.78% |     2.774 us |   4.78% |   FAIL   |
|  I32  |     1      |   100000    |     1000     |  67.446 us |       5.73% |  69.453 us |      19.22% |     2.008 us |   2.98% |   PASS   |
|  I32  |     1      |  10000000   |     1000     | 595.085 us |       0.71% | 466.375 us |       0.37% |  -128.710 us | -21.63% |   FAIL   |
|  I32  |     1      |   100000    |    100000    |  70.074 us |       1.94% |  73.968 us |       7.78% |     3.893 us |   5.56% |   FAIL   |
|  I32  |     1      |  10000000   |    100000    | 658.062 us |       1.84% | 571.187 us |       0.63% |   -86.876 us | -13.20% |   FAIL   |
|  I32  |     1      |  10000000   |   10000000   |   3.172 ms |       0.18% |   3.202 ms |       0.13% |    29.974 us |   0.95% |   FAIL   |
|  I64  |     0      |    1000     |     1000     |  47.797 us |       2.97% |  57.524 us |      21.72% |     9.726 us |  20.35% |   FAIL   |
|  I64  |     0      |   100000    |     1000     |  63.696 us |       1.39% |  66.554 us |      17.85% |     2.858 us |   4.49% |   FAIL   |
|  I64  |     0      |  10000000   |     1000     |   1.340 ms |       0.94% |   1.063 ms |       0.92% |  -277.467 us | -20.70% |   FAIL   |
|  I64  |     0      |   100000    |    100000    |  73.252 us |       1.24% |  73.468 us |       1.71% |     0.216 us |   0.29% |   PASS   |
|  I64  |     0      |  10000000   |    100000    |   1.279 ms |       1.10% |   1.019 ms |       1.40% |  -260.429 us | -20.36% |   FAIL   |
|  I64  |     0      |  10000000   |   10000000   |  13.792 ms |       0.14% |  12.713 ms |       0.19% | -1078.653 us |  -7.82% |   FAIL   |
|  I64  |     1      |    1000     |     1000     |  60.127 us |      16.06% |  64.713 us |      24.12% |     4.587 us |   7.63% |   PASS   |
|  I64  |     1      |   100000    |     1000     |  67.320 us |       1.64% |  66.628 us |       2.15% |    -0.692 us |  -1.03% |   PASS   |
|  I64  |     1      |  10000000   |     1000     | 621.816 us |       0.77% | 496.121 us |       3.41% |  -125.695 us | -20.21% |   FAIL   |
|  I64  |     1      |   100000    |    100000    |  70.121 us |       1.60% |  76.659 us |      13.20% |     6.538 us |   9.32% |   FAIL   |
|  I64  |     1      |  10000000   |    100000    | 675.382 us |       0.65% | 583.454 us |       0.44% |   -91.928 us | -13.61% |   FAIL   |
|  I64  |     1      |  10000000   |   10000000   |   3.234 ms |       0.15% |   3.235 ms |       0.15% |     1.249 us |   0.04% |   PASS   |

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Performance Performance related issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant