[SPARK-50216][SQL][TESTS] Update CollationBenchmark to invoke `collationNameToId` outside of cases

### What changes were proposed in this pull request?
This PR addresses the UTF8_BINARY performance regression that was first identified in apache#48721. The regression first appeared with apache#48222, but that PR is not the actual source of the performance degradation.

### Why are the changes needed?
apache#48222 triggered the regression because it changed the `collationNameToId` function, making it slightly slower by removing a short-circuit for fetching the UTF8_BINARY collation. However, this function should be called a fixed number of times per query, and at most once from the benchmark framework. That was not the case, and the repeated calls were the largest contributor to the performance regression.

This PR changes the benchmark so that it calls this function once per test case instead of at each expression.
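
The idea of hoisting the lookup out of the measured loop can be illustrated with a minimal sketch. This is not the Spark code: the lookup table, the IDs, and the helper names (`collation_name_to_id`, `equals_by_id`, `case_lookup_per_row`, `case_lookup_hoisted`) are hypothetical stand-ins, and the comparison logic is a toy.

```python
# Hypothetical stand-in for a name-to-ID lookup such as `collationNameToId`.
# The table and the IDs are illustrative only, not Spark's real values.
_COLLATION_IDS = {"UTF8_BINARY": 0, "UTF8_LCASE": 1, "UNICODE": 2, "UNICODE_CI": 3}

lookup_calls = 0  # counts how often the lookup runs


def collation_name_to_id(name: str) -> int:
    global lookup_calls
    lookup_calls += 1
    return _COLLATION_IDS[name.upper()]


def equals_by_id(collation_id: int, a: str, b: str) -> bool:
    # Toy comparison: ID 0 compares exactly, every other ID ignores case.
    return a == b if collation_id == 0 else a.lower() == b.lower()


def case_lookup_per_row(name, rows):
    # Before: the lookup runs inside the measured loop, once per expression.
    return [equals_by_id(collation_name_to_id(name), a, b) for a, b in rows]


def case_lookup_hoisted(name, rows):
    # After: resolve the ID once per test case, then measure only the comparison.
    collation_id = collation_name_to_id(name)
    return [equals_by_id(collation_id, a, b) for a, b in rows]


rows = [("abc", "ABC"), ("abc", "abc"), ("abc", "abd")]

lookup_calls = 0
per_row = case_lookup_per_row("UTF8_LCASE", rows)
calls_per_row = lookup_calls

lookup_calls = 0
hoisted = case_lookup_hoisted("UTF8_LCASE", rows)
calls_hoisted = lookup_calls

print(calls_per_row, calls_hoisted)
```

Both variants return the same results, but the first performs one lookup per row while the second performs one per case, so in the second variant the timed loop measures only the comparison itself.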

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests and benchmarks.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#48804 from stevomitric/stevomitric/fix-utf8_binary-regression.

Authored-by: Stevo Mitric <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
stevomitric authored and MaxGekk committed Nov 14, 2024
1 parent aea9e87 commit c1968a1
Showing 5 changed files with 102 additions and 102 deletions.
48 changes: 24 additions & 24 deletions sql/core/benchmarks/CollationBenchmark-jdk21-results.txt
@@ -2,53 +2,53 @@ OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
AMD EPYC 7763 64-Core Processor
collation unit benchmarks - equalsFunction: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative time
--------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY 1349 1349 0 0.1 13485.4 1.0X
-UTF8_LCASE 3559 3561 3 0.0 35594.3 2.6X
-UNICODE 17580 17589 12 0.0 175803.6 13.0X
-UNICODE_CI 17210 17212 2 0.0 172100.2 12.8X
+UTF8_BINARY 1353 1357 5 0.1 13532.2 1.0X
+UTF8_LCASE 2601 2602 2 0.0 26008.0 1.9X
+UNICODE 16745 16756 16 0.0 167450.9 12.4X
+UNICODE_CI 16590 16627 52 0.0 165904.8 12.3X

OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
AMD EPYC 7763 64-Core Processor
collation unit benchmarks - compareFunction: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative time
---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY 1740 1741 1 0.1 17398.8 1.0X
-UTF8_LCASE 2630 2632 3 0.0 26301.0 1.5X
-UNICODE 16732 16743 16 0.0 167319.7 9.6X
-UNICODE_CI 16482 16492 14 0.0 164819.7 9.5X
+UTF8_BINARY 1746 1746 0 0.1 17462.6 1.0X
+UTF8_LCASE 2629 2630 1 0.0 26294.8 1.5X
+UNICODE 16744 16744 0 0.0 167438.6 9.6X
+UNICODE_CI 16518 16521 4 0.0 165180.2 9.5X

OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
AMD EPYC 7763 64-Core Processor
collation unit benchmarks - hashFunction: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative time
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY 2808 2808 0 0.0 28082.3 1.0X
-UTF8_LCASE 5412 5413 1 0.0 54123.5 1.9X
-UNICODE 70755 70787 44 0.0 707553.4 25.2X
-UNICODE_CI 57639 57669 43 0.0 576390.0 20.5X
+UTF8_BINARY 2808 2808 1 0.0 28076.2 1.0X
+UTF8_LCASE 5409 5410 0 0.0 54093.0 1.9X
+UNICODE 67930 67957 38 0.0 679296.7 24.2X
+UNICODE_CI 56004 56005 1 0.0 560044.2 19.9X

OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
AMD EPYC 7763 64-Core Processor
collation unit benchmarks - contains: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative time
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY 9356 9357 0 0.0 93564.9 1.0X
-UTF8_LCASE 24106 24129 33 0.0 241055.3 2.6X
-UNICODE 368428 369053 883 0.0 3684284.1 39.4X
-UNICODE_CI 417361 418242 1246 0.0 4173613.9 44.6X
+UTF8_BINARY 1612 1614 2 0.1 16118.8 1.0X
+UTF8_LCASE 14509 14526 23 0.0 145092.7 9.0X
+UNICODE 308136 308631 700 0.0 3081364.6 191.2X
+UNICODE_CI 314612 314846 330 0.0 3146120.0 195.2X

OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
AMD EPYC 7763 64-Core Processor
collation unit benchmarks - startsWith: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative time
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY 10941 10943 2 0.0 109411.5 1.0X
-UTF8_LCASE 20041 20058 24 0.0 200410.1 1.8X
-UNICODE 364296 365610 1859 0.0 3642958.8 33.3X
-UNICODE_CI 424306 424888 823 0.0 4243062.7 38.8X
+UTF8_BINARY 1913 1914 1 0.1 19131.3 1.0X
+UTF8_LCASE 9785 9788 5 0.0 97847.7 5.1X
+UNICODE 311517 311580 89 0.0 3115167.2 162.8X
+UNICODE_CI 316517 316660 201 0.0 3165173.7 165.4X

OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
AMD EPYC 7763 64-Core Processor
collation unit benchmarks - endsWith: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative time
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY 10551 10556 7 0.0 105511.7 1.0X
-UTF8_LCASE 20294 20300 9 0.0 202943.7 1.9X
-UNICODE 384070 384554 684 0.0 3840704.6 36.4X
-UNICODE_CI 441935 442184 352 0.0 4419351.4 41.9X
+UTF8_BINARY 1891 1891 0 0.1 18912.1 1.0X
+UTF8_LCASE 10089 10093 5 0.0 100893.6 5.3X
+UNICODE 336905 336931 36 0.0 3369051.8 178.1X
+UNICODE_CI 339944 340585 907 0.0 3399439.0 179.7X

48 changes: 24 additions & 24 deletions sql/core/benchmarks/CollationBenchmark-results.txt
@@ -2,53 +2,53 @@ OpenJDK 64-Bit Server VM 17.0.13+11-LTS on Linux 6.5.0-1025-azure
AMD EPYC 7763 64-Core Processor
collation unit benchmarks - equalsFunction: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative time
--------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY 1372 1372 1 0.1 13715.2 1.0X
-UTF8_LCASE 3847 3851 6 0.0 38467.3 2.8X
-UNICODE 19659 19662 4 0.0 196587.1 14.3X
-UNICODE_CI 19663 19666 3 0.0 196634.5 14.3X
+UTF8_BINARY 1373 1373 0 0.1 13730.8 1.0X
+UTF8_LCASE 3311 3311 0 0.0 33106.6 2.4X
+UNICODE 19067 19100 46 0.0 190672.9 13.9X
+UNICODE_CI 18704 18795 129 0.0 187040.2 13.6X

OpenJDK 64-Bit Server VM 17.0.13+11-LTS on Linux 6.5.0-1025-azure
AMD EPYC 7763 64-Core Processor
collation unit benchmarks - compareFunction: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative time
---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY 1706 1707 3 0.1 17056.0 1.0X
-UTF8_LCASE 4016 4016 0 0.0 40164.0 2.4X
-UNICODE 19545 19547 3 0.0 195453.4 11.5X
-UNICODE_CI 19544 19547 5 0.0 195437.5 11.5X
+UTF8_BINARY 1706 1708 3 0.1 17060.4 1.0X
+UTF8_LCASE 3958 3965 10 0.0 39575.4 2.3X
+UNICODE 18831 18865 48 0.0 188311.2 11.0X
+UNICODE_CI 18818 18825 9 0.0 188181.7 11.0X

OpenJDK 64-Bit Server VM 17.0.13+11-LTS on Linux 6.5.0-1025-azure
AMD EPYC 7763 64-Core Processor
collation unit benchmarks - hashFunction: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative time
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY 3091 3092 1 0.0 30909.8 1.0X
-UTF8_LCASE 6286 6287 2 0.0 62856.0 2.0X
-UNICODE 65495 65528 47 0.0 654945.7 21.2X
-UNICODE_CI 59987 59994 10 0.0 599868.6 19.4X
+UTF8_BINARY 3092 3093 1 0.0 30918.5 1.0X
+UTF8_LCASE 6273 6289 23 0.0 62734.3 2.0X
+UNICODE 66953 66962 13 0.0 669525.2 21.7X
+UNICODE_CI 53934 53946 17 0.0 539338.7 17.4X

OpenJDK 64-Bit Server VM 17.0.13+11-LTS on Linux 6.5.0-1025-azure
AMD EPYC 7763 64-Core Processor
collation unit benchmarks - contains: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative time
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY 13707 13726 27 0.0 137069.4 1.0X
-UTF8_LCASE 28660 28685 36 0.0 286598.9 2.1X
-UNICODE 363134 364168 1462 0.0 3631341.3 26.5X
-UNICODE_CI 412158 412229 100 0.0 4121577.8 30.1X
+UTF8_BINARY 1643 1644 1 0.1 16431.2 1.0X
+UTF8_LCASE 17241 17273 45 0.0 172411.1 10.5X
+UNICODE 304878 307207 3294 0.0 3048780.8 185.5X
+UNICODE_CI 317341 320620 4637 0.0 3173412.3 193.1X

OpenJDK 64-Bit Server VM 17.0.13+11-LTS on Linux 6.5.0-1025-azure
AMD EPYC 7763 64-Core Processor
collation unit benchmarks - startsWith: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative time
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY 12200 12205 8 0.0 121998.8 1.0X
-UTF8_LCASE 27626 27633 9 0.0 276263.6 2.3X
-UNICODE 350755 351083 464 0.0 3507553.8 28.8X
-UNICODE_CI 409383 410380 1410 0.0 4093834.8 33.6X
+UTF8_BINARY 1973 1977 6 0.1 19726.2 1.0X
+UTF8_LCASE 17070 17119 70 0.0 170697.7 8.7X
+UNICODE 306091 306797 999 0.0 3060911.4 155.2X
+UNICODE_CI 306558 307812 1774 0.0 3065581.4 155.4X

OpenJDK 64-Bit Server VM 17.0.13+11-LTS on Linux 6.5.0-1025-azure
AMD EPYC 7763 64-Core Processor
collation unit benchmarks - endsWith: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative time
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY 11879 11887 12 0.0 118786.3 1.0X
-UTF8_LCASE 27743 27759 22 0.0 277434.4 2.3X
-UNICODE 368435 368478 61 0.0 3684351.2 31.0X
-UNICODE_CI 426350 426503 216 0.0 4263497.6 35.9X
+UTF8_BINARY 2064 2064 0 0.0 20640.6 1.0X
+UTF8_LCASE 16883 16899 23 0.0 168829.3 8.2X
+UNICODE 309882 310702 1160 0.0 3098819.7 150.1X
+UNICODE_CI 313599 314798 1695 0.0 3135994.6 151.9X
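
When reading these tables, note that the larger `Relative time` factors in the new results do not mean the non-binary collations got slower. The column appears to be each row's `Per Row(ns)` divided by the UTF8_BINARY row's, so a faster baseline inflates every other factor. The sketch below checks this against the JDK 17 `contains` block, assuming the first UTF8_BINARY and UNICODE rows in that block are the old results and the second ones are the new results; the helper name `relative_time` is made up for illustration.

```python
# Per Row(ns) values copied from the JDK 17 `contains` block above.
# Assumption: the first rows of the block are the old run, the later rows the new run.
before = {"UTF8_BINARY": 137069.4, "UNICODE": 3631341.3}
after = {"UTF8_BINARY": 16431.2, "UNICODE": 3048780.8}


def relative_time(per_row, baseline="UTF8_BINARY"):
    # Ratio of each collation's per-row cost to the baseline's per-row cost.
    return {name: ns / per_row[baseline] for name, ns in per_row.items()}


# How much faster each collation became in absolute terms.
speedup = {name: before[name] / after[name] for name in before}

rel_before = relative_time(before)
rel_after = relative_time(after)

for name in before:
    print(f"{name}: {speedup[name]:.2f}x faster, "
          f"relative {rel_before[name]:.1f}X -> {rel_after[name]:.1f}X")
```

Under that reading, UNICODE `contains` is about 1.2x faster in absolute terms, while its relative factor grows from 26.5X to 185.5X only because the UTF8_BINARY baseline became roughly 8x faster.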
