perf: Update RewriteJoin logic to choose optimal build side #1424
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##              main    #1424      +/-   ##
============================================
+ Coverage     56.12%   57.78%    +1.65%
- Complexity      976      986       +10
============================================
  Files           119      122        +3
  Lines         11743    12132      +389
  Branches       2251     2282       +31
============================================
+ Hits           6591     7010      +419
+ Misses         4012     3954       -58
- Partials       1140     1168       +28

☔ View full report in Codecov by Sentry.
val leftRowCount = join.left.stats.rowCount
val rightRowCount = join.right.stats.rowCount
if (leftSize == rightSize && rightRowCount.isDefined && leftRowCount.isDefined) {
  if (rightRowCount.get <= leftRowCount.get) {
Do we have any sort of cardinality stats (possibly with some catalog implementations)? I'm envisioning a scenario where the right table has fewer rows but much higher cardinality so the resulting hash table is bigger.
Maybe a follow-up issue makes sense to add more heuristics (if we can even get them) to this choice.
It is really awesome to find the correct join order, but are we sure we can rely on stats for any query?
With AQE enabled, we have rowCount and sizeInBytes available for completed query stages. We could extend the logic here to also look at LogicalRelation.sizeInBytes if the AQE stats are not available (this could be the case for the first join in a query).
@mbutrovich also suggested that we take into account the size of the data that will go into the build-side hash table, i.e. consider which columns are used in the join.
I think we can experiment more with this. I will file an issue to track this.
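For illustration, a minimal sketch of that fallback, assuming a hypothetical estimatedSizeInBytes helper and a caller-supplied default size (conceptually spark.sql.defaultSizeInBytes); the real rule would need to decide how to detect a missing or default plan-level estimate:

```scala
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.datasources.LogicalRelation

// Hypothetical helper: use the plan-level sizeInBytes when it looks like a
// real estimate, otherwise fall back to the size reported by the underlying
// LogicalRelation (if the plan scans one).
def estimatedSizeInBytes(plan: LogicalPlan, defaultSizeInBytes: BigInt): BigInt = {
  val planSize = plan.stats.sizeInBytes
  if (planSize != defaultSizeInBytes) {
    planSize
  } else {
    plan
      .collectFirst { case rel: LogicalRelation => BigInt(rel.relation.sizeInBytes) }
      .getOrElse(planSize)
  }
}
```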
I filed #1430
> Do we have any sort of cardinality stats (possibly with some catalog implementations)?
With CBO enabled, we might have cardinality information. Column stats require a catalog to keep the metadata and stats, so with just plain Parquet files we may not have column stats. With Iceberg, there has been some work to incorporate column stats into the Iceberg table, but I'm not sure where the effort to use them in Spark stands.
Ref: https://github.com/apache/spark/blob/f37be893d01884461ac515c8b197fb30d9ba68ff/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala#L101
> I'm envisioning a scenario where the right table has fewer rows but much higher cardinality so the resulting hash table is bigger.
The hash table might have more keys, but total size is still a very good metric, simply because a larger hash table might not fit into memory. Also, depending on the implementation, the more data in the hash table, the less cache-friendly it may be, and we could end up with slower performance.
For reference, Spark uses https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java as the fallback hash table implementation (afaik).
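To make the column-stats idea concrete, here is a rough sketch that assumes CBO column stats are populated; the attributeStats lookup, the join-key extraction, and counting only key bytes are simplifications for illustration, not the logic in this PR:

```scala
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Sketch: estimate the bytes of join-key data that would land in the
// build-side hash table using CBO column stats (avgLen), falling back to
// the plan's sizeInBytes when column stats are missing.
def estimatedBuildKeyBytes(plan: LogicalPlan, joinKeys: Seq[Attribute]): BigInt = {
  val stats = plan.stats
  val avgKeyLen: Option[Long] = {
    val lens = joinKeys.flatMap(k => stats.attributeStats.get(k).flatMap(_.avgLen))
    if (lens.size == joinKeys.size) Some(lens.sum) else None
  }
  (stats.rowCount, avgKeyLen) match {
    case (Some(rows), Some(len)) => rows * BigInt(len)
    case _ => stats.sizeInBytes
  }
}
```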
Moving to draft while I test with TPC-DS
I ran TPC-DS and saw no difference in performance, so I am marking this as ready for review
if (!leftBuildable && !rightBuildable) {
  return None
}
if (!leftBuildable) {
  return Some(BuildRight)
}
if (!rightBuildable) {
  return Some(BuildLeft)
}
Suggested change (replace the chain of if statements above with a single pattern match):

(leftBuildable, rightBuildable) match {
  case (false, false) => return None
  case (false, true) => return Some(BuildRight)
  case (true, false) => return Some(BuildLeft)
  case _ => {
    // all other stuff
  }
}
?
val rightSize = join.right.stats.sizeInBytes
val leftRowCount = join.left.stats.rowCount
val rightRowCount = join.right.stats.rowCount
if (leftSize == rightSize && rightRowCount.isDefined && leftRowCount.isDefined) {
Why does checking the sizes matter? 🤔 Shouldn't it be enough to use row counts?
rowCount is an Option, so it may not always be available. Row count is only used here as a tie-breaker when the left and right sizes are the same.
We can also revisit this logic as part of #1430
Maybe I'm missing something? The leftSize == rightSize condition looks very unlikely to hold, so with this logic we would hardly ever consider rowCounts and would almost always fall back to the size comparison.
We could perhaps use something like https://docs.pingcap.com/tidb/stable/join-reorder#example-the-greedy-algorithm-of-join-reorder and fall back to sizes if rowCounts are not available.
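A minimal sketch of that alternative ordering (row counts first, sizes as the fallback); the chooseBuildSide name is made up for illustration, and this does not attempt the greedy reorder from the linked article, only the build-side choice:

```scala
import org.apache.spark.sql.catalyst.optimizer.{BuildLeft, BuildRight, BuildSide}
import org.apache.spark.sql.catalyst.plans.logical.Join

// Sketch: pick the build side by row count when both estimates exist,
// otherwise fall back to comparing sizeInBytes.
def chooseBuildSide(join: Join): BuildSide = {
  (join.left.stats.rowCount, join.right.stats.rowCount) match {
    case (Some(l), Some(r)) =>
      if (r <= l) BuildRight else BuildLeft
    case _ =>
      if (join.right.stats.sizeInBytes <= join.left.stats.sizeInBytes) BuildRight
      else BuildLeft
  }
}
```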
I suppose it depends on the goal for choosing the build-side. Do we want to limit the amount of data that needs to be loaded into the hash map or limit the number of rows?
The article you linked is for reordering nested joins, so row count would likely be more critical for that scenario.
Oh, this is for the build side, not the entire join reorder; that is what I was missing. For the hash join build side, the rule of thumb is to put the smaller table on the build side so it fits in memory, but I don't have a strong opinion on what "smaller" means here. Perhaps if we are talking about memory, then sizeInBytes makes more sense.
lgtm thanks @andygrove
Do we know why Spark's decision is so bad to start with? Spark has the same logic here: https://github.com/apache/spark/blob/fb17856a22be6968b2ed55ccbd7cf72111920bea/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L506
Spark is building a SortMergeJoin and we are replacing it with a ShuffledHashJoin. Our new logic in this PR seems to match the Spark logic you linked to.
Makes sense
@andygrove Thanks for opening this PR! I have one question though. I also tried to apply the same build side selection logic, but found that with multiple executors, the CometExchange executed right after a CometHashJoin with BuildLeft gets slower, as described here: #1382 (comment). That is why I did not open a PR. Is it confirmed that queries are faster with multiple executors as well?
I will test with multiple executors today, just to be sure, but I suspect the issue you were seeing is related to spilling in shuffle. I commented on the issue. I will share the results here later today for multi-executor testing.
// TODO this was added as a workaround for TPC-DS q14 hanging and needs
// further investigation
Not a blocker, but just wondering: is this because we were choosing the wrong side?
This is needed both before and after this PR. I think it may be related to excessive spilling and shuffle but it still needs to be investigated.
// If smj has no logical link, or its logical link is not a join,
// then we always choose left as build side.
BuildLeft
We were previously preferring right, as in:
if (canBuildShuffledHashJoinRight(joinType)) {
Some(BuildRight)
} else if (canBuildShuffledHashJoinLeft(joinType)) {
Some(BuildLeft)
} else {
None
}
Is this a behavior change?
Yes, this is a behavior change.
Well, query results won't change, just performance characteristics.
Thanks for the reviews @comphead @parthchandra @kazuyukitanimura @mbutrovich @hayman42. I ran benchmarks with 1 executor w/ 8 cores vs 2 executors w/ 4 cores and saw no difference in performance, so I will go ahead and merge this PR. I do see issues if shuffle has to spill, and I have ideas on how we can greatly improve this.
Which issue does this PR close?
Related to #1382
Rationale for this change
The main goal of this PR is to try to choose the smaller side of the join as the build side.
With this PR, many queries are now faster when compared to the 0.6.0 release.
Total time for TPC-H is 285 seconds, down from 330 seconds.
Query 9 Before
Query 9 After
What changes are included in this PR?
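Putting the fragments from the review together, the new selection logic roughly follows this shape (a sketch reconstructed from the snippets above, not the exact merged code; the getBuildSide signature and the buildable flags are assumptions):

```scala
import org.apache.spark.sql.catalyst.optimizer.{BuildLeft, BuildRight, BuildSide}
import org.apache.spark.sql.catalyst.plans.logical.Join

// Sketch of the build-side choice discussed in this PR: rule out sides that
// cannot be built for this join type, then prefer the smaller side by
// sizeInBytes, using rowCount only as a tie-breaker when sizes are equal.
def getBuildSide(
    join: Join,
    leftBuildable: Boolean,
    rightBuildable: Boolean): Option[BuildSide] = {
  if (!leftBuildable && !rightBuildable) return None
  if (!leftBuildable) return Some(BuildRight)
  if (!rightBuildable) return Some(BuildLeft)

  val leftSize = join.left.stats.sizeInBytes
  val rightSize = join.right.stats.sizeInBytes
  val leftRowCount = join.left.stats.rowCount
  val rightRowCount = join.right.stats.rowCount

  if (leftSize == rightSize && leftRowCount.isDefined && rightRowCount.isDefined) {
    if (rightRowCount.get <= leftRowCount.get) Some(BuildRight) else Some(BuildLeft)
  } else if (rightSize <= leftSize) {
    Some(BuildRight)
  } else {
    Some(BuildLeft)
  }
}
```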
How are these changes tested?