Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simplify project when we merge two projects #17

Open
wants to merge 3 commits into
base: DrillCalcite1.15.0
Choose a base branch
from

Conversation

chunhui-shi
Copy link

No description provided.

@vvysotskyi
Copy link
Member

vvysotskyi commented Mar 9, 2018

@chunhui-shi, looks like Calcite uses HepPlanner for this rule, but Drill uses VolcanoPlanner. I think we should also use HepPlanner to avoid stack overflow instead of changing the rule.

I have just made a quick change in DefaultSqlHandler.convertToRawDrel() (used HepPlanner for this rule before PlanningPhase.LOGICAL phase) to ensure that it helps in this case, but I'm not sure that it wouldn't break anything.

@chunhui-shi
Copy link
Author

Moving the rule to another stage might help breaking the loop of PushFilterPastProject, PushProjectPastFilter triggered by MergeProject. I think it will be better to put this rule after LOGICAL stage to have better chance to do the optimization. I will be running some more tests to verify this idea.

@chunhui-shi
Copy link
Author

chunhui-shi commented Mar 19, 2018

I saw many unit tests were broken when I tried your suggestion although the hanged case this fix supposed to solve passed. Have you tried some tests with your change? I think this rule could be important to optimize some intermediate results generated by other rules.

@vvysotskyi
Copy link
Member

Can we split rules from the PlanningPhase.LOGICAL phase into the rules which should be run with HepPlanner (for example ProjectMergeRule and rules which may create projects or swap projects with other rel nodes) and other rules which still be run with the Volcano planner?
But anyway if we will decide to do the fix in the Calcite, it is better to do it in the Apache than add one more specific commit into our fork.

@amansinha100
Copy link

What would it take to get this fix into Apache Calcite ? We would need unit tests for the nested CAST functions. @chunhui-shi you mentioned that Calcite has a similar simplification (not sure if it does the exact thing).. can you provide the link to that piece of code ?

@vvysotskyi
Copy link
Member

@amansinha100, I also saw similar simplifications in several rules, for example here: https://github.com/apache/calcite/blob/b60b67eb8f62463ccbc230358969ef2450cdbe05/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L122
In this case, only outer cast is removed.

@chunhui-shi
Copy link
Author

chunhui-shi commented Mar 21, 2018 via email

@amansinha100
Copy link

@vvysotskyi yes Calcite does similar simplifications, but its not clear why the ProjectMergeRule does not do that for CAST.
Since we are constrained by time for the MEP 5.0 release, we have 3 choices:

  1. Put this fix in DrillCalcite forked version
  2. Modify DrillMergeProjectRule (derived class of ProjectMergeRule) and add this fix there but @chunhui-shi mentioned we would also need to replicate a bunch of code from the ProjectMergeRule.
  3. Considering that few people write expressions such as CAST(CAST as INT) as INT , we could ask QA to modify the test cases such that it does not generate such expressions.

@vvysotskyi
Copy link
Member

I tried to reproduce it in Calcite, I simplified the query to use only values tuples:

select t1.f from
(select cast(f as int) f from (values('1')) t(f)) as t 
inner join (select cast(f as int) f from (values('1')) t(f)) as t1
on cast(t.f as int) = cast(t1.f as int) and cast(t.f as int) = 1

But it was not reproduced.
For this query, Calcite creates almost the same plan as Drill, but without one of the Project rel nodes, so I have created exactly the same set of rel nodes, which were used in Drill using RelBuilder, but it also didn't help to reproduce this issue.

In Drill, I have limited a list of rules, with which bug still reproduced (for the query above):

  • DrillFilterJoinRules.DRILL_FILTER_ON_JOIN
  • DrillFilterJoinRules.DRILL_JOIN
  • DrillPushFilterPastProjectRule.INSTANCE (Calcite has similar FilterProjectTransposeRule)
  • DrillPushProjectPastFilterRule.INSTANCE (inheritor of ProjectFilterTransposeRule)
  • DrillMergeProjectRule (inheritor of ProjectMergeRule)
  • DrillFilterRule.INSTANCE
  • DrillProjectRule.INSTANCE
  • DrillJoinRule.INSTANCE
  • DrillValuesRule.INSTANCE

The last four rules are used for converting. I tried to use all these rules in Calcite unit test with the same Drill plan and replaced Drill converting rules by Enumerable rules. It also didn't help.

With the specified set of rules, I noticed the interesting thing: one of LogicalProject rel nodes, which is created alter applying DrillMergeProjectRule has the same RelSubset as input, where it will be added, so for this case, DrillMergeProjectRule will be matched infinitely.
I believe that the root cause of the original issue is the same as the root cause of the issue which I describe.

@amansinha100
Copy link

@vvysotskyi thanks for trying the repro within Calcite and comparing with Drill. Since you had mentioned earlier that Calcite does the ProjectMergeRule as part of HepPlanner while Drill does it with Volcano, chances are there may be a costing change needed. Can you try modifying DrillProjectRelBase.computeSelfCost() ? Currently, it just checks for the number of expressions. It does not account for the complexity of that expression. We should modify it such that it treat CAST(CAST) as more expensive than just one CAST.

@vvysotskyi
Copy link
Member

@amansinha100, Calcite also able to use this rule with VolcanoPlanner. With all my tries to reproduce this issue in Calcite, I used VolcanoPlanner.

The idea of modifying the cost depending on the complexity of expressions looks interesting, we should try to implement it and see if it will help.

@chunhui-shi
Copy link
Author

chunhui-shi commented Mar 21, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants