Add `Container` trait to simplify `Expr` and `LogicalPlan` apply and map methods #13467

Open

peter-toth wants to merge 2 commits into apache:main from peter-toth:containers

+676 −595

Contributor

peter-toth commented Nov 18, 2024

Which issue does this PR close?

Part of #8913.

Rationale for this change

The current implementations of LogicalPlan::apply_children(), LogicalPlan::map_children(), LogicalPlan::apply_expressions(), LogicalPlan::map_expressions(), Expr::apply_children() and Expr::map_children() are confusing due to the map_until_stop_and_collect macro. I think we can introduce a trait for containers of arbitrary sibling elements, on which functions can be applied or mapped:

/// [`Container`] contains elements that a function can be applied on or mapped. The
/// elements of the container are siblings so the continuation rules are similar to
/// [`TreeNodeRecursion::visit_sibling`] / [`Transformed::transform_sibling`].
pub trait Container<'a, T: 'a>: Sized {
    fn apply_elements<F: FnMut(&'a T) -> Result<TreeNodeRecursion>>(
        &'a self,
        f: F,
    ) -> Result<TreeNodeRecursion>;

    fn map_elements<F: FnMut(T) -> Result<Transformed<T>>>(
        self,
        f: F,
    ) -> Result<Transformed<Self>>;
}
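As a rough illustration of the sibling continuation rule described in the doc comment, here is a minimal, self-contained sketch of the `apply_elements` half only. `Result`, `TreeNodeRecursion`, `Leaf` and `visited_until` are toy stand-ins introduced for this example and are not DataFusion's actual definitions; the `Option` and `Vec` impls are assumptions about how such blanket implementations could look.

```rust
// Toy stand-ins (assumptions) for DataFusion's `Result` and `TreeNodeRecursion`.
type Result<T> = std::result::Result<T, String>;

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum TreeNodeRecursion {
    Continue,
    Jump,
    Stop,
}

// Only the `apply_elements` half of the trait is sketched here.
trait Container<'a, T: 'a>: Sized {
    fn apply_elements<F: FnMut(&'a T) -> Result<TreeNodeRecursion>>(
        &'a self,
        f: F,
    ) -> Result<TreeNodeRecursion>;
}

// Hypothetical element type: it acts as a container of exactly itself.
struct Leaf(i32);

impl<'a> Container<'a, Leaf> for Leaf {
    fn apply_elements<F: FnMut(&'a Leaf) -> Result<TreeNodeRecursion>>(
        &'a self,
        mut f: F,
    ) -> Result<TreeNodeRecursion> {
        f(self)
    }
}

// `None` holds no elements, so there is nothing to visit.
impl<'a, T: 'a, C: Container<'a, T>> Container<'a, T> for Option<C> {
    fn apply_elements<F: FnMut(&'a T) -> Result<TreeNodeRecursion>>(
        &'a self,
        f: F,
    ) -> Result<TreeNodeRecursion> {
        match self {
            Some(c) => c.apply_elements(f),
            None => Ok(TreeNodeRecursion::Continue),
        }
    }
}

// Elements of a `Vec` are siblings: continue on `Continue`/`Jump`, return on `Stop`.
impl<'a, T: 'a, C: Container<'a, T>> Container<'a, T> for Vec<C> {
    fn apply_elements<F: FnMut(&'a T) -> Result<TreeNodeRecursion>>(
        &'a self,
        mut f: F,
    ) -> Result<TreeNodeRecursion> {
        let mut tnr = TreeNodeRecursion::Continue;
        for c in self {
            tnr = c.apply_elements(&mut f)?;
            if tnr == TreeNodeRecursion::Stop {
                return Ok(tnr);
            }
        }
        Ok(tnr)
    }
}

// Hypothetical helper: records every leaf value visited before `Stop` is returned.
fn visited_until(items: &Vec<Option<Leaf>>, stop_at: i32) -> Vec<i32> {
    let mut seen = Vec::new();
    items
        .apply_elements(|leaf: &Leaf| {
            seen.push(leaf.0);
            Ok(if leaf.0 == stop_at {
                TreeNodeRecursion::Stop
            } else {
                TreeNodeRecursion::Continue
            })
        })
        .unwrap();
    seen
}
```

In this sketch, nested containers such as `Vec<Option<Leaf>>` need no extra code: each layer forwards the closure to the layer below, and a `Stop` from any element ends the whole traversal.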

What changes are included in this PR?

This PR:

Gets rid of map_until_stop_and_collect macro and many transform and rewrite helper methods.
Adds the Container trait and blanket implementations for Box, Option, Vec, tuples, ...
Defines the Container implementation for Expr and LogicalPlan.
Simplifies the above mentioned apply... and map... methods.

Are these changes tested?

Yes, with existing unit tests.

Are there any user-facing changes?

No.


Add Container trait and its blanket implementations, remove `map_until_stop_and_collect` macro, simplify apply and map logic with `Container`s where possible

2036c09

github-actions bot added sql logical-expr optimizer common labels

Member

findepi commented Nov 18, 2024

This PR:

Gets rid of map_until_stop_and_collect macro

that's great

Adds the Container trait and blanket implementations for Box, Option, Vec, tuples, ...

Do we need that?

What about calling this trait TreeNode or GraphNode and implementing it for our types only?


          fix clippy

166e5e5

Contributor Author

peter-toth commented Nov 18, 2024

Adds the Container trait and blanket implementations for Box, Option, Vec, tuples, ...

Do we need that?

What about calling this trait TreeNode or GraphNode and implementing it for our types only?

Yes, we do. We already have the TreeNode trait with a well-defined API. It has TreeNode::apply_children() / TreeNode::map_children() to let the tree implementations define how to visit/map the children of a node.

This new Container trait, together with the blanket implementations mentioned above, just makes the implementation of apply_children() / map_children() for the logical plan trees (Expr and LogicalPlan) simpler.
We can actually move Container and its blanket implementations to datafusion::expr if that's cleaner.
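To make the claimed simplification concrete, the following is a hedged sketch of how a `map_children`-style function might lean on `map_elements`. All types are toy stand-ins invented for this example (a three-variant `Expr`, a `Transformed` without recursion control, a `Container` trait without the lifetime parameter because only the map half is shown), and the free function `map_children` is a hypothetical analogue of `TreeNode::map_children()`, not the PR's code.

```rust
// Toy stand-ins (assumptions), not DataFusion's actual types.
type Result<T> = std::result::Result<T, String>;

struct Transformed<T> {
    data: T,
    transformed: bool,
}

// Only the `map_elements` half of the trait is sketched here.
trait Container<T>: Sized {
    fn map_elements<F: FnMut(T) -> Result<Transformed<T>>>(
        self,
        f: F,
    ) -> Result<Transformed<Self>>;
}

#[derive(Debug, PartialEq)]
enum Expr {
    Literal(i64),
    Not(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
}

// A node is a container of exactly itself.
impl Container<Expr> for Expr {
    fn map_elements<F: FnMut(Expr) -> Result<Transformed<Expr>>>(
        self,
        mut f: F,
    ) -> Result<Transformed<Self>> {
        f(self)
    }
}

// `Box<C>`: map the boxed container, then re-box the result.
impl<T, C: Container<T>> Container<T> for Box<C> {
    fn map_elements<F: FnMut(T) -> Result<Transformed<T>>>(
        self,
        f: F,
    ) -> Result<Transformed<Self>> {
        let t = (*self).map_elements(f)?;
        Ok(Transformed { data: Box::new(t.data), transformed: t.transformed })
    }
}

// 2-tuple `(C0, C1)`: map `.0`, then `.1`, and combine the `transformed` flags.
impl<T, C0: Container<T>, C1: Container<T>> Container<T> for (C0, C1) {
    fn map_elements<F: FnMut(T) -> Result<Transformed<T>>>(
        self,
        mut f: F,
    ) -> Result<Transformed<Self>> {
        let a = self.0.map_elements(&mut f)?;
        let b = self.1.map_elements(&mut f)?;
        Ok(Transformed { data: (a.data, b.data), transformed: a.transformed || b.transformed })
    }
}

// Hypothetical analogue of `TreeNode::map_children()` for the toy `Expr`.
fn map_children<F: FnMut(Expr) -> Result<Transformed<Expr>>>(
    expr: Expr,
    f: F,
) -> Result<Transformed<Expr>> {
    match expr {
        Expr::Literal(_) => Ok(Transformed { data: expr, transformed: false }),
        Expr::Not(e) => {
            let t = e.map_elements(f)?;
            Ok(Transformed { data: Expr::Not(t.data), transformed: t.transformed })
        }
        Expr::Add(l, r) => {
            let t = (l, r).map_elements(f)?;
            let (l, r) = t.data;
            Ok(Transformed { data: Expr::Add(l, r), transformed: t.transformed })
        }
    }
}

// Example rewrite: doubles literals and leaves every other node untouched.
fn double_literals(e: Expr) -> Result<Transformed<Expr>> {
    match e {
        Expr::Literal(v) => Ok(Transformed { data: Expr::Literal(v * 2), transformed: true }),
        other => Ok(Transformed { data: other, transformed: false }),
    }
}
```

Each match arm only names the fields that hold children and delegates the traversal to the container impls, which is the kind of simplification the comment above describes. Note that `map_children` here touches direct children only; recursing deeper would be the job of a separate traversal.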

Contributor Author

peter-toth commented Nov 18, 2024

cc @alamb, @berkaysynnada

alamb reviewed

Contributor

alamb left a comment

Thank you @peter-toth - I think this looks like a really nice improvement in the code 🙏

I have some comment suggestions, but nothing that is required.

The only thing I want to do prior to approving this PR is to run the planning benchmarks and make sure we didn't introduce any regressions. I am running them now.

datafusion/common/src/tree_node.rs

+                      f: F,
+                  ) -> Result<TreeNodeRecursion>;
+                  fn map_elements<F: FnMut(T) -> Result<Transformed<T>>>(

Contributor

alamb Nov 18, 2024

Could we also add some documentation here about what map_elements is (perhaps just a map back to TreeNode::map_children?)

datafusion/common/src/tree_node.rs

+              /// elements of the container are siblings so the continuation rules are similar to
+              /// [`TreeNodeRecursion::visit_sibling`] / [`Transformed::transform_sibling`].
+              pub trait Container<'a, T: 'a>: Sized {
+                  fn apply_elements<F: FnMut(&'a T) -> Result<TreeNodeRecursion>>(

Contributor

alamb Nov 18, 2024

Could we also add some documentation here about what apply_elements is (perhaps just a map back to TreeNode::apply?)

datafusion/common/src/tree_node.rs

+                      let mut tnr = TreeNodeRecursion::Continue;
+                      for c in self {
+                          tnr = c.apply_elements(&mut f)?;
+                          match tnr {

Contributor

alamb Nov 18, 2024

this is definitely a much easier to understand formulation

datafusion/common/src/tree_node.rs

+                  }
+              }
+              impl<'a, T: 'a, C0: Container<'a, T>, C1: Container<'a, T>> Container<'a, T>

Contributor

alamb Nov 18, 2024

It took me a moment to realize this was the impl for (a,b) 2-tuples. Maybe we could make that clearer with some comments.

Likewise with the 3-tuple.

Though to be honest, it seems to me like it might be clearer if we didn't implement this trait for tuples and instead used named structs in the places where the tuples are used. But that is just a personal style preference.
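To illustrate the two styles being weighed here, the sketch below puts a commented 2-tuple impl next to a named-struct alternative. Everything in it is an assumption made for illustration: `Col`, `JoinKey` and `names` are hypothetical, `TreeNodeRecursion` is cut down to two variants, and only `apply_elements` is shown.

```rust
// Toy stand-ins (assumptions), not DataFusion's actual types.
type Result<T> = std::result::Result<T, String>;

#[derive(Debug, Clone, Copy, PartialEq)]
enum TreeNodeRecursion {
    Continue,
    Stop,
}

trait Container<'a, T: 'a>: Sized {
    fn apply_elements<F: FnMut(&'a T) -> Result<TreeNodeRecursion>>(
        &'a self,
        f: F,
    ) -> Result<TreeNodeRecursion>;
}

// Hypothetical element type: a container of exactly itself.
struct Col(&'static str);

impl<'a> Container<'a, Col> for Col {
    fn apply_elements<F: FnMut(&'a Col) -> Result<TreeNodeRecursion>>(
        &'a self,
        mut f: F,
    ) -> Result<TreeNodeRecursion> {
        f(self)
    }
}

// Style 1: impl for 2-tuples `(C0, C1)`. Visits `.0`, then `.1` unless told to stop.
impl<'a, T: 'a, C0: Container<'a, T>, C1: Container<'a, T>> Container<'a, T> for (C0, C1) {
    fn apply_elements<F: FnMut(&'a T) -> Result<TreeNodeRecursion>>(
        &'a self,
        mut f: F,
    ) -> Result<TreeNodeRecursion> {
        match self.0.apply_elements(&mut f)? {
            TreeNodeRecursion::Stop => Ok(TreeNodeRecursion::Stop),
            TreeNodeRecursion::Continue => self.1.apply_elements(f),
        }
    }
}

// Style 2: a named struct (hypothetical) with its own impl. The field names
// document intent, at the cost of one impl per struct.
struct JoinKey {
    left: Col,
    right: Col,
}

impl<'a> Container<'a, Col> for JoinKey {
    fn apply_elements<F: FnMut(&'a Col) -> Result<TreeNodeRecursion>>(
        &'a self,
        mut f: F,
    ) -> Result<TreeNodeRecursion> {
        match self.left.apply_elements(&mut f)? {
            TreeNodeRecursion::Stop => Ok(TreeNodeRecursion::Stop),
            TreeNodeRecursion::Continue => self.right.apply_elements(f),
        }
    }
}

// Hypothetical helper: collects the column names in visiting order.
fn names<'a, C: Container<'a, Col>>(c: &'a C) -> Vec<&'static str> {
    let mut out = Vec::new();
    c.apply_elements(|col: &'a Col| {
        out.push(col.0);
        Ok(TreeNodeRecursion::Continue)
    })
    .unwrap();
    out
}
```

Both styles visit the same elements in the same order; the trade-off in this sketch is between one generic tuple impl that covers every pair and per-struct impls that are easier to read at the use site.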

datafusion/common/src/tree_node.rs

+              /// [`Container`] contains elements that a function can be applied on or mapped. The
+              /// elements of the container are siblings so the continuation rules are similar to
+              /// [`TreeNodeRecursion::visit_sibling`] / [`Transformed::transform_sibling`].
+              pub trait Container<'a, T: 'a>: Sized {

Contributor

alamb Nov 18, 2024

I think it is a matter of preference, but what would you think about calling this something less generic, such as TreeNodeContainer ?

datafusion/expr/src/logical_plan/ddl.rs

+                      self.function_body.apply_elements(f)
+                  }
+                  fn map_elements<F: FnMut(Expr) -> Result<Transformed<Expr>>>(

Contributor

alamb Nov 18, 2024

this is very cool

datafusion/expr/src/logical_plan/tree_node.rs

    
@@ -81,7 +79,7 @@ impl TreeNode for LogicalPlan {
                              expr,
                              input,
                              schema,
-                          }) => rewrite_arc(input, f)?.update_data(|input| {
+                          }) => input.map_elements(f)?.update_data(|input| {

Contributor

alamb Nov 18, 2024

Is the difference between map_children and map_elements that map_elements doesn't also apply the function to input ?
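One possible reading of the hunk above, sketched with toy types: if a plan node is treated as a container holding exactly itself, then `map_elements` on `input` calls `f` on the input plan, while a `map_children`-style function calls `f` only on the children of the node it is given. This is an assumption drawn from the diff, not an answer given in the thread; `Plan`, `Transformed`, the `Arc` impl, `map_children`, `label` and `seen_by` are all hypothetical.

```rust
use std::sync::Arc;

// Toy stand-ins (assumptions), not DataFusion's actual types.
type Result<T> = std::result::Result<T, String>;

struct Transformed<T> {
    data: T,
    transformed: bool,
}

trait Container<T>: Sized {
    fn map_elements<F: FnMut(T) -> Result<Transformed<T>>>(
        self,
        f: F,
    ) -> Result<Transformed<Self>>;
}

#[derive(Clone)]
enum Plan {
    Scan(&'static str),
    Filter { input: Arc<Plan> },
}

// Assumption: a plan is a container of exactly itself, so `f` sees the plan.
impl Container<Plan> for Plan {
    fn map_elements<F: FnMut(Plan) -> Result<Transformed<Plan>>>(
        self,
        mut f: F,
    ) -> Result<Transformed<Self>> {
        f(self)
    }
}

// `Arc<C>`: take the inner value (cloning if shared), map it, wrap it again.
impl<T, C: Container<T> + Clone> Container<T> for Arc<C> {
    fn map_elements<F: FnMut(T) -> Result<Transformed<T>>>(
        self,
        f: F,
    ) -> Result<Transformed<Self>> {
        let inner = Arc::try_unwrap(self).unwrap_or_else(|shared| (*shared).clone());
        let t = inner.map_elements(f)?;
        Ok(Transformed { data: Arc::new(t.data), transformed: t.transformed })
    }
}

// Hypothetical analogue of `TreeNode::map_children()`: `f` sees only the inputs.
fn map_children<F: FnMut(Plan) -> Result<Transformed<Plan>>>(
    plan: Plan,
    f: F,
) -> Result<Transformed<Plan>> {
    match plan {
        Plan::Scan(_) => Ok(Transformed { data: plan, transformed: false }),
        Plan::Filter { input } => {
            let t = input.map_elements(f)?;
            Ok(Transformed { data: Plan::Filter { input: t.data }, transformed: t.transformed })
        }
    }
}

fn label(p: &Plan) -> &'static str {
    match p {
        Plan::Scan(name) => *name,
        Plan::Filter { .. } => "filter",
    }
}

// Records which nodes `f` is called on for Filter(Scan("t")).
fn seen_by(children_only: bool) -> Vec<&'static str> {
    let plan = Plan::Filter { input: Arc::new(Plan::Scan("t")) };
    let mut seen = Vec::new();
    let f = |p: Plan| {
        seen.push(label(&p));
        Ok::<_, String>(Transformed { data: p, transformed: false })
    };
    if children_only {
        map_children(plan, f).unwrap();
    } else {
        plan.map_elements(f).unwrap();
    }
    seen
}
```

Under these assumptions, `map_children` on the filter calls `f` on the scan only, whereas `map_elements` on the filter calls `f` on the filter itself.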

datafusion/expr/src/logical_plan/tree_node.rs

                           // There are two part of expression for join, equijoin(on) and non-equijoin(filter).
                           // 1. the first part is `on.len()` equijoin expressions, and the struct of each expr is `left-on = right-on`.
                           // 2. the second part is non-equijoin(filter).
                           LogicalPlan::Join(Join { on, filter, .. }) => {
-                              on.iter()
-                                  // TODO: why we need to create an `Expr::eq`? Cloning `Expr` is costly...

Contributor

alamb Nov 18, 2024

🎉 for removing the clone

datafusion/expr/src/tree_node.rs

    
@@ -57,78 +57,50 @@ impl TreeNode for Expr {
                          | Expr::Negative(expr)
                          | Expr::Cast(Cast { expr, .. })
                          | Expr::TryCast(TryCast { expr, .. })
-                          | Expr::InSubquery(InSubquery{ expr, .. }) => vec![expr.as_ref()],
+                          | Expr::InSubquery(InSubquery { expr, .. }) => expr.apply_elements(f),

Contributor

alamb Nov 18, 2024

it is nice to avoid creating vecs here as well

Labels

common logical-expr optimizer sql