Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Duplicated filters within (filter(TableScan)) plan for unparser #13422

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 8 additions & 2 deletions datafusion/sql/src/unparser/utils.rs
Original file line number Diff line number Diff line change
Expand Up @@ -318,7 +318,9 @@ pub(crate) fn try_transform_to_simple_table_scan_with_filters(
plan_stack.push(alias.input.as_ref());
}
LogicalPlan::Filter(filter) => {
filters.push(filter.predicate.clone());
if !filters.contains(&filter.predicate) {
Copy link
Contributor

@jayzhan211 jayzhan211 Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change them to HashSet? including table_scan_filters.

contains is O(n) operation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah my previous implementation is actually Hashset, however it doesn't preserve the filter order, so I didn't end up using Hashset

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing I could do is maintain a temporary Hashset to keep track of the exsiting filters, while the results is still constructed with the Vector. Every filter will be checked if it exists with Hashset before pushing to the Vector. Would this approach be better than just using Vector.contains?

Copy link
Contributor

@jayzhan211 jayzhan211 Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about IndexSet?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IndexSet would be better so that the order of the filters is preserved

filters.push(filter.predicate.clone());
}
plan_stack.push(filter.input.as_ref());
}
LogicalPlan::TableScan(table_scan) => {
Expand All @@ -344,7 +346,11 @@ pub(crate) fn try_transform_to_simple_table_scan_with_filters(
})
.collect::<Result<Vec<_>, DataFusionError>>()?;

filters.extend(table_scan_filters);
for table_scan_filter in table_scan_filters {
if !filters.contains(&table_scan_filter) {
filters.push(table_scan_filter);
}
}

let mut builder = LogicalPlanBuilder::scan(
table_scan.table_name.clone(),
Expand Down
27 changes: 27 additions & 0 deletions datafusion/sql/tests/cases/plan_to_sql.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1146,6 +1146,33 @@ fn test_join_with_table_scan_filters() -> Result<()> {

assert_eq!(sql.to_string(), expected_sql);

let right_plan_with_filter_schema = table_scan_with_filters(
Some("right_table"),
&schema_right,
None,
vec![col("right_table.age").gt(lit(10))],
)?
.build()?;
let right_plan_with_duplicated_filter =
LogicalPlanBuilder::from(right_plan_with_filter_schema.clone())
.filter(col("right_table.age").gt(lit(10)))?
.build()?;

let join_plan_duplicated_filter = LogicalPlanBuilder::from(left_plan)
.join(
right_plan_with_duplicated_filter,
datafusion_expr::JoinType::Inner,
(vec!["left.id"], vec!["right_table.id"]),
Some(col("left.id").gt(lit(5))),
)?
.build()?;

let sql = plan_to_sql(&join_plan_duplicated_filter)?;

let expected_sql = r#"SELECT * FROM left_table AS "left" JOIN right_table ON "left".id = right_table.id AND (("left".id > 5) AND ("left"."name" LIKE 'some_name' AND (right_table.age > 10)))"#;

assert_eq!(sql.to_string(), expected_sql);

Ok(())
}

Expand Down