Count wildcard alias #14927

Open: wants to merge 8 commits into base: main

jayzhan211
Contributor

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added labels Feb 28, 2025: core (Core DataFusion crate), sqllogictest (SQL Logic Tests (.slt)), functions (Changes to functions implementation)
@jayzhan211
Contributor Author

    /// Return `self AS name` alias expression
    pub fn alias(self, name: impl Into<String>) -> Expr {
        Expr::Alias(Alias::new(self, None::<&str>, name.into()))
    }

Adding a duplicate-name check here didn't help.
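For readers wondering what such a check might look like, here is a minimal sketch on a toy expression type. The `Expr` enum, `alias_dedup`, and `display` below are hypothetical stand-ins written for illustration, not DataFusion's actual types or the code that was tried in this PR. The sketch only skips re-wrapping when the outermost alias already has the requested name, which would avoid output like `count(Int64(1)) AS count(*) AS count(*)`; per the comment above, a check of this kind did not resolve the underlying mismatch.

```rust
// Toy stand-in for an expression tree; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    // A leaf identified only by its display name.
    Leaf(String),
    // An aliased expression: (inner expression, alias name).
    Alias(Box<Expr>, String),
}

impl Expr {
    // Hypothetical variant of `alias`: if the outermost node is already an
    // alias with the same name, return the expression unchanged.
    fn alias_dedup(self, name: impl Into<String>) -> Expr {
        let name = name.into();
        let already_named =
            matches!(&self, Expr::Alias(_, existing) if *existing == name);
        if already_named {
            self
        } else {
            Expr::Alias(Box::new(self), name)
        }
    }

    // Render the expression in the `inner AS name` style used by plan output.
    fn display(&self) -> String {
        match self {
            Expr::Leaf(name) => name.clone(),
            Expr::Alias(inner, name) => format!("{} AS {}", inner.display(), name),
        }
    }
}
```

Calling `alias_dedup("count(*)")` twice on `Expr::Leaf("count(Int64(1))".to_string())` yields a single alias layer, so `display()` shows one `AS count(*)` rather than two.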

let mut missing_cols: IndexSet<Column> = IndexSet::new();
sorts.iter().try_for_each::<_, Result<()>>(|sort| {
    let columns = sort.expr.column_refs();
    missing_cols.extend(
        columns
            .into_iter()
            .filter(|c| !schema.has_column(c))
            .cloned(),
    );
    Ok(())
})?;
if missing_cols.is_empty() {
    return Ok(Self::new(LogicalPlan::Sort(Sort {
        expr: normalize_sorts(sorts, &self.plan)?,
        input: self.plan,
        fetch,
    })));
}

We end up with two names for the same expression: count(*) (the alias) and count(1) (the column name). They don't match, so the column is reported in missing_cols when building the sort.
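The name mismatch can be reproduced with a simplified model of the `missing_cols` computation. This is only a sketch: the schema is reduced to a list of column names, `missing_columns` is a hypothetical helper, and which side carries which name (`count(*)` in the schema, `count(Int64(1))` in the sort expression) is an assumption made for illustration.

```rust
// Simplified model: a schema is just a list of output column names, and a
// sort expression is represented by the column names it references.
// `missing_columns` is a hypothetical helper, not a DataFusion function.
fn missing_columns(schema: &[&str], sort_refs: &[&str]) -> Vec<String> {
    let mut missing: Vec<String> = Vec::new();
    for col in sort_refs {
        let in_schema = schema.contains(col);
        let already_recorded = missing.iter().any(|m| m.as_str() == *col);
        // Lookup is by exact name, so an alias and the underlying column
        // name are treated as two different columns.
        if !in_schema && !already_recorded {
            missing.push((*col).to_string());
        }
    }
    missing
}
```

With a schema of `["t1.b", "count(*)"]`, a sort that references `count(Int64(1))` is reported as missing, while a sort that references `count(*)` is not, even though both names denote the same aggregate.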

\n| plan_type | plan |\
\n+---------------+------------------------------------------------------------------------------------------------------------+\
\n| logical_plan | Projection: t1.b, count(*) |\
\n| | Sort: count(Int64(1)) AS count(*) AS count(*) ASC NULLS LAST |\

\n| logical_plan | LeftSemi Join: |\
\n| | TableScan: t1 projection=[a, b] |\
\n| | SubqueryAlias: __correlated_sq_1 |\
\n| | Projection: |\

Why is the projection empty?

03)--SubqueryAlias: __correlated_sq_1
04)----Projection:
05)------Aggregate: groupBy=[[]], aggr=[[count(Int64(1))]]
06)--------TableScan: t2 projection=[]
@jayzhan211 jayzhan211 Feb 28, 2025

optimize_projections removes the expressions in the projection but keeps the now-empty Projection node:

[2025-02-28T06:22:26Z DEBUG datafusion_optimizer::utils] simplify_expressions:
    Projection: t1.a
      LeftSemi Join: 
        Filter: Boolean(true)
          TableScan: t1
        SubqueryAlias: __correlated_sq_1
          Projection: count(Int64(1)) AS count(*)
            Aggregate: groupBy=[[]], aggr=[[count(Int64(1))]]
              TableScan: t2
    
[2025-02-28T06:22:26Z DEBUG datafusion_optimizer::optimizer] Plan unchanged by optimizer rule 'unwrap_cast_in_comparison' (pass 0)
[2025-02-28T06:22:26Z DEBUG datafusion_optimizer::optimizer] Plan unchanged by optimizer rule 'common_sub_expression_eliminate' (pass 0)
[2025-02-28T06:22:26Z DEBUG datafusion_optimizer::optimizer] Plan unchanged by optimizer rule 'eliminate_group_by_constant' (pass 0)
[2025-02-28T06:22:26Z DEBUG datafusion_optimizer::utils] optimize_projections:
    LeftSemi Join: 
      Filter: Boolean(true)
        TableScan: t1 projection=[a]
      SubqueryAlias: __correlated_sq_1
        Projection: 
          Aggregate: groupBy=[[]], aggr=[[count(Int64(1))]]
            TableScan: t2 projection=[]
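The effect seen in the log can be sketched with a toy pruning step. The `Projection` struct and `prune` function below are hypothetical and far simpler than the real `optimize_projections` rule; they only illustrate the behaviour described above, assuming the parent (here presumably the LeftSemi Join, which needs no output columns from its right input) requires none of the projection's expressions.

```rust
// Toy plan node: a projection is just an ordered list of output expressions.
// This is NOT DataFusion's `Projection`.
#[derive(Debug, PartialEq)]
struct Projection {
    exprs: Vec<String>,
}

// Keep only the expressions at the positions the parent requires.
// The node itself is returned even when no expression survives, which
// mirrors the empty `Projection:` line in the log.
fn prune(proj: Projection, required: &[usize]) -> Projection {
    let exprs = proj
        .exprs
        .into_iter()
        .enumerate()
        .filter(|(i, _)| required.contains(i))
        .map(|(_, e)| e)
        .collect();
    Projection { exprs }
}
```

Pruning `Projection { exprs: vec!["count(Int64(1)) AS count(*)".to_string()] }` with an empty `required` slice leaves a node whose `exprs` is empty. A real optimizer could go one step further and drop such a node entirely; the sketch deliberately does not.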

@jayzhan211 jayzhan211 marked this pull request as draft March 1, 2025 00:53
@jayzhan211 jayzhan211 marked this pull request as ready for review March 2, 2025 00:44
@jayzhan211 jayzhan211 changed the title from "Count alias" to "Count wildcard alias" Mar 3, 2025
Successfully merging this pull request may close these issues.

Change in 46: count_all() expr_fn function now displayed as count(1) rather than count(*)