-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Multi-output column expressions in frame sort
method
#17947
fix: Multi-output column expressions in frame sort
method
#17947
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #17947 +/- ##
==========================================
+ Coverage 80.33% 80.40% +0.06%
==========================================
Files 1494 1494
Lines 196451 196480 +29
Branches 2817 2817
==========================================
+ Hits 157828 157977 +149
+ Misses 38103 37983 -120
Partials 520 520 ☔ View full report in Codecov by Sentry. |
sort
methodsort
method
That's wrong. The columns selected by the sorted should also be selected by the optimizer. Do you mean we don't do that atm? |
It's because we currently check that the number of column expressions matches the vec lengths of "nulls_last" / "descending" params before those expressions are expanded (there's an incorrect assumption baked-in that one expression == one column). If the expansion happens to have the same number of cols as those vec lengths then no error is raised (but can still be a bug as we may not have the intended sort behaviour if mixed true/false in "nulls_last" / "descending"). If we expand to more/less cols than those vec lengths we get an error from args_validate. Meant to park this in Draft last night while I track down the best place to handle this properly, as it's definitely not the Python check I added 😜 Update: have now fixed this properly such that we assign the indicated bool to the expanded exprs (and the fix is down in DSL/IR, not Python ;) |
fd8c57c
to
5a0def4
Compare
1fcb613
to
5562e60
Compare
5562e60
to
90e9e3d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
Several fixes relating to multi-output expressions (including selectors) in frame
sort
method:by
parameter; could raise an incorrect "the amount of ordering booleans … does not match the number of series …" error depending on the number of such expressions passed and how many columns they actually selected.panic
if multi-output column expression evaluates to no columns - bail with a suitable error instead.Also: standardised on "{:?}" format string for Rust-side
ColumnNotFound
error so we always get the quotes around the column name (which helps to see things like the empty string being used as a column name, which otherwise just looks like an incomplete error message). We had mixed usage here; sometimes it would show the quotes, sometimes it wouldn't (update: now addedpolars_err!(col_not_found=name)
to centralise this).Examples
Before
Panic if no matching cols:
Failed to use selectors in
by
:After
Descriptive error if no matching cols:
Successful use of selectors in
by
:Note that use of multi-output expressions in
by
may or may not fail in the current build - it depends on how many such expressions there are in relation to how many columns actually get selected/expanded.