-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Coerce Array inner types #13452
base: main
Are you sure you want to change the base?
Coerce Array inner types #13452
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking good!
you can add tests in datafusion/sqllogictest/test_files/union.slt
That was a very fast review 😅🙏 |
@@ -1138,27 +1138,44 @@ fn numeric_string_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<D | |||
} | |||
} | |||
|
|||
fn coerce_list_children(lhs_field: &FieldRef, rhs_field: &FieldRef) -> Option<FieldRef> { | |||
Some(Arc::new( | |||
Arc::unwrap_or_clone(Arc::clone(lhs_field)).with_data_type(comparison_coercion( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comparison_coercion
should we use type_union_resolution
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So both will work, but IMO the current version is a bit better as it makes code aligned with the dictionally behaviour (dictionary_comparison_coercion
uses comparison_coercion
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use type_union_resolution
here since we need to preserve dictionary when we union two array. But comparison coercion doesn't preserve dictionary type, only inner type matters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have misunderstood your point, but I am pretty sure comparison_coercion
preserves dicts:
.or_else(|| dictionary_comparison_coercion(lhs_type, rhs_type, true)) |
For example, this query correctly casts Dict(Utf8)
to Dict(LargeUtf8)
select arrow_typeof(x) from (select make_array(arrow_cast('a', 'Dictionary(Int8, Utf8)')) x UNION ALL SELECT make_array(arrow_cast('b', 'Dictionary(Int8, LargeUtf8)'))) x;
Also, I think type_union_resolution
is a bit more limiting than comparison_coercion
, and so, for example, if I switch to it, the following two queries will stop working:
-- type_union_resolution can't cast nulls
select make_array(arrow_cast('a', 'Utf8')) x UNION ALL SELECT make_array(NULL) x;
-- type_union_resolution can't handle large lists (or fixed lists)
select make_array(make_array(1)) x UNION ALL SELECT make_array(arrow_cast(make_array(-1), 'LargeList(Int8)')) x;
@@ -1138,27 +1138,44 @@ fn numeric_string_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<D | |||
} | |||
} | |||
|
|||
fn coerce_list_children(lhs_field: &FieldRef, rhs_field: &FieldRef) -> Option<FieldRef> { | |||
Some(Arc::new( | |||
Arc::unwrap_or_clone(Arc::clone(lhs_field)).with_data_type(comparison_coercion( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#13468 seems related to this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this PR doesn’t change names behaviour, let’s go as is and then fix it separately in that PR you highlighted?
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Which issue does this PR close?
Closes #12291
Rationale for this change
Currently, we don't attempt to coerce inner list types, and so this works (although it shouldn't)
What changes are included in this PR?
Now we’ll try to coerce inner types instead of just using the type of the first array.
Are these changes tested?
Yes, added a test
Are there any user-facing changes?
No