Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coerce Array inner types #13452

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

blaginin
Copy link
Contributor

Which issue does this PR close?

Closes #12291

Rationale for this change

Currently, we don't attempt to coerce inner list types, and so this works (although it shouldn't)

SELECT make_array(2) x UNION ALL SELECT make_array(now()) x;

What changes are included in this PR?

Now we’ll try to coerce inner types instead of just using the type of the first array.

Are these changes tested?

Yes, added a test

Are there any user-facing changes?

No

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good!

you can add tests in datafusion/sqllogictest/test_files/union.slt

datafusion/expr-common/src/type_coercion/binary.rs Outdated Show resolved Hide resolved
datafusion/expr-common/src/type_coercion/binary.rs Outdated Show resolved Hide resolved
datafusion/expr-common/src/type_coercion/binary.rs Outdated Show resolved Hide resolved
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Nov 16, 2024
@blaginin
Copy link
Contributor Author

That was a very fast review 😅🙏

@blaginin blaginin marked this pull request as ready for review November 16, 2024 18:22
datafusion/expr-common/src/type_coercion/binary.rs Outdated Show resolved Hide resolved
@@ -1138,27 +1138,44 @@ fn numeric_string_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<D
}
}

fn coerce_list_children(lhs_field: &FieldRef, rhs_field: &FieldRef) -> Option<FieldRef> {
Some(Arc::new(
Arc::unwrap_or_clone(Arc::clone(lhs_field)).with_data_type(comparison_coercion(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comparison_coercion

should we use type_union_resolution here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So both will work, but IMO the current version is a bit better as it makes code aligned with the dictionally behaviour (dictionary_comparison_coercion uses comparison_coercion)

Copy link
Contributor

@jayzhan211 jayzhan211 Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use type_union_resolution here since we need to preserve dictionary when we union two array. But comparison coercion doesn't preserve dictionary type, only inner type matters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have misunderstood your point, but I am pretty sure comparison_coercion preserves dicts:

.or_else(|| dictionary_comparison_coercion(lhs_type, rhs_type, true))

For example, this query correctly casts Dict(Utf8) to Dict(LargeUtf8)

select arrow_typeof(x) from (select make_array(arrow_cast('a', 'Dictionary(Int8, Utf8)')) x UNION ALL SELECT make_array(arrow_cast('b', 'Dictionary(Int8, LargeUtf8)'))) x;

Also, I think type_union_resolution is a bit more limiting than comparison_coercion, and so, for example, if I switch to it, the following two queries will stop working:

-- type_union_resolution can't cast nulls
select make_array(arrow_cast('a', 'Utf8')) x UNION ALL SELECT make_array(NULL) x;

-- type_union_resolution can't handle large lists (or fixed lists)
select make_array(make_array(1)) x UNION ALL SELECT make_array(arrow_cast(make_array(-1), 'LargeList(Int8)')) x;

datafusion/expr-common/src/type_coercion/binary.rs Outdated Show resolved Hide resolved
@@ -1138,27 +1138,44 @@ fn numeric_string_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<D
}
}

fn coerce_list_children(lhs_field: &FieldRef, rhs_field: &FieldRef) -> Option<FieldRef> {
Some(Arc::new(
Arc::unwrap_or_clone(Arc::clone(lhs_field)).with_data_type(comparison_coercion(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb can there be a problem if we coerce two lists and they have different field names?

@blaginin the resulting field nullability should be OR of nullability of sources

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#13468 seems related to this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this PR doesn’t change names behaviour, let’s go as is and then fix it separately in that PR you highlighted?

@jayzhan211

This comment was marked as outdated.

jayzhan211
jayzhan211 previously approved these changes Nov 18, 2024
Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jayzhan211 jayzhan211 dismissed their stale review November 18, 2024 03:57

type_union_resolution

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Arrays of non-coercible types are coercible
3 participants