fix: enable full decimal to decimal support #1385

Status: Open. Wants to merge 5 commits into main.

Conversation

@himadripal (Contributor) commented on Feb 11, 2025:

Completes #375

  • enable decimal-to-decimal casts
  • remove the hard-coded CastOptions passed to native execution
  • fix castTest to match Arrow's invalid argument error against Spark's number-out-of-range error

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

How are these changes tested?

Use a regex to match Arrow's invalid argument error.
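The check itself lives in the Scala test suite; as a rough sketch of the idea in Rust (the message text and pattern below are illustrative, not the PR's actual regex), the point is to match the shape of Arrow's invalid-argument overflow message rather than the exact value:

    use regex::Regex;

    fn main() {
        // Match Arrow's overflow wording without pinning the exact value or
        // precision, so the assertion survives across different inputs.
        let re = Regex::new(r"too large to store in a Decimal\d+ of precision \d+")
            .unwrap();
        assert!(re.is_match(
            "Invalid argument error: 12345 is too large to store in a Decimal128 of precision 4"
        ));
    }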
@@ -872,6 +872,13 @@ fn cast_array(
     let array = array_with_timezone(array, cast_options.timezone.clone(), Some(to_type))?;
     let from_type = array.data_type().clone();

+    let native_cast_options: CastOptions = CastOptions {
+        safe: !matches!(cast_options.eval_mode, EvalMode::Ansi), // take safe mode from cast_options passed
+        format_options: FormatOptions::new()
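For context, here is a self-contained sketch of what the truncated expression above plausibly expands to, assuming arrow-rs's CastOptions/FormatOptions API and a local stand-in for Comet's EvalMode enum:

    use arrow::compute::CastOptions;
    use arrow::util::display::FormatOptions;

    // Stand-in for Comet's EvalMode, declared here only so the sketch
    // compiles on its own.
    enum EvalMode {
        Legacy,
        Ansi,
        Try,
    }

    // In ANSI mode Spark must raise on overflow, so the Arrow cast is made
    // unsafe (safe: false) to surface an error instead of producing NULL;
    // otherwise the safe cast yields NULL for out-of-range values.
    fn native_cast_options(eval_mode: &EvalMode) -> CastOptions<'static> {
        CastOptions {
            safe: !matches!(eval_mode, EvalMode::Ansi),
            format_options: FormatOptions::new()
                .with_timestamp_format(Some("%Y-%m-%d %H:%M:%S%.f"))
                .with_timestamp_tz_format(Some("%Y-%m-%d %H:%M:%S%.f")),
        }
    }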
Contributor:
I think one can use the default value defined for FormatOptions here.

@himadripal (author) commented Feb 11, 2025:
The default CAST_OPTIONS that this native_cast_options replaces had these two fields set:

    static TIMESTAMP_FORMAT: Option<&str> = Some("%Y-%m-%d %H:%M:%S%.f");

    timestamp_format: TIMESTAMP_FORMAT,
    timestamp_tz_format: TIMESTAMP_FORMAT,

If we change it to the default, the FormatOptions::default() implementation sets these to:

            timestamp_format: None,
            timestamp_tz_format: None,

Hence I kept it as it is defined inside Comet's default CAST_OPTIONS.
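In other words, a minimal contrast, assuming arrow-rs's FormatOptions API:

    use arrow::util::display::FormatOptions;

    fn main() {
        // FormatOptions::default() leaves both timestamp formats as None,
        // so switching to it would drop the Spark-compatible layout.
        let _defaults = FormatOptions::default();

        // What Comet's CAST_OPTIONS pins instead:
        let _spark_compatible = FormatOptions::new()
            .with_timestamp_format(Some("%Y-%m-%d %H:%M:%S%.f"))
            .with_timestamp_tz_format(Some("%Y-%m-%d %H:%M:%S%.f"));
    }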

Contributor:
Fair enough. (The format options are only used to make the cast of timestamp to string compatible with Spark, and are not needed anywhere else.) But I guess it is a good idea to be consistent everywhere.

@codecov-commenter commented Feb 11, 2025:

Codecov Report

Attention: Patch coverage is 25.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 39.50%. Comparing base (f09f8af) to head (5cce0c4).
Report is 45 commits behind head on main.

Files with missing lines | Patch % | Lines
...src/main/scala/org/apache/comet/GenerateDocs.scala | 0.00% | 2 Missing ⚠️
...scala/org/apache/comet/expressions/CometCast.scala | 50.00% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##               main    #1385       +/-   ##
=============================================
- Coverage     56.12%   39.50%   -16.63%     
- Complexity      976     2085     +1109     
=============================================
  Files           119      265      +146     
  Lines         11743    61597    +49854     
  Branches       2251    13092    +10841     
=============================================
+ Hits           6591    24335    +17744     
- Misses         4012    32697    +28685     
- Partials       1140     4565     +3425     


@kazuyukitanimura changed the title from "enable full decimal to decimal support" to "fix: enable full decimal to decimal support" on Feb 14, 2025.
@kazuyukitanimura (Contributor) reviewed:
Mostly looks good, thank you @himadripal, just minor comments.

Comment on lines 1132 to 1137 (diff excerpt; old and new lines interleaved as rendered):

    // for comet decimal conversion throws ArrowError(string) from arrow - across spark versions the message dont match.
    if (sparkMessage.contains("cannot be represented as")) {
      assert(
        sparkException.getMessage
          .replace(".WITH_SUGGESTION] ", "]")
          .startsWith(cometMessage))
    } else if (CometSparkSessionExtensions.isSpark34Plus) {
      // for Spark 3.4 we expect to reproduce the error message exactly
      assert(cometMessage == sparkMessage)
      cometMessage.contains("cannot be represented as") || cometMessage.contains(
        "too large to store"))
    } else {
Contributor:
There are message modifications below per Spark version. Would you mind updating them instead of creating another if branch?

… double to decimal and a few other paths still use Spark, hence generate Spark error messages.
Comment on lines +1132 to 1136:

    // for comet decimal conversion throws ArrowError(string) from arrow - across spark versions the message dont match.
    if (sparkMessage.contains("cannot be represented as")) {
      cometMessage.contains("cannot be represented as") || cometMessage.contains(
        "too large to store")
    } else {
Contributor:
I think we still need to remove this new if block and update the test cases below. This new block may still pass via cometMessage.contains("cannot be represented as"), which seems to be an indication of a Spark cast instead of a native cast.

@himadripal (author) commented Feb 21, 2025:
When I removed this branch, the test failed for double-to-decimal conversion with the allow-incompatible flag; I think that path still uses Spark cast. Hence I had to put it back.

Contributor:
Another way to remove the if block is to convert the error message to make it similar to Spark's. This is where we define Spark error messages: https://github.com/apache/datafusion-comet/blob/main/native/spark-expr/src/error.rs#L36. You can check how these are used.

Or you can move the message check below and switch the expected messages based on the fromType/toType.

One way or the other; otherwise we cannot confidently say whether the test is passing due to the "cannot be represented as" message or the "too large to store" message.

@himadripal (author) commented Feb 25, 2025:

> Another way to remove the if block is to convert the error message to make it similar to Spark's. This is where we define Spark error messages: https://github.com/apache/datafusion-comet/blob/main/native/spark-expr/src/error.rs#L36. You can check how these are used.

This one I checked. The problem is that from native execution we get back Arrow(ArrowError), which only carries a string; the precision and scale information is not present. And to construct a Spark error message from a string, we would need to check for specific substrings in the message.
We could change ArrowError to carry parameters, but that would be a big change.
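A sketch of what that string inspection would look like (hypothetical helper; the rewrite target wording is approximate, and the real Spark-style messages are defined in native/spark-expr/src/error.rs):

    use arrow::error::ArrowError;

    // Hypothetical helper: map Arrow's string-only error onto Spark-like
    // wording. Precision and scale cannot be recovered here, which is the
    // limitation described above.
    fn spark_like_message(err: &ArrowError) -> Option<String> {
        match err {
            ArrowError::InvalidArgumentError(msg) if msg.contains("too large to store") => {
                // Crude substring rewrite toward Spark's phrasing.
                Some(msg.replace("is too large to store in", "cannot be represented as"))
            }
            _ => None,
        }
    }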

@himadripal (author):

> Or you can move the message check below and switch the expected messages based on the fromType/toType.

I'll try this.

@himadripal (author):

> One way or the other; otherwise we cannot confidently say whether the test is passing due to the "cannot be represented as" message or the "too large to store" message.

I'm contemplating creating a separate test/check function for the decimal-to-decimal tests.

Contributor:
Just a thought: for the former approach, can we get precision info similar to https://github.com/apache/datafusion-comet/blob/main/native/spark-expr/src/conversion_funcs/cast.rs#L932?
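That would look roughly like the following (hypothetical function; the point is that at the cast site the target precision and scale are in scope, so a Spark-shaped message can be built directly):

    use arrow::error::ArrowError;

    // Hypothetical: raise the error with Spark's "cannot be represented as"
    // wording at the point where precision and scale are known, instead of
    // letting Arrow's generic overflow message bubble up.
    fn decimal_out_of_range(value: &str, precision: u8, scale: i8) -> ArrowError {
        ArrowError::InvalidArgumentError(format!(
            "{value} cannot be represented as Decimal({precision}, {scale})"
        ))
    }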

@@ -69,7 +69,8 @@ object GenerateDocs {
       w.write("|-|-|-|\n".getBytes)
       for (fromType <- CometCast.supportedTypes) {
         for (toType <- CometCast.supportedTypes) {
-          if (Cast.canCast(fromType, toType) && fromType != toType) {
+          if (Cast.canCast(fromType, toType) && (fromType != toType || fromType.typeName
@himadripal (author):
@andygrove please check: I added this exception for decimal.
