We set the Spark configuration option spark.sql.legacy.allowUntypedScalaUDF to true when we were migrating from Spark 2 to Spark 3. This makes our Scala code compatible with Spark 3, but the long-term solution is to modify our Scala code to satisfy the new requirements of Spark 3 and avoid setting this configuration option. The configuration option is set in hlink.spark.session.SparkConnection.
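For reference, here is a minimal Scala sketch of how an option like this is passed to the SparkSession builder. hlink actually sets it inside hlink.spark.session.SparkConnection, so the standalone builder, app name, and master below are only illustrative, not hlink's real code:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: hlink sets this option inside
// hlink.spark.session.SparkConnection rather than in a standalone
// builder like this one.
val spark = SparkSession
  .builder()
  .appName("hlink-example")   // illustrative app name
  .master("local[*]")         // illustrative master for a local run
  .config("spark.sql.legacy.allowUntypedScalaUDF", "true")
  .getOrCreate()
```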
From the Spark migration guide at https://spark.apache.org/docs/3.1.1/sql-migration-guide.html:

In Spark 3.0, using org.apache.spark.sql.functions.udf(AnyRef, DataType) is not allowed by default. Removing the return type parameter to switch automatically to the typed Scala udf is recommended, or set spark.sql.legacy.allowUntypedScalaUDF to true to keep using the untyped form. In Spark version 2.4 and below, if org.apache.spark.sql.functions.udf(AnyRef, DataType) gets a Scala closure with a primitive-type argument, the returned UDF returns null if the input value is null. However, in Spark 3.0, the UDF returns the default value of the Java type if the input value is null. For example, for val f = udf((x: Int) => x, IntegerType), f($"x") returns null in Spark 2.4 and below if column x is null, and returns 0 in Spark 3.0. This behavior change is introduced because Spark 3.0 is built with Scala 2.12 by default.
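To make the quoted behavior concrete, here is a small, self-contained Scala sketch contrasting the legacy untyped form with the typed form the guide recommends. The local session, column name x, and sample data are illustrative; the null handling noted in the comments for the typed form is our reading of Spark's behavior for typed UDFs with primitive arguments, which is what makes it the safer long-term replacement:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.types.IntegerType

val spark = SparkSession
  .builder()
  .master("local[*]")
  // Needed only so the legacy untyped form below can be defined at all.
  .config("spark.sql.legacy.allowUntypedScalaUDF", "true")
  .getOrCreate()
import spark.implicits._

// Legacy untyped form, udf(AnyRef, DataType): in Spark 3.0 a null input
// produces the Java default for Int, so the result is 0 instead of null.
val untyped = udf((x: Int) => x, IntegerType)

// Typed form, with the explicit return type dropped: Spark infers the
// types from the closure and skips the call on null primitive inputs,
// so a null input stays null. This is the recommended replacement.
val typed = udf((x: Int) => x)

val df = Seq(Some(1), None).toDF("x")
df.select(
  untyped(col("x")).as("untyped_x"), // 1, then 0 for the null row
  typed(col("x")).as("typed_x")      // 1, then null for the null row
).show()
```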