
Don't use spark.sql.legacy.allowUntypedScalaUDF #22

Open

riley-harper opened this issue Aug 2, 2022 · 0 comments

riley-harper (Contributor) commented Aug 2, 2022

We set this Spark configuration option to true while migrating from Spark 2 to Spark 3. It makes our Scala code compatible with Spark 3, but the long-term solution is to modify the Scala code to satisfy Spark 3's new UDF requirements and stop setting this option. The option is set in hlink.spark.session.SparkConnection.
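For context, a minimal sketch of how this legacy flag is typically enabled on a SparkSession builder (the app name and builder chain here are illustrative, not hlink's actual SparkConnection code):

```scala
import org.apache.spark.sql.SparkSession

// Sketch of the workaround this issue proposes removing: the legacy flag
// is set on the session so untyped Scala UDFs keep working under Spark 3.
val spark = SparkSession.builder()
  .appName("hlink") // illustrative app name
  .config("spark.sql.legacy.allowUntypedScalaUDF", "true")
  .getOrCreate()
```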

From the Spark migration guide at https://spark.apache.org/docs/3.1.1/sql-migration-guide.html:

> In Spark 3.0, using org.apache.spark.sql.functions.udf(AnyRef, DataType) is not allowed by default. Remove the return type parameter to automatically switch to typed Scala udf is recommended, or set spark.sql.legacy.allowUntypedScalaUDF to true to keep using it. In Spark version 2.4 and below, if org.apache.spark.sql.functions.udf(AnyRef, DataType) gets a Scala closure with primitive-type argument, the returned UDF returns null if the input values is null. However, in Spark 3.0, the UDF returns the default value of the Java type if the input value is null. For example, val f = udf((x: Int) => x, IntegerType), f($"x") returns null in Spark 2.4 and below if column x is null, and return 0 in Spark 3.0. This behavior change is introduced because Spark 3.0 is built with Scala 2.12 by default.
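The migration the guide recommends can be sketched as follows (the UDF bodies are illustrative, not hlink's actual code):

```scala
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.IntegerType

// Untyped form: disallowed by default in Spark 3 unless
// spark.sql.legacy.allowUntypedScalaUDF is true. Note the behavior
// change: with a primitive-typed argument, a null input produces
// 0 (the Java default for Int) in Spark 3 rather than null.
val untyped = udf((x: Int) => x + 1, IntegerType)

// Typed form: drop the DataType parameter and let Spark infer the
// return type. This is the long-term fix this issue proposes.
val typed = udf((x: Int) => x + 1)

// If null-in/null-out semantics must be preserved, one option is a
// boxed argument type so the closure can see and handle nulls:
val nullSafe = udf((x: java.lang.Integer) =>
  if (x == null) null else java.lang.Integer.valueOf(x + 1))
```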
