
Don't use spark.sql.legacy.allowUntypedScalaUDF #22

Open

riley-harper opened this issue Aug 2, 2022 · 0 comments

riley-harper (Contributor) commented Aug 2, 2022

We set this Spark configuration option to true while migrating from Spark 2 to Spark 3. It makes our Scala code compatible with Spark 3, but the long-term solution is to modify the Scala code to satisfy Spark 3's new UDF requirements and stop setting this option. The option is set in hlink.spark.session.SparkConnection.
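For context, a minimal sketch of how this legacy flag is typically enabled on a SparkSession builder (the app name and builder chain here are illustrative, not hlink's actual SparkConnection code):

```scala
import org.apache.spark.sql.SparkSession

// Sketch of the workaround this issue proposes removing: the legacy flag
// is set on the session so untyped Scala UDFs keep working under Spark 3.
val spark = SparkSession.builder()
  .appName("hlink") // illustrative app name
  .config("spark.sql.legacy.allowUntypedScalaUDF", "true")
  .getOrCreate()
```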

From the Spark migration guide at https://spark.apache.org/docs/3.1.1/sql-migration-guide.html:

> In Spark 3.0, using org.apache.spark.sql.functions.udf(AnyRef, DataType) is not allowed by default. Remove the return type parameter to automatically switch to typed Scala udf is recommended, or set spark.sql.legacy.allowUntypedScalaUDF to true to keep using it. In Spark version 2.4 and below, if org.apache.spark.sql.functions.udf(AnyRef, DataType) gets a Scala closure with primitive-type argument, the returned UDF returns null if the input values is null. However, in Spark 3.0, the UDF returns the default value of the Java type if the input value is null. For example, val f = udf((x: Int) => x, IntegerType), f($"x") returns null in Spark 2.4 and below if column x is null, and return 0 in Spark 3.0. This behavior change is introduced because Spark 3.0 is built with Scala 2.12 by default.
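The migration the guide recommends can be sketched as follows (the UDF bodies are illustrative, not hlink's actual code):

```scala
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.IntegerType

// Untyped form: disallowed by default in Spark 3 unless
// spark.sql.legacy.allowUntypedScalaUDF is true. Note the behavior
// change: with a primitive-typed argument, a null input produces
// 0 (the Java default for Int) in Spark 3 rather than null.
val untyped = udf((x: Int) => x + 1, IntegerType)

// Typed form: drop the DataType parameter and let Spark infer the
// return type. This is the long-term fix this issue proposes.
val typed = udf((x: Int) => x + 1)

// If null-in/null-out semantics must be preserved, one option is a
// boxed argument type so the closure can see and handle nulls:
val nullSafe = udf((x: java.lang.Integer) =>
  if (x == null) null else java.lang.Integer.valueOf(x + 1))
```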
