mockup of collation aware hash function #2

GideonPotok · 2024-06-04T22:04:15Z

A proof of concept that we can modify OpenHashSet's hashing function to be a collation-aware hashing function.
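To illustrate the idea in the description, here is a minimal sketch of collation-aware hashing in plain Python. It is not Spark's `OpenHashSet` and not the code in this PR. The names `lowercase_key`, `CollationAwareSet`, and `collation_aware_mode` are hypothetical, and `str.casefold` is only an assumed stand-in for a non-binary (case-insensitive) collation. The point it shows: if values are hashed and compared by their collation key instead of their raw bytes, strings that are equal under the collation land in the same bucket.

```python
import unicodedata
from typing import Callable, Dict, Hashable, Iterable, Optional


def lowercase_key(s: str) -> str:
    """Hypothetical collation key: an assumed stand-in for a case-insensitive collation."""
    return unicodedata.normalize("NFC", s).casefold()


class CollationAwareSet:
    """Toy set that hashes and compares strings by a collation key, not by raw value."""

    def __init__(self, key_fn: Callable[[str], Hashable]) -> None:
        self._key_fn = key_fn
        # Maps collation key -> first original string seen with that key.
        self._items: Dict[Hashable, str] = {}

    def add(self, value: str) -> None:
        # setdefault keeps the first representative for each collation key.
        self._items.setdefault(self._key_fn(value), value)

    def __contains__(self, value: str) -> bool:
        return self._key_fn(value) in self._items

    def __len__(self) -> int:
        return len(self._items)


def collation_aware_mode(
    values: Iterable[str], key_fn: Callable[[str], Hashable]
) -> Optional[str]:
    """Return the first-seen representative of the most frequent collation key."""
    counts: Dict[Hashable, int] = {}
    representative: Dict[Hashable, str] = {}
    for v in values:
        k = key_fn(v)
        counts[k] = counts.get(k, 0) + 1
        representative.setdefault(k, v)
    if not counts:
        return None
    best = max(counts, key=lambda k: counts[k])
    return representative[best]


if __name__ == "__main__":
    s = CollationAwareSet(lowercase_key)
    s.add("Spark")
    s.add("SPARK")
    print(len(s), "spark" in s)
    print(collation_aware_mode(["a", "A", "b"], lowercase_key))
```

Under this sketch, "Spark" and "SPARK" occupy a single slot, and a mode computed over the keys counts "a" and "A" together. Which representative a real implementation should return for a group of collation-equal strings is a design choice this toy version settles arbitrarily (first seen).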

latest review

added checkinputdatatype to not support complex types containing nonbinary collations

added checkinputdatatype to not support complex types containing nonbinary collations

added struct test stuff

Tests pass

test structs

fix scalastyle

Collation Support for Mode

…essions/aggregate/Mode.scala Co-authored-by: Uros Bojanic <[email protected]>

h mockup added new bms

… throw internal error

### What changes were proposed in this pull request?

This PR fixes the error messages and classes when Python UDFs are used in higher order functions.

### Why are the changes needed?

To show the proper user-facing exceptions with error classes.

### Does this PR introduce _any_ user-facing change?

Yes, previously it threw internal error such as:

```python
from pyspark.sql.functions import transform, udf, col, array
spark.range(1).select(transform(array("id"), lambda x: udf(lambda y: y)(x))).collect()
```

Before:

```
py4j.protocol.Py4JJavaError: An error occurred while calling o74.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 15 in stage 0.0 failed 1 times, most recent failure: Lost task 15.0 in stage 0.0 (TID 15) (ip-192-168-123-103.ap-northeast-2.compute.internal executor driver): org.apache.spark.SparkException: [INTERNAL_ERROR] Cannot evaluate expression: <lambda>(lambda x_0#3L)#2 SQLSTATE: XX000
	at org.apache.spark.SparkException$.internalError(SparkException.scala:92)
	at org.apache.spark.SparkException$.internalError(SparkException.scala:96)
```

After:

```
pyspark.errors.exceptions.captured.AnalysisException: [INVALID_LAMBDA_FUNCTION_CALL.UNEVALUABLE] Invalid lambda function call. Python UDFs should be used in a lambda function at a higher order function. However, "<lambda>(lambda x_0#3L)" was a Python UDF. SQLSTATE: 42K0D;
Project [transform(array(id#0L), lambdafunction(<lambda>(lambda x_0#3L)#2, lambda x_0#3L, false)) AS transform(array(id), lambdafunction(<lambda>(lambda x_0#3L), namedlambdavariable()))#4]
+- Range (0, 1, step=1, splits=Some(16))
```

### How was this patch tested?

Unittest was added

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47079 from HyukjinKwon/SPARK-48706.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Kent Yao <[email protected]>

GideonPotok and others added 3 commits May 22, 2024 18:21

Update sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expr…

f054589

…essions/aggregate/Mode.scala Co-authored-by: Uros Bojanic <[email protected]>

Merge branch 'master' into spark_47353_3_clean

5d171d6

github-actions bot added SQL CORE labels Jun 4, 2024

GideonPotok changed the base branch from spark_47353_3_clean to master June 5, 2024 14:36

tests pass

03e0f36

h mockup added new bms

GideonPotok force-pushed the cxollationmode branch from 2063e6f to 03e0f36 Compare June 5, 2024 14:40

GideonPotok changed the title ~~mockup~~ mockup of collation aware hash function Jun 5, 2024

GideonPotok mentioned this pull request Jun 5, 2024

[SPARK-47353][SQL] Enable collation support for the Mode expression apache/spark#46597

Closed

GideonPotok closed this Jun 7, 2024
