[BUG] datasourcev2_read_test failure on Databricks 14.3 #11988

Closed
razajafri opened this issue Jan 21, 2025 · 1 comment · Fixed by #12082
razajafri commented Jan 21, 2025

On the branch https://github.com/razajafri/spark-rapids/tree/SP-10661-db-14.3-deletion-vectors (at the commit it pointed to at the time of writing), datasourcev2_read_test fails when it is run with the following command after building spark-rapids on Databricks 14.3:

WITH_DEFAULT_UPSTREAM_SHIM=0 SPARK_SUBMIT_FLAGS="--conf spark.rapids.sql.detectDeltaLogQueries=false --conf spark.rapids.sql.format.parquet.reader.type=PERFILE" TEST_PARALLEL=0 TESTS=datasourcev2_read_test.py ./jenkins/databricks/test.sh

The following exception is thrown:

E                   py4j.protocol.Py4JJavaError: An error occurred while calling o195.load.
E                   : org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find data source: com.nvidia.spark.rapids.tests.datasourcev2.parquet.ArrowColumnarDataSourceV2. Please find packages at `https://spark.apache.org/third-party-projects.html`. SQLSTATE: 42K02
E                       at org.apache.spark.sql.errors.QueryExecutionErrors$.dataSourceNotFoundError(QueryExecutionErrors.scala:913)
E                       at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:789)
E                       at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:857)
E                       at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:342)
E                       at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:231)
E                       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
E                       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
E                       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
E                       at java.lang.reflect.Method.invoke(Method.java:498)
E                       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
E                       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
E                       at py4j.Gateway.invoke(Gateway.java:306)
E                       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
E                       at py4j.commands.CallCommand.execute(CallCommand.java:79)
E                       at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
E                       at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
E                       at java.lang.Thread.run(Thread.java:750)
E                   Caused by: java.lang.ClassNotFoundException: com.nvidia.spark.rapids.tests.datasourcev2.parquet.ArrowColumnarDataSourceV2.DefaultSource
E                       at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
E                       at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
E                       at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
E                       at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$6(DataSource.scala:773)
E                       at scala.util.Try$.apply(Try.scala:213)
E                       at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:773)
E                       at scala.util.Failure.orElse(Try.scala:224)
E                       at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:773)
E                       ... 15 more
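
For anyone triaging a similar failure: the trace suggests the lookup first tried the provider name as a class and then fell back to the same name with `.DefaultSource` appended, and neither was on the classpath. A quick way to check whether a given jar actually ships the class is sketched below. This is only a diagnostic sketch, not part of the fix; the helper names (`candidate_class_names`, `class_entry`, `jars_containing`) are hypothetical, and the jar paths to check are whatever your test run puts on the classpath.

```python
import zipfile
from typing import List


def candidate_class_names(provider: str) -> List[str]:
    """Class names tried in order: the provider itself, then provider + '.DefaultSource'."""
    return [provider, provider + ".DefaultSource"]


def class_entry(class_name: str) -> str:
    """Convert a fully qualified class name to its entry path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jars_containing(class_name: str, jar_paths: List[str]) -> List[str]:
    """Return the subset of jar_paths that contain the given class."""
    entry = class_entry(class_name)
    hits = []
    for path in jar_paths:
        # A jar is a zip archive, so the standard zipfile module can list it.
        with zipfile.ZipFile(path) as jar:
            if entry in jar.namelist():
                hits.append(path)
    return hits
```

Calling `jars_containing` with `com.nvidia.spark.rapids.tests.datasourcev2.parquet.ArrowColumnarDataSourceV2` and the list of jars passed to the test run returns an empty list if none of them contain the class, which would be consistent with the `ClassNotFoundException` above.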
@razajafri razajafri added ? - Needs Triage Need team to review and classify bug Something isn't working labels Jan 21, 2025
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Jan 21, 2025
@sameerz sameerz linked a pull request Feb 7, 2025 that will close this issue
gerashegalov added a commit that referenced this issue Feb 10, 2025
Resolves #11988
Resolves #11990
Resolves #12020 and related issues

- Consolidate DBR-specific logic in jenkins/databricks/common_vars.sh
- Add DBR versions suffix when necessary

---------

Signed-off-by: Gera Shegalov <[email protected]>
@gerashegalov
Collaborator

Fixed by #12082
