[BUG] Cannot read files into dataframe in Databricks 11.3 LTS Runtime 3.3.0 Spark #682

james-miles-ccy · 2022-11-16T13:11:50Z

Is there an existing issue for this?

I have searched the existing issues

Current Behavior

When running the V2 Excel PySpark code below in the Databricks 11.3 LTS Runtime:

```python
df = (
    spark.read.format("excel")
    .option("header", True)
    .option("inferSchema", True)
    .load(fr"{folderpath}//.xlsx")
)
display(df)
```

I receive the following error upon attempting to display or use the resulting dataframe:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 101) (10.94.235.131 executor 1): java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.v2.FilePartitionReaderFactory.options()Lorg/apache/spark/sql/catalyst/FileSourceOptions;
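The tail of that message is a JVM method descriptor. Java throws `AbstractMethodError` when code calls an abstract method that the loaded class does not implement, which can only happen at run time if a class changed incompatibly after the calling code was compiled. As a rough aid for reading such messages, here is a minimal Python decoder. It assumes the older `Class.method(args)Return` message format shown above (newer JVMs word the message differently), and `decode_abstract_method_error` is a hypothetical helper, not part of Spark or Spark-Excel.

```python
import re

# Single-character JVM descriptors for primitive types and void.
_PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def _parse_type(desc, i):
    """Parse one type descriptor starting at index i; return (type_name, next_index)."""
    dims = 0
    while desc[i] == "[":  # each '[' adds one array dimension
        dims += 1
        i += 1
    if desc[i] == "L":  # object type: L<slash/separated/name>;
        end = desc.index(";", i)
        name = desc[i + 1:end].replace("/", ".")
        i = end + 1
    else:
        name = _PRIMITIVES[desc[i]]
        i += 1
    return name + "[]" * dims, i


def decode_abstract_method_error(message):
    """Split 'pkg.Class.method(args)Return' into owner class, method, args, return type."""
    # Keep only the last ': '-separated segment, dropping any exception prefixes.
    text = message.strip().rsplit(": ", 1)[-1]
    m = re.fullmatch(r"([\w.$]+)\.(\w+)(\(.*\).+)", text)
    if m is None:
        raise ValueError(f"unrecognised message: {message!r}")
    owner, method, desc = m.groups()
    args, i = [], 1  # index 0 is the opening '('
    while desc[i] != ")":
        arg, i = _parse_type(desc, i)
        args.append(arg)
    returns, _ = _parse_type(desc, i + 1)
    return {"class": owner, "method": method, "args": args, "returns": returns}


err = ("java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.v2."
       "FilePartitionReaderFactory.options()Lorg/apache/spark/sql/catalyst/FileSourceOptions;")
print(decode_abstract_method_error(err))
```

Applied to the error above, it reports that `FilePartitionReaderFactory.options()` takes no arguments and returns `org.apache.spark.sql.catalyst.FileSourceOptions`. One reading, consistent with the patched-runtime explanation later in this thread, is that the runtime's `FilePartitionReaderFactory` declares this abstract method while Spark-Excel's V2 reader was compiled against a Spark 3.3 version of the class that lacks it.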

Expected Behavior

The resulting DataFrame should display correctly.

Steps To Reproduce

Set the `folderpath` variable to a location containing Excel files, and run the Python code below in the latest Databricks runtime:

```python
df = (
    spark.read.format("excel")
    .option("header", True)
    .option("inferSchema", True)
    .load(fr"{folderpath}//.xlsx")
)
display(df)
```

Environment

- Spark version: 3.3.0
- Spark-Excel version: 0.18.5
- OS: Windows 10
- Cluster environment

Anything else?

No response

nightscape · 2022-11-16T13:18:08Z

Hey @james-miles-ccy, the Spark-Excel version should consist of the Spark version and the version of Spark-Excel itself.
You were only specifying the version of Spark-Excel. Can you check you were using 3.3.1_0.18.5?
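A Spark-Excel version therefore looks like `3.3.1_0.18.5`: the Spark version the artifact was built for, an underscore, then Spark-Excel's own version (the Maven artifact name `spark-excel_2.12` adds the Scala version). A minimal sketch for splitting such a string and comparing it with the cluster's Spark version; the helper names are hypothetical, and treating a MAJOR.MINOR match as "same Spark line" is an assumption mirroring the proposal in #902, not a guarantee of binary compatibility.

```python
import re

# Accepts a bare version ("3.3.1_0.18.5") or a JAR-style name
# ("spark-excel_2.12-3.3.1_0.18.5"); the Scala prefix is optional.
_VERSION = re.compile(
    r"(?:spark-excel_(?P<scala>\d+\.\d+)-)?"
    r"(?P<spark>\d+\.\d+\.\d+)_(?P<excel>\d+\.\d+\.\d+)"
)


def parse_spark_excel_version(text):
    """Return {'scala': ..., 'spark': ..., 'excel': ...}; 'scala' is None if absent."""
    m = _VERSION.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a Spark-Excel version string: {text!r}")
    return m.groupdict()


def major_minor(version):
    """Turn '3.3.0' into (3, 3)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def built_for_same_spark_line(artifact, cluster_spark_version):
    """True if the artifact targets the same Spark MAJOR.MINOR as the cluster."""
    built_for = parse_spark_excel_version(artifact)["spark"]
    return major_minor(built_for) == major_minor(cluster_spark_version)


print(parse_spark_excel_version("spark-excel_2.12-3.3.1_0.18.5"))
print(built_for_same_spark_line("3.3.1_0.18.5", "3.3.0"))
```

On a cluster, `spark.version` supplies the second argument. By this check, a `3.3.x` build running on Spark 3.4.1 or 3.5.0 (as in some combinations reported later in this thread) is on a different Spark line.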

james-miles-ccy · 2022-11-16T13:33:53Z

Yes I am using 3.3.1_0.18.5

nightscape · 2022-11-16T16:13:09Z

Can you check the same thing with a local or other non-Databricks Spark 3.3.0?
We already had the case once where Databricks used a slightly different and not fully API-compatible version of Spark in their Runtime than the officially published one.
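To run the same read against stock Apache Spark outside Databricks, one option is a local `SparkSession` that pulls Spark-Excel from Maven through `spark.jars.packages`. A sketch, assuming PySpark is installed locally and that the `com.crealytics:spark-excel_<scala>:<version>` coordinate matches the artifact under test; `local_excel_session` and `maven_coordinate` are hypothetical helpers. PySpark is imported inside the function so the coordinate helper works without it.

```python
def maven_coordinate(spark_excel_version, scala_version="2.12"):
    """Build the Maven coordinate for a Spark-Excel release such as '3.3.1_0.18.5'."""
    return f"com.crealytics:spark-excel_{scala_version}:{spark_excel_version}"


def local_excel_session(spark_excel_version, scala_version="2.12"):
    """Start a local (non-Databricks) SparkSession with Spark-Excel on the classpath."""
    from pyspark.sql import SparkSession  # lazy import; requires a local PySpark install

    return (
        SparkSession.builder.master("local[*]")
        .appName("spark-excel-repro")
        .config("spark.jars.packages", maven_coordinate(spark_excel_version, scala_version))
        .getOrCreate()
    )


# Usage (needs PySpark and network access to Maven Central):
# spark = local_excel_session("3.3.1_0.18.5")
# print(spark.version)  # confirm which Spark version the repro actually runs on
```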

james-miles-ccy · 2022-11-22T15:22:59Z

I have installed PySpark/spark-excel locally. The V1 format works fine and generates DataFrames on Spark 3.3.1, but using a path for multiple files (i.e. the V2 format) causes cells to hang and never complete. I am using the same spark-excel version as stated above.

nightscape · 2022-11-22T16:53:32Z

Is it the same error/issue as on DataBricks?

james-miles-ccy · 2022-11-24T15:00:15Z

No. In Databricks you receive the error listed in my original comment, whereas running locally leads to endless, never-completing execution.

FYI, this is only an issue for V2; V1 works both in Databricks and locally.

snehawankhade · 2022-11-30T20:25:41Z

I am facing the same issue with V2 (Spark version: 3.3.0, Spark-Excel: 3.3.1_0.18.5). V1 works, but not completely: `input_file_name()` returns an empty string.

nightscape · 2022-12-01T10:41:01Z

input_file_name is only supported in v2. Unfortunately, I didn't have time to look into the original issue.
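Since V1 does not populate `input_file_name()`, one workaround is to read each file separately with the V1 source (`com.crealytics.spark.excel`) and record the path yourself with `lit()`; the same pattern also reads a folder of workbooks without going through V2. A minimal sketch, assuming the files sit on a filesystem Python's `glob` can see (e.g. a local run) and that all workbooks share the same columns; the helper names and the `source_file` column are hypothetical.

```python
import glob
import os
from functools import reduce

V1_FORMAT = "com.crealytics.spark.excel"  # Spark-Excel V1 data source name


def list_excel_files(folder, pattern="*.xlsx"):
    """Matching workbook paths on a locally visible filesystem, sorted for stable order."""
    return sorted(glob.glob(os.path.join(folder, pattern)))


def read_excel_v1_union(spark, paths, file_col="source_file"):
    """Read each workbook with the V1 source, tag rows with their path, and union them."""
    if not paths:
        raise ValueError("no Excel files to read")
    from pyspark.sql.functions import lit  # lazy import: only needed when Spark is used

    frames = [
        spark.read.format(V1_FORMAT)
        .option("header", True)
        .option("inferSchema", True)
        .load(path)
        .withColumn(file_col, lit(path))  # stands in for input_file_name()
        for path in paths
    ]
    # unionByName matches columns by name rather than by position
    return reduce(lambda left, right: left.unionByName(right), frames)


# Usage on a Spark session with Spark-Excel attached:
# df = read_excel_v1_union(spark, list_excel_files(folderpath))
```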

dazfuller · 2022-12-11T15:50:27Z

Hey @nightscape, this came up in our implementation as well.

I think I've traced the issue to Databricks using a patched Spark runtime in the 11.x runtimes (and the 12.0 beta runtime), which includes a change from Spark's master branch that isn't in the 3.3 support branch.

I'm looking into this further at the moment and I'll shout if I find anything

dazfuller · 2022-12-23T18:45:24Z

Just to add an update: I've been talking with Databricks, and there's a fix coming which will resolve this in the 11.x and 12.x runtimes. It should hopefully arrive in January.

nightscape · 2022-12-24T01:27:45Z

@dazfuller thanks a lot for pushing this forward and keeping us updated here!!
We had a similar issue before, so I guess Databricks breaking compatibility with the open-source Spark version is something we have to keep an eye on...

james-miles-ccy · 2023-04-13T14:48:29Z

Hi All, FYI looks like this has all been resolved by Databricks on 12.1 runtime!

minnieshi · 2024-12-17T14:15:40Z

This also happens with:

- spark-excel_2.12-3.3.1_0.18.7 + Spark 3.5.0 (Azure Databricks 15.4 LTS)
- spark-excel_2.12-3.3.3_0.20.3 + Spark 3.4.1 (Azure Databricks 13.3 LTS)
- spark-excel_2.12-3.3.3_0.20.3 + Spark 3.5.0 (Azure Databricks 15.4 LTS)
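When one notebook has to run on several runtimes, a pragmatic option is to detect the Databricks Runtime version and fall back to the V1 source where V2 has been seen to fail. A sketch under stated assumptions: it reads the `DATABRICKS_RUNTIME_VERSION` environment variable (assumed to look like `13.3` on Databricks clusters and to be absent elsewhere), and the set of affected runtimes holds only what this thread reports (11.3 LTS, 13.3 LTS, 15.4 LTS), not an authoritative compatibility matrix. All names are hypothetical.

```python
import os

V1_FORMAT = "com.crealytics.spark.excel"
V2_FORMAT = "excel"

# (MAJOR, MINOR) runtimes on which this thread reports the V2 AbstractMethodError.
# Hand-collected from the comments; extend as new reports come in.
REPORTED_V2_FAILURES = {(11, 3), (13, 3), (15, 4)}


def databricks_runtime(environ=None):
    """Return the Databricks Runtime as (major, minor), or None when it can't be read."""
    environ = os.environ if environ is None else environ
    parts = environ.get("DATABRICKS_RUNTIME_VERSION", "").split(".")
    try:
        return int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        return None


def choose_excel_format(runtime):
    """Prefer V2, but fall back to V1 on runtimes with a reported V2 failure."""
    return V1_FORMAT if runtime in REPORTED_V2_FAILURES else V2_FORMAT


# Usage:
# fmt = choose_excel_format(databricks_runtime())
# In this thread multi-file paths are used with V2 only; with V1, read files one at a time.
```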

nightscape · 2024-12-18T11:51:01Z

@minnieshi do you get the exact same error as in the first post?

java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.v2.FilePartitionReaderFactory.options()Lorg/apache/spark/sql/catalyst/FileSourceOptions;

minnieshi · 2024-12-18T17:21:47Z

Yes, the same error, @nightscape. So I used a combination of lower versions instead, and posted here in the hope that the higher versions could be made to work. I have copied the error below: `AbstractMethodError: org.apache.spark.sql.execution.datasources.v2.FilePartitionReaderFactory.options()Lorg/apache/spark/sql/catalyst/FileSourceOptions;`


mmicu · 2025-01-27T16:35:52Z

@minnieshi, I think I am having a similar issue. Do you know which version I could use to make it run for Databricks 14.3 and Spark 3.5.0?

minnieshi · 2025-01-27T21:30:02Z

I tried the whole version matrix, and I could not get it to run on Spark 3.5.0. Kind regards, Min


mmicu · 2025-01-27T21:52:48Z

@minnieshi, thanks for the quick feedback. Do you know what exactly causes the issue?

I ask because I have checked the JAR that contains that file and the corresponding class, and everything looks fine and properly defined.

nightscape · 2025-01-28T09:19:18Z

@mmicu can you access the JAR files on Databricks?
My best guess is that Databricks (again) made some non-binary-compatible changes in their version of Spark.

mmicu · 2025-01-28T10:54:17Z

@nightscape, yes, I should have access. I could try to get some information from the cluster and its JAR if you need.
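To check which JAR on a cluster provides `FilePartitionReaderFactory`, and whether that build refers to `FileSourceOptions` at all, the JARs can be opened as ZIP archives with the Python standard library. A rough sketch: the byte search for `FileSourceOptions` is a heuristic that relies on class names being stored as plain strings in the class file's constant pool, not a real class-file parser, and the JAR directory (for example `/databricks/jars` on a cluster) is an assumption to adjust.

```python
import glob
import os
import zipfile

CLASS_ENTRY = "org/apache/spark/sql/execution/datasources/v2/FilePartitionReaderFactory.class"
MARKER = b"org/apache/spark/sql/catalyst/FileSourceOptions"


def inspect_jars(jar_dir, class_entry=CLASS_ENTRY, marker=MARKER):
    """Yield (jar_path, mentions_marker) for every JAR in jar_dir containing class_entry."""
    for jar in sorted(glob.glob(os.path.join(jar_dir, "*.jar"))):
        try:
            with zipfile.ZipFile(jar) as zf:
                if class_entry not in zf.namelist():
                    continue
                # Class names are stored as UTF-8 strings in the constant pool,
                # so a raw byte search is a cheap, heuristic reference check.
                yield jar, marker in zf.read(class_entry)
        except zipfile.BadZipFile:
            continue  # skip files that are not valid ZIP archives


# Usage, e.g. in a notebook cell on the cluster:
# for jar, mentions in inspect_jars("/databricks/jars"):
#     print(jar, "references FileSourceOptions" if mentions else "no FileSourceOptions reference")
```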

github-actions bot mentioned this issue Feb 6, 2023: [BUG] Cannot read files into dataframe in Databricks 9.1 LTS Runtime 3.1.2 Spark #712 (Open)

github-actions bot mentioned this issue Mar 30, 2023: [BUG] Cannot read/ write dataframe after loading file in Databricks 12.1 Runtime 3.3.1 Spark #724 (Open)

james-miles-ccy closed this as completed Apr 13, 2023

dinesh1512 mentioned this issue Apr 23, 2024: [BUG] Cannot read files into dataframe in Databricks 13.3 LTS Runtime 3.3.0 Spark #853 (Closed)

nightscape mentioned this issue Dec 4, 2024: Consider using MAJOR.MINOR Spark version in package version #902 (Open)

minnieshi mentioned this issue Dec 18, 2024: [BUG] ClassNotFoundException for 'excel.DefaultSource' while using API V2 #789 (Closed)

nightscape reopened this Jan 28, 2025
