
.as[Foo] doesn't catch DataFrame Dataset type differences #282

Open

dnmfarrell opened this issue Sep 9, 2022 · 3 comments

dnmfarrell commented Sep 9, 2022

Spark 3.1
sparksql-scalapb_2.12:0.11.0

When used as a Dataset, ScalaPB-generated case classes don't catch type mismatches in the logical plan the way ordinary Scala case classes do.

E.g. in spark shell (regular case class):

scala> case class Foo(i:Int)
defined class Foo
scala> val badData = Seq[(Long)]((5L))
badData: Seq[Long] = List(5)
scala> val bdf = badData.toDF("i")
bdf: org.apache.spark.sql.DataFrame = [i: bigint]
scala> bdf.as[Foo]
org.apache.spark.sql.AnalysisException: Cannot up cast `i` from bigint to int.
...

ScalaPB case-class-based encoders don't blow up on as; the mismatch only surfaces when the query actually runs (e.g. bdf.as[Foo].head).
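For contrast, a minimal sketch of the ScalaPB side, assuming Foo is a hypothetical ScalaPB-generated message with a single int32 field i, and that its encoder comes from scalapb.spark.Implicits:

import org.apache.spark.sql.SparkSession
import scalapb.spark.Implicits._               // ScalaPB-provided encoders

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Build a DataFrame whose i column is bigint, while Foo.i is int32
val bdf = spark.range(1).selectExpr("(id + 5) as i")

val ds = bdf.as[Foo]   // no AnalysisException at analysis time...
ds.head                // ...the mismatch only blows up here, at execution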

dnmfarrell (Author) commented Sep 9, 2022

(I would post my ScalaPB example, but I'm running into the Spark vs. ScalaPB protobuf version conflict, which I usually shade away; the shading isn't working right now.)

Ah, this is caused by spark-shell automatically importing the Spark implicits.
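To see why the auto-import matters: a ScalaPB message is also a case class (a Product), so both spark.implicits._ (auto-imported in spark-shell) and scalapb.spark.Implicits._ can supply an Encoder[Foo], and which one .as[Foo] resolves determines whether the upcast check runs at analysis time. A hedged sketch outside the shell, where the imports can be controlled (Foo is still the hypothetical generated message; PlainFoo is an ordinary case class):

import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val bdf = spark.range(1).selectExpr("(id + 5) as i")   // i: bigint

// Ordinary case class + Spark's reflective product encoder: fails fast at
// analysis time with "Cannot up cast `i` from bigint to int".
case class PlainFoo(i: Int)
bdf.as[PlainFoo](Encoders.product[PlainFoo])

// ScalaPB message + ScalaPB encoder, summoned with no competing implicits
// in scope: analysis passes, the failure is deferred to execution.
val protoEnc: Encoder[Foo] = {
  import scalapb.spark.Implicits._
  implicitly[Encoder[Foo]]
}
bdf.as[Foo](protoEnc).head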

dnmfarrell (Author) commented Sep 12, 2022

I forked the test repo and can reproduce the issue:

  • 0.11.0 displays the behavior described in this ticket: val ds: Dataset[Small] = df.as[Small] does not throw an exception, despite the DataFrame being incompatible with the Dataset type.
  • 1.0.1 does throw an exception, but it's a java.lang.NoSuchMethodError - stack trace

dnmfarrell (Author) commented

Re: java.lang.NoSuchMethodError - that was my mistake: I compiled for Spark 3.2 but ran on Spark 3.1 🤦
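For reference, a hedged build.sbt sketch of keeping the compile-time Spark version aligned with the runtime cluster (the versions and artifact coordinates below are illustrative assumptions, not taken from this repo):

// build.sbt — Spark artifacts are "provided" so the cluster's jars are used
// at run time, and the declared version matches the Spark 3.1 runtime.
val sparkVersion = "3.1.2"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
  "com.thesamet.scalapb" %% "sparksql-scalapb" % "1.0.1" // coordinates may differ per Spark version
)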

It seems like casts from one shape of data to another are caught, e.g. (x: Int) -> (x: Int, y: Int) emits org.apache.spark.sql.AnalysisException: cannot resolve 'y' given input columns.
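A quick spark-shell illustration of that shape check, using a plain case class (this part is standard Spark behavior):

scala> case class Wide(x: Int, y: Int)
defined class Wide

scala> Seq(1).toDF("x").as[Wide]
org.apache.spark.sql.AnalysisException: cannot resolve 'y' given input columns: [x];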
