Cannot send all messages successfully before Spark session terminates #167

Open

mdobrin opened this issue Apr 10, 2021 · 3 comments
mdobrin commented Apr 10, 2021

I'm running into an issue where my Spark session terminates before all of my Kafka messages have been produced.
The documentation mentions a callback mechanism that can be used, and I see this callback is invoked once per record produced. The best solution I can come up with is to compare the number of callback invocations with the count of my DataFrame (sketched after the snippet below), but is there a more elegant approach?

import org.apache.spark.sql.SparkSession
import org.apache.kafka.clients.producer.ProducerRecord
import com.github.benfradet.spark.kafka010.writer._

// myConfigMap, producerConfig, sqlFileNamesArg, sqlParamsMap, render,
// convertRowToJSON and topic are defined elsewhere in the application.

val spark = SparkSession
  .builder()
  .config("spark.debug.maxToStringFields", 100000)
  .enableHiveSupport()
  .getOrCreate()

// copy the externally supplied Kafka settings into the producer config
for ((key, value) <- myConfigMap) {
  producerConfig.setProperty(key, value)
}

for (sqlFileName <- sqlFileNamesArg.split(",")) {
  sqlParamsMap("TMPID") = sqlParamsMap("TMPID").toString
    .replaceAll("\\$\\{sqlFileName\\}", sqlFileName.replaceAll(".sql", ""))

  val theSql = render(spark, sqlFileName, sqlParamsMap)
  val df = spark.sql(theSql)

  if (!df.rdd.isEmpty) {
    df.rdd.writeToKafka(
      producerConfig,
      s => new ProducerRecord[String, String](topic, convertRowToJSON(s))
    )
  }
}

// Edit: commenting out the stop() call below makes no difference to whether
// the early-termination issue occurs.
spark.stop() // called explicitly to avoid https://issues.apache.org/jira/browse/SPARK-24981
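For concreteness, here is a rough sketch of the count-and-compare idea, assuming the writeToKafka overload that also accepts an optional org.apache.kafka.clients.producer.Callback. The AckCounter holder is something I made up for illustration; being a Scala object it is one instance per JVM, so each executor keeps its own tallies, and in cluster mode they would still need to be surfaced (logs, metrics, an external store) before they can be compared against df.count() on the driver.

import java.util.concurrent.atomic.AtomicLong
import org.apache.kafka.clients.producer.{Callback, ProducerRecord, RecordMetadata}

// Hypothetical per-JVM counters for broker acknowledgements and failures.
object AckCounter extends Serializable {
  val acked  = new AtomicLong(0L)
  val failed = new AtomicLong(0L)
}

val expected = df.count() // records we intend to produce

df.rdd.writeToKafka(
  producerConfig,
  s => new ProducerRecord[String, String](topic, convertRowToJSON(s)),
  Some(new Callback with Serializable {
    // invoked once per record when the broker acks it (or the send fails)
    override def onCompletion(metadata: RecordMetadata, e: Exception): Unit =
      if (e == null) AckCounter.acked.incrementAndGet()
      else AckCounter.failed.incrementAndGet()
  })
)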
BenFradet (Owner)
mdobrin (Author) commented Apr 13, 2021

Sorry, I forgot to mention this: I am using version 0.3.0 because my Spark version is 2.1.1.
The code you link to above looks slightly different in 0.3.0 - could that be related?

https://github.com/BenFradet/spark-kafka-writer/blob/0.3.0/spark-kafka-0-10-writer/src/test/scala/com/github/benfradet/spark/kafka010/writer/SKRSpec.scala#L63-L68
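For reference, the dependency is pinned along these lines (module name inferred from the path above, so treat the exact coordinates as an assumption on my part):

// build.sbt - assumed coordinates for the Kafka 0.10 module at 0.3.0
libraryDependencies += "com.github.benfradet" %% "spark-kafka-0-10-writer" % "0.3.0"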

BenFradet (Owner) commented
ah, it might well be 🤔
