
[bitnami/spark] connect spark session on Jupyter #77486

Open
gesangwibawono1 opened this issue Feb 14, 2025 · 4 comments
Labels: spark, tech-issues (The user has a technical issue about an application), triage (Triage is needed)

Comments


gesangwibawono1 commented Feb 14, 2025

Name and Version

bitnami/spark:3.5

What architecture are you using?

None

What steps will reproduce the bug?

The Spark session can be created successfully from the Jupyter container.

from pyspark.sql import SparkSession

# Create Spark session with proper configuration
spark = SparkSession.builder \
    .appName("JupyterTest") \
    .master("spark://spark-master:7077") \
    .config("spark.driver.host", "jupyter") \
    .config("spark.hadoop.fs.defaultFS", "hdfs://namenode:9000") \
    .getOrCreate()

But an error occurs when creating a Spark DataFrame.

# Create test DataFrame
data = [("John", 30), ("Alice", 25), ("Bob", 35)]
df = spark.createDataFrame(data, ["name", "age"])
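A common cause of this pattern in Spark standalone mode (session creation succeeds, but the first DataFrame operation fails) is that the workers cannot connect back to the driver: connecting to the master only needs Jupyter → master traffic, while running tasks also needs worker → driver traffic. A hedged sketch of extra driver-side settings that are often needed when the driver runs in a separate container; the port numbers 40000/40001 are arbitrary examples and are not from this issue, and they must be reachable/exposed on the Jupyter container:

```python
from pyspark.sql import SparkSession

# Sketch (not verified against this setup): pin the driver's ports so they can
# be exposed on the Jupyter container, and bind to all interfaces inside it.
spark = (
    SparkSession.builder
    .appName("JupyterTest")
    .master("spark://spark-master:7077")
    .config("spark.driver.host", "jupyter")         # hostname workers use to reach the driver
    .config("spark.driver.bindAddress", "0.0.0.0")  # listen on all interfaces in the container
    .config("spark.driver.port", "40000")           # fixed driver RPC port (example value)
    .config("spark.blockManager.port", "40001")     # fixed block-manager port (example value)
    .config("spark.hadoop.fs.defaultFS", "hdfs://namenode:9000")
    .getOrCreate()
)
```

With the ports pinned, the corresponding container ports can be published in Docker Compose or the Kubernetes pod spec so the workers can open connections back to the driver.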

What is the expected behavior?

It can perform Spark DataFrame and SQL operations, as well as read from and write to HDFS.

What do you see instead?

When using spark-shell inside the Spark master container, Spark DataFrame operations and reads/writes to the Hadoop HDFS cluster work correctly. However, the same operations fail when using a remote client via the Jupyter container.

Additional information

No response

@gesangwibawono1 gesangwibawono1 added the tech-issues The user has a technical issue about an application label Feb 14, 2025
@github-actions github-actions bot added the triage Triage is needed label Feb 14, 2025
@javsalgar javsalgar changed the title connect spark session on Jupyter [bitnami/spark] connect spark session on Jupyter Feb 17, 2025
@javsalgar (Contributor)

Hi!

Are you running the Jupyter notebook inside the Kubernetes cluster? Just to make sure the Spark master is accessible.

@gesangwibawono1 (Author)

Dear Mr. Javier,

Thank you for your response.

I have attempted to set up the environment using both Kubernetes and Docker Compose. In Kubernetes, both the Jupyter pod and the Spark master were within the same namespace, ensuring they could communicate. Similarly, in Docker Compose, they shared the same network. However, the setup did not function as expected.

On Jupyter, I was able to establish a connection with the Spark session, but I encountered issues when performing DataFrame operations. On the other hand, when using the Spark session from spark-shell, Spark DataFrame operations, including reading and writing to HDFS, worked as expected.

For my Hadoop setup, I deployed the cluster in Kubernetes using the Helm chart pfisterer/apache-hadoop-helm. In Docker Compose, I used the BDE2020 Hadoop image. Additionally, I used the Jupyter image jupyter/all-spark-notebook:x86_64-spark-3.5.0. Interestingly, when running a Spark session with spark.master=local in Jupyter (instead of connecting to the Bitnami Spark master cluster), I was able to successfully perform read/write operations in Hadoop.
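Since everything works with `spark.master=local` but fails against the standalone master, one quick way to narrow things down is to confirm basic TCP reachability between the containers (Jupyter → spark-master:7077, Jupyter → namenode:9000, and, if the driver ports are pinned, worker → driver). A minimal stdlib sketch; the hostnames in the usage comments are the ones from this thread:

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage (run inside the Jupyter container):
# can_reach("spark-master", 7077)   # Spark master RPC port
# can_reach("namenode", 9000)       # HDFS namenode port
```

If these checks pass in both directions, the problem is more likely a driver/worker mismatch (e.g. Python or Spark version differences between the Jupyter image and the Bitnami workers) than pure networking.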

I sincerely appreciate your effort and assistance.

@javsalgar (Contributor)

Hi,

So, if I understood correctly, the Bitnami Spark container supports DataFrame operations when you run them inside the container, but for some reason they fail when driven from the Jupyter Notebook. This seems to go beyond the Bitnami packaging of Spark; something may be incorrect in the Jupyter integration with Spark, since only remote DataFrame operations are affected. I will leave the ticket open in case someone from the community wants to share their experience, but my suggestion would be to check with the upstream Jupyter and Spark communities to see if there is something incorrect in the Jupyter configuration.

@gesangwibawono1 (Author)

Okay, thank you for the explanation and suggestion.
