Does this support multiple Spark notebooks? #8
No, not in the way it currently works. The original intention was to integrate this with a notebook service at CERN, where only one instance of Spark would run inside a Docker container, so at the moment only one notebook is supported. I think it can be made to support multiple notebooks with some changes. Would you be interested in collaborating on this?
Sure - I can collaborate. It may be easier to create a new "kernel" called SparkMonitorKernel which creates the PySpark session, sets the correct configurations, and gets the Spark UI port.
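For illustration, a minimal sketch of what such a kernel's startup could do, assuming pyspark is installed; the function name `create_spark_session` and the fixed app name are placeholders, not part of any existing kernel:

```python
# Hypothetical startup logic for a "SparkMonitorKernel": the kernel itself
# creates the session, so the extension always knows the UI port in advance.
from pyspark.sql import SparkSession

def create_spark_session(ui_port: int = 4040) -> SparkSession:
    """Build the PySpark session with an explicit UI port the proxy can rely on."""
    return (
        SparkSession.builder
        .appName("SparkMonitorKernel")
        .config("spark.ui.port", str(ui_port))  # pin the Spark web UI port
        .getOrCreate()
    )

spark = create_spark_session()
```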
One suggestion would be to generate a random free port number when the kernel extension creates the configuration object, and set the spark.ui.port property accordingly (a sketch of this follows below). We could default to an environment variable if one is set. The kernel could in some way alert the server extension of this, possibly through the front end, which could add a GET parameter or send a message beforehand.

In my experience, accessing the SparkContext object from the kernel extension causes some nasty errors; for example, calling getOrCreate from an extension might end up creating a session without the user having started one. We do not know when the user creates a session or what it's named in the Python namespace. As such we are unable to touch the SparkContext, which gives the user more freedom to start/stop and configure it as required.

About creating a custom kernel:
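Returning to the port suggestion above, here is a minimal sketch; the environment variable name `SPARKMONITOR_UI_PORT` and the `conf` object are illustrative assumptions, not existing extension API:

```python
import os
import socket

from pyspark import SparkConf

def pick_free_port() -> int:
    # Bind to port 0 so the OS assigns a free ephemeral port, then release it.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]

# SPARKMONITOR_UI_PORT is a hypothetical user override, not an existing setting.
ui_port = int(os.environ.get("SPARKMONITOR_UI_PORT", pick_free_port()))

conf = SparkConf()  # the configuration object the kernel extension creates
conf.set("spark.ui.port", str(ui_port))
```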
Hello, I've begun work to allow the notebook to directly query the SparkContext for the web UI URL. This would let users use this extension without any special configuration or modification of their scripts. We can poll the singleton periodically to see when the context starts, then query the context for the correct web UI connection string.
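A minimal sketch of that polling approach, assuming PySpark's private singleton `SparkContext._active_spark_context` and the public `uiWebUrl` property; the callback wiring is illustrative:

```python
import threading

from pyspark import SparkContext

def poll_for_ui_url(callback, interval: float = 5.0) -> None:
    """Poll until a SparkContext exists, then hand its web UI URL to callback."""
    def check() -> None:
        # _active_spark_context is a private attribute and may change between versions.
        sc = SparkContext._active_spark_context
        if sc is not None and sc.uiWebUrl:
            callback(sc.uiWebUrl)
        else:
            threading.Timer(interval, check).start()
    check()

# Example: print the URL once the user starts a session.
poll_for_ui_url(lambda url: print("Spark UI available at", url))
```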
Hello, I've modified SparkMonitor to work with multiple Spark sessions here: https://github.com/ben-epstein/sparkmonitor @krishnan-r, if you'd like to merge it into your repo, just let me know. For anyone interested in using it, you can install it with pip from that fork.
If you've already installed the original sparkmonitor, you're going to have to remove it as well as the Jupyter extension (which I'm not actually sure how to do...). If you're running it in a Docker image, just rebuild with the new pip module; if you're running locally, I'm unsure. In order for these changes to take effect, you need to fully remove the old extension and then enable this one. If you just want to test it, you can clone the repo and run it locally; one possible cleanup-and-reinstall sequence is sketched below.
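For anyone stuck on the removal step, a hedged sketch of one possible sequence, modeled on the enable commands in the original README; the exact flags may differ for your setup:

```bash
# Remove the original package and disable its Jupyter extensions.
pip uninstall sparkmonitor
jupyter nbextension disable sparkmonitor --py --user
jupyter serverextension disable --py --user sparkmonitor

# Install the fork straight from GitHub, then re-enable the extensions.
pip install git+https://github.com/ben-epstein/sparkmonitor
jupyter nbextension install sparkmonitor --py --user --symlink
jupyter nbextension enable sparkmonitor --py --user
jupyter serverextension enable --py --user sparkmonitor
```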
The architecture described at https://krishnan-r.github.io/sparkmonitor/how.html#the-notebook-webserver-extension---a-spark-web-ui-proxy seems to suggest that if I run multiple notebooks with Spark, it's not going to work, as only port 4040 will be proxied.
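To make the limitation concrete, a minimal sketch of a fixed-port Tornado proxy handler of the kind the linked document describes; the handler name and hard-coded address are illustrative, not the extension's actual code:

```python
from tornado import httpclient, web

# Hard-coded UI address: a second notebook's Spark UI (e.g. on port 4041) is never reached.
SPARK_UI = "http://localhost:4040"

class SparkUIProxyHandler(web.RequestHandler):
    """Forwards every request to the single configured Spark UI port."""

    async def get(self, path: str) -> None:
        client = httpclient.AsyncHTTPClient()
        response = await client.fetch(f"{SPARK_UI}/{path}", raise_error=False)
        self.set_status(response.code, reason=response.reason or "Unknown")
        self.write(response.body or b"")
```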