Upgrade to Python 3.12 #2072
Conversation
Haven't tested this yet, but it would be nice if we could start looking at moving the base image to 3.12.
It seems we depend on this PR: conda-forge/pycurl-feedstock#27
Waiting for the upstream packages to be fixed. I marked this as a draft and subscribed to the relevant PR; will re-run when the issue is resolved.
Issue for pycurl resolved upstream!
@consideRatio thanks! @max-muoto could you please resolve the conflicts? Unfortunately, GitHub is not able to do it in the web browser.
Something is wrong, as the change should have stayed small, just a few lines of code.
Currently rebasing, should be fixed soon.
Force-pushed from 2455425 to 9035f65 (compare)
Should be good to take a look at now.
Seems we also need to upgrade Pandas, so I just went ahead and did that.
images/pyspark-notebook/Dockerfile
Outdated
@@ -63,7 +63,7 @@ USER ${NB_UID}
 RUN mamba install --yes \
     'grpcio-status' \
     'grpcio' \
-    'pandas=2.0.3' \
+    'pandas=2.2.1' \
Have you checked the comment above to make sure you’re using the proper version?
They have pandas<=2.2.1, so we should be good here.
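The compatibility question above boils down to comparing the version pinned in the Dockerfile against Spark's declared upper bound. A minimal sketch of that check (not the project's actual tooling; the version strings are taken from the diff and comment above):

```python
# Compare the pandas pin in the Dockerfile against Spark's upper bound.
# A plain tuple comparison is enough for simple X.Y.Z versions.
def parse(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))

dockerfile_pin = "2.2.1"     # 'pandas=2.2.1' from the diff above
spark_upper_bound = "2.2.1"  # Spark declares pandas<=2.2.1

# The pin is acceptable if it does not exceed Spark's upper bound.
compatible = parse(dockerfile_pin) <= parse(spark_upper_bound)
print(compatible)  # True
```

For real constraint strings (`<=`, `!=`, pre-releases), a library such as `packaging` would be the idiomatic choice; the hand-rolled comparison here is only for illustration.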
You need to check the latest stable tag, not the current main branch.
Ah, sorry. Looks like we'll need to wait for their next release, as this commit isn't included in the latest stable tag: ericm-db/spark@98ca3ea
I think we'll need to wait on this as well, since we need at least Pandas 2.1.1 to ensure compatibility with Python 3.12.
@bjornjorgensen could you please tell us when the Spark release will include this commit? (at least approximately)
Hmm, well, we are waiting for Hadoop 3.4.0 and a new Hive release. We haven't started any RC releases yet. I build and test my own JupyterLab image https://github.com/bjornjorgensen/jupyter-spark-master-docker and I did try Python 3.12, but it broke so much that I'm using Python 3.11, which is what Debian testing uses.
And Spark 3.5.1 doesn't support Python 3.12; have a look at apache/spark#43922.
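Because Spark 3.5.x predates Python 3.12, images combining the two tend to fail with cryptic import errors at runtime. A hypothetical fail-fast guard (not part of these images; the bound and function name are illustrative):

```python
import sys

# Hypothetical guard: Spark 3.5.x is not tested against Python 3.12,
# so fail early with a clear message instead of a cryptic import error.
MAX_SUPPORTED = (3, 11)

def check_python_for_spark() -> None:
    if sys.version_info[:2] > MAX_SUPPORTED:
        raise RuntimeError(
            f"Spark 3.5.x supports Python <= "
            f"{MAX_SUPPORTED[0]}.{MAX_SUPPORTED[1]}; "
            f"running {sys.version_info.major}.{sys.version_info.minor}"
        )
```

Calling `check_python_for_spark()` early in a notebook startup script would surface the incompatibility immediately rather than mid-job.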
Could we just revert this particular change rather than waiting for the next Spark release? This seems like a largely self-imposed blocker.
They have pinned version 2.2.2: apache/spark@7c639a1
[VOTE] Release Plan for Apache Spark 4.0.0 (June 2024): Apache Spark 4.0.0 Release Plan
There will be a test soon: Re: [DISCUSS] Spark 4.0.0 release
@shreve You may be interested in b-data's (my) JupyterLab Python docker stack.
Hey, any updates on this? :)
Unfortunately, this is still blocked; we're waiting for a new Spark release.
And now also Python 3.13.0 images – without numba, though. |
Per the [DISCUSS] thread, Spark 4.0 will be out in 2025.
I think we have waited enough for a new Spark release, and we should discuss switching to Python 3.12 earlier than when "everything is 100% ready". Reasons:
@mathbunnyru @consideRatio @yuvipanda @manics let's vote on this one. I think there are several solutions, and you can choose several of them (please select at least one). 1️⃣ Wait for Spark 4.0, then switch to Python 3.12
I'm voting for 3️⃣. I think with such a long release process, it's ok to switch to preview versions.
I agree with 3️⃣. I don't think it's a policy we should generally adopt; rather, this is an exception due to how long it's taking to get a working combination. We can highlight the compromises made in the CHANGELOG.
Spark 4.0 preview2 does support Python 3.12, and there is now support for Python 3.13 in the master branch. I hope and guess that there will be more previews before the 4.0 final. Spark 4.0 preview2 has support for pandas 2.2.2: apache/spark#46009
I pushed a commit which makes the Spark scripts more robust. This change is non-breaking and keeps everything the same, but the scripts will work better with newer Spark and preview versions as well.
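One thing scripts like these have to handle is that preview builds use version strings such as "4.0.0-preview2", which naive string comparison mis-sorts. A sketch of a preview-aware sort key (the function name and regex are illustrative, not the actual commit):

```python
import re

def spark_version_key(version: str) -> tuple:
    """Sort key handling both stable ('3.5.1') and preview
    ('4.0.0-preview2') Spark version strings."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:-preview(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized Spark version: {version}")
    major, minor, patch, preview = match.groups()
    # A preview sorts before the corresponding stable release.
    is_stable = 1 if preview is None else 0
    return (int(major), int(minor), int(patch), is_stable, int(preview or 0))

versions = ["3.5.1", "4.0.0-preview1", "4.0.0-preview2", "4.0.0"]
print(max(versions, key=spark_version_key))  # 4.0.0
```

The extra `is_stable` element is what makes `4.0.0` rank above `4.0.0-preview2`, which plain lexicographic comparison would get wrong.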
So, the release schedule:
It's better to merge the 2 PRs separately (and on different days); this might be helpful if something goes wrong.
All looks great so far: I merged the Spark v4 preview to main, merged changes from main to here, and updated the changelog and the list of old images. This PR now looks really simple (as it should be in most cases). I will merge this PR tomorrow.
Describe your changes
Upgrade base image to 3.12