This repository has been archived by the owner on Nov 23, 2017. It is now read-only.

support for spark 2.2.0? #110

Open
kmu-leeky opened this issue Jul 16, 2017 · 8 comments

Comments

@kmu-leeky

It looks like spark 2.2.0 is officially released. Is it going to be supported in spark-ec2 shortly?

@shivaram
Contributor

We can support it. Would you like to open a PR?

@kmu-leeky
Author

I tried locally, but it does not seem as simple as I first thought: just adding 2.2.0 to "VALID_SPARK_VERSIONS" does not really work. A few things to consider: the base image contains Hadoop 2.4, while the Spark 2.2.0 binaries are built against Hadoop 2.6 (spark-2.2.0-bin-hadoop2.6.tgz). The base image also ships Java 1.7, and I have read a few documents saying that recent versions of Hadoop or Spark need Java 1.8.
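To make the mismatch concrete, here is roughly what you see on a cluster launched from the stock AMI (a sketch only; the /root/ephemeral-hdfs path is my assumption about the usual spark-ec2 layout and may differ):

```bash
# Rough sketch: inspecting the stock spark-ec2 AMI. The Hadoop path below is
# an assumption based on the usual spark-ec2 layout.
java -version 2>&1 | head -n 1                        # reports 1.7.x, but Spark 2.2 requires Java 8
/root/ephemeral-hdfs/bin/hadoop version | head -n 1   # reports Hadoop 2.4.x

# Spark 2.2.0 is only published prebuilt against Hadoop 2.6/2.7
# (spark-2.2.0-bin-hadoop2.6.tgz / -hadoop2.7.tgz), so there is no
# hadoop2.4 build for the existing download path to fetch.
```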

@shivaram
Contributor

I see. Those do require more changes, including changes to the AMI and the Hadoop scripts. Unfortunately I don't have time to try out the changes right now.

@kmu-leeky
Author

That's OK. I tweaked the code locally in my repo to run 2.2.0. I will create a PR if the modifications and images can be generalized.

@knesterovich

Hey guys, could you please clarify whether there are any updates/progress on this issue? @kmu-leeky, were you able to tweak your local code to make it PRable?

@nchammas
Contributor

For those still waiting for spark-ec2 to support Spark 2.2, I recommend taking a look at my project, Flintrock. It's basically a faster spark-ec2 with a better user experience.

If anyone does submit a PR adding Spark 2.2 support to spark-ec2, ping me and I'll take a look. Unfortunately, updating the spark-ec2 AMIs to fully support new Spark versions (e.g. adding Java 8) is non-trivial. With Flintrock, you don't need to wait for new commits, AMIs, or branches to be created; you just set an option to pick your version of Spark. Most of the time you can use a new Spark version with Flintrock the day it comes out without any issue.
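For example, a basic launch looks something along these lines (check "flintrock launch --help" for the full set of options; every EC2-specific value below is a placeholder):

```bash
# Sketch: launch a cluster on a chosen Spark version with Flintrock.
# The cluster name, key, identity file, and AMI ID are placeholders.
flintrock launch test-cluster \
    --num-slaves 2 \
    --spark-version 2.2.0 \
    --ec2-key-name my-key \
    --ec2-identity-file ~/.ssh/my-key.pem \
    --ec2-ami ami-xxxxxxxx \
    --ec2-user ec2-user
```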

@shivaram
Contributor

+1 to what @nchammas said. We unfortunately do not have bandwidth to create new AMIs / update spark-ec2 to match the Spark releases.

@yektayazdani

I tried changing the source to use Hadoop 2.7 with the default "yarn" setting. But once I change it, it ends up referring to

http://s3.amazonaws.com/spark-related-packages/spark-2.2.0-bin-hadoop2.4.tgz

I also tried changing the init.sh in the spark folder, but for some reason that's not going through. Let me know where I should make the changes and I will add them to the source, since we need to use this.
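In case it helps anyone pinpoint the right place, the kind of branch I have been trying to add in spark/init.sh looks roughly like this (a sketch only, not the actual script; the variable names are guesses at its style, and I have not verified that a hadoop2.7 build of 2.2.0 is actually present in that bucket):

```bash
# Sketch only, not the upstream spark/init.sh. The tarball suffix has to be
# chosen per Spark release, since no hadoop2.4 build of 2.2.0 exists.
case "$SPARK_VERSION" in
  2.2.0)
    # Hypothetical branch: fetch the Hadoop 2.7 build for Spark 2.2.0
    # regardless of the hadoop-major-version passed to the launcher.
    wget "http://s3.amazonaws.com/spark-related-packages/spark-${SPARK_VERSION}-bin-hadoop2.7.tgz"
    ;;
  *)
    echo "No download rule here for Spark $SPARK_VERSION" >&2
    ;;
esac
```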
