Is there a plan to implement Spark Connect support in joblib-spark? #50

Kurdzik · 2023-07-04T11:37:08Z

Hi,
I recently finished setting up a Spark cluster on a couple of separate VMs.

When I tried to train a scikit-learn model using joblib-spark, I encountered the following problem:

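from joblib import parallel_backend
from joblibspark import register_spark
from pyspark.sql import SparkSession
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Spark Connect session (PySpark 3.4+); <master node ip> is the cluster's address
spark = SparkSession.builder.remote('sc://<master node ip>').appName("JoblibSparkBackend").getOrCreate()
register_spark()  # register the 'spark' backend with joblib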

param_distributions = {
    "n_estimators": list(range(100, 500, 50)),
    "max_depth": list(range(2, 7)),
}

model = RandomForestRegressor()
random_forest = RandomizedSearchCV(model, param_distributions, cv=5, refit=True)

# X_train, y_train: training data (not shown)
with parallel_backend('spark', n_jobs=4):
    random_forest.fit(X=X_train, y=y_train)

...
NotImplementedError: sparkContext() is not implemented.

Is there a workaround for this issue, or is this something that will be implemented in the future?
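As a possible stopgap until the backend supports Spark Connect, the training call could be guarded so that it only selects the `spark` backend for a classic (non-Connect) session. This is a minimal sketch under assumptions, not part of joblib-spark: it assumes that sessions created with `SparkSession.builder.remote(...)` are instances of a class defined under the `pyspark.sql.connect` package (as in PySpark 3.4+), and the helpers `is_spark_connect_session` and `choose_joblib_backend` are hypothetical names. It inspects the session's class instead of touching `sparkContext`, the call reported as not implemented in the error above.

```python
def is_spark_connect_session(session) -> bool:
    """Return True if `session` appears to be a Spark Connect session.

    Assumption: Spark Connect sessions (PySpark 3.4+) are instances of a
    class defined in a module under `pyspark.sql.connect`. Checking the
    class's module avoids accessing `session.sparkContext`, which raises
    NotImplementedError under Spark Connect.
    """
    module = type(session).__module__ or ""
    return module.startswith("pyspark.sql.connect")


def choose_joblib_backend(session) -> str:
    """Pick a joblib backend name (hypothetical helper).

    Returns "spark" for a classic session; for a Spark Connect session it
    falls back to "loky", joblib's default local process-based backend.
    """
    return "loky" if is_spark_connect_session(session) else "spark"
```

For example, `with parallel_backend(choose_joblib_backend(spark), n_jobs=4):` would let the search run under Spark Connect, but on the client machine rather than on the cluster, so it avoids the error without actually distributing the work.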
