
[Bug] spark run on k8s Error: Master must start with yarn, spark, mesos, or local #15359

Closed
wangchao732 opened this issue Dec 25, 2023 · 4 comments
Labels: bug (Something isn't working), Stale

@wangchao732

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

When trying to run a Spark on K8s task in DolphinScheduler (DS), it fails with: Error: Master must start with yarn, spark, mesos, or local

What you expected to happen

export KUBECONFIG=/tmp/dolphinscheduler/exec/process/default/11930573660864/11985906373952_1/9/9/config
${SPARK_HOME}/bin/spark-submit --master k8s://https://192.168.11.107:6443/ --deploy-mode cluster --class uml.tech.spark.SparkApp --conf spark.driver.cores=1 --conf spark.driver.memory=512M --conf spark.executor.instances=2 --conf spark.executor.cores=2 --conf spark.executor.memory=2G --name TE --conf spark.kubernetes.driver.label.dolphinscheduler-label=9_9 --conf spark.kubernetes.namespace=spark-job file:/dolphinscheduler/default/resources/spark-job/uml-ne-gblogs-to-tdengine-1.0.0/lib/uml-ne-gblogs-to-tdengine-1.0.0.jar 20220311 200
[INFO] 2023-12-21 11:00:11.546 +0800 - Executing shell command : sudo -u dolphinscheduler -i /tmp/dolphinscheduler/exec/process/default/11930573660864/11985906373952_1/9/9/9_9.sh
[INFO] 2023-12-21 11:00:11.565 +0800 - process start, process id is: 14145
[INFO] 2023-12-21 11:00:12.565 +0800 - ->
WARNING: User-defined SPARK_HOME (/opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/spark) overrides detected (/opt/cloudera/parcels/CDH/lib/spark).
WARNING: Running spark-class from user-defined location.
[INFO] 2023-12-21 11:00:14.567 +0800 - ->
Error: Master must start with yarn, spark, mesos, or local
Run with --help for usage help or --verbose for debug output
[ERROR] 2023-12-21 11:00:27.571 +0800 - Handle pod log error
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.RuntimeException: The driver pod does not exist.
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.dolphinscheduler.plugin.task.api.AbstractCommandExecutor.run(AbstractCommandExecutor.java:211)
at org.apache.dolphinscheduler.plugin.task.api.AbstractYarnTask.handle(AbstractYarnTask.java:52)
at org.apache.dolphinscheduler.server.worker.runner.DefaultWorkerDelayTaskExecuteRunnable.executeTask(DefaultWorkerDelayTaskExecuteRunnable.java:57)
at org.apache.dolphinscheduler.server.worker.runner.WorkerTaskExecuteRunnable.run(WorkerTaskExecuteRunnable.java:175)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: The driver pod does not exist.
at org.apache.dolphinscheduler.plugin.task.api.AbstractCommandExecutor.lambda$collectPodLogIfNeeded$0(AbstractCommandExecutor.java:287)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
... 3 common frames omitted
Caused by: java.lang.RuntimeException: The driver pod does not exist.
at org.apache.dolphinscheduler.plugin.task.api.AbstractCommandExecutor.lambda$collectPodLogIfNeeded$0(AbstractCommandExecutor.java:277)
... 7 common frames omitted
[INFO] 2023-12-21 11:00:27.572 +0800 - process has exited. execute path:/tmp/dolphinscheduler/exec/process/default/11930573660864/11985906373952_1/9/9, processId:14145 ,exitStatusCode:1 ,processWaitForStatus:true ,processExitValue:1
[INFO] 2023-12-21 11:00:27.572 +0800 - Start finding appId in /opt/dolphinscheduler/worker-server/logs/20231221/11985906373952/1/9/9.log, fetch way: log
[INFO] 2023-12-21 11:00:27.575 +0800 - ***********************************************************************************************
[INFO] 2023-12-21 11:00:27.575 +0800 - ********************************* Finalize task instance ************************************
[INFO] 2023-12-21 11:00:27.575 +0800 - ***********************************************************************************************
[INFO] 2023-12-21 11:00:27.578 +0800 - Upload output files: [] successfully
[INFO] 2023-12-21 11:00:27.579 +0800 - Send task execute status: FAILURE to master : 192.168.12.26:1234
[INFO] 2023-12-21 11:00:27.579 +0800 - Remove the current task execute context from worker cache
[INFO] 2023-12-21 11:00:27.579 +0800 - The current execute mode isn't develop mode, will clear the task execute file: /tmp/dolphinscheduler/exec/process/default/11930573660864/11985906373952_1/9/9
[INFO] 2023-12-21 11:00:27.786 +0800 - Success clear the task execute file: /tmp/dolphinscheduler/exec/process/default/11930573660864/11985906373952_1/9/9
[INFO] 2023-12-21 11:00:27.787 +0800 - FINALIZE_SESSION

How to reproduce

Add the K8s cluster to the DS console, then create a new workflow.
(A screenshot was attached here.)

Anything else

No response

Version

3.2.x

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

wangchao732 added the bug (Something isn't working) and Waiting for reply labels on Dec 25, 2023
Radeity (Member) commented Dec 27, 2023

It seems your Spark version doesn't support running on K8s.
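For context, this is an assumption based on the SPARK_HOME warning in the log, which points at a CDH 5.9 parcel (that era of CDH shipped Spark 1.6.x): Spark only accepts `k8s://` master URLs from version 2.3.0 onward, and older spark-submit builds reject them with exactly this "Master must start with yarn, spark, mesos, or local" error. A minimal shell sketch of the version gate (SPARK_VERSION is a hypothetical placeholder, not read from the reporter's environment):

```shell
# Spark gained the k8s:// master scheme in 2.3.0; anything older rejects it.
# SPARK_VERSION is a placeholder -- on a real node, obtain it from
# ${SPARK_HOME}/bin/spark-submit --version instead.
SPARK_VERSION="1.6.0"      # assumed value for a CDH 5.9-era parcel
MIN_K8S_VERSION="2.3.0"

# sort -V orders version strings numerically; if the required minimum sorts
# first, the installed version is >= 2.3.0 and k8s:// masters are accepted.
if [ "$(printf '%s\n' "$MIN_K8S_VERSION" "$SPARK_VERSION" | sort -V | head -n1)" = "$MIN_K8S_VERSION" ]; then
    echo "k8s:// master supported"
else
    echo "k8s:// master NOT supported: upgrade Spark or repoint SPARK_HOME"
fi
```

If this prints the unsupported branch, pointing SPARK_HOME at a Spark >= 2.3 installation instead of the CDH parcel should let the `--master k8s://...` submission parse.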

Radeity removed the Waiting for reply label on Dec 27, 2023

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.

github-actions bot added the Stale label on Jan 27, 2024
zhongjiajie added this to the 3.2.1 milestone on Jan 30, 2024

github-actions bot commented Feb 7, 2024

This issue has been closed because it has not received a response for a long time. You can reopen it if you encounter similar problems in the future.

github-actions bot closed this as completed on Feb 7, 2024
Projects
None yet
Development

No branches or pull requests

3 participants