Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[worker] fix kill remote shell task and print exception log #15569 #15628

Closed
wants to merge 7 commits into from

Conversation

alei1206
Copy link
Contributor

Purpose of the pull request

This pull request to fix #15569 , not print exception log when kill a remote shell task.

  1. Like shell task, remote-sehll task is no need to set appIds in TaskExecutionContext. when we set it in remote-sehll task, an error logic will be triggered in org.apache.dolphinscheduler.plugin.task.api.utils.ProcessUtils#cancelApplication
  2. We don't need cleanData(taskId) in org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor#kill(). Because cleanData(taskId) in org.apache.dolphinscheduler.plugin.task.remoteshell.RemoteExecutor#getTaskExitCode().

Brief change log

Verify this pull request

This pull request is code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(or)

If your pull request contain incompatible change, you should also add it to docs/docs/en/guide/upgrede/incompatible.md

@caishunfeng
Copy link
Contributor

@@ -156,7 +156,6 @@ public void kill(String taskId) throws IOException {
String pid = getTaskPid(taskId);
String killCommand = String.format(COMMAND.KILL_COMMAND, pid);
runRemote(killCommand);
cleanData(taskId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please explain more why we need to delete cleanData() here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rickchengx Hi, I read the logic again, we can not delete cleanData() here. I think, The rm command in cleanData() should be replaced to rm -f to avoid the printing of the error log when the file is not existed. what you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rickchengx Hi, I read the logic again, we can not delete cleanData() here. I think, The rm command in cleanData() should be replaced to rm -f to avoid the printing of the error log when the file is not existed. what you think?

I think we should make sure the remote task can be killed successfully. And There can be various reasons for errors when cleaning data, and I think it's acceptable to make users aware of these errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rickchengx Hi, I read the logic again, we can not delete cleanData() here. I think, The rm command in cleanData() should be replaced to rm -f to avoid the printing of the error log when the file is not existed. what you think?

I think we should make sure the remote task can be killed successfully. And There can be various reasons for errors when cleaning data, and I think it's acceptable to make users aware of these errors.

sure,I will revert to this logic later。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rickchengx Hi, I read the logic again, we can not delete cleanData() here. I think, The rm command in cleanData() should be replaced to rm -f to avoid the printing of the error log when the file is not existed. what you think?

I think we should make sure the remote task can be killed successfully. And There can be various reasons for errors when cleaning data, and I think it's acceptable to make users aware of these errors.

Hi, I reverted cleanData() logic.

@codecov-commenter
Copy link

codecov-commenter commented Mar 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 38.96%. Comparing base (c61b81e) to head (33bdf33).

❗ Current head 33bdf33 differs from pull request most recent head 2f14950. Consider uploading reports for the commit 2f14950 to get more accurate results

Additional details and impacted files
@@             Coverage Diff              @@
##                dev   #15628      +/-   ##
============================================
- Coverage     38.96%   38.96%   -0.01%     
+ Complexity     4840     4839       -1     
============================================
  Files          1316     1316              
  Lines         45010    45010              
  Branches       4818     4818              
============================================
- Hits          17539    17538       -1     
  Misses        25575    25575              
- Partials       1896     1897       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@SbloodyS SbloodyS added the 3.2.2 label Mar 10, 2024
@SbloodyS SbloodyS added this to the 3.2.2 milestone Mar 10, 2024
Copy link

sonarcloud bot commented Mar 10, 2024

Please retry analysis of this Pull-Request directly on SonarCloud

@@ -90,7 +90,6 @@ public void init() {
taskId = TASK_ID_PREFIX + taskExecutionContext.getTaskInstanceId();
}
setAppIds(taskId);
taskExecutionContext.setAppIds(taskId);
Copy link
Contributor

@rickchengx rickchengx Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's appropriate to delete the appId directly here. I suggest modifying the logic of ProcessUtils. Currently, it seems that the appId is considered to be only yarn app id, but in fact there are other external application ids, such as remote shell.

cc @Radeity @caishunfeng @SbloodyS

if (CollectionUtils.isEmpty(appIds)) {
log.info("The appId is empty");
return;
}
ApplicationManager applicationManager = applicationManagerMap.get(ResourceManagerType.YARN);
applicationManager.killApplication(new YarnApplicationManagerContext(executePath, tenantCode, appIds));
}

Copy link

This pull request has been automatically marked as stale because it has not had recent activity for 120 days. It will be closed in 7 days if no further activity occurs.

@github-actions github-actions bot added the Stale label Jul 11, 2024
Copy link

This pull request has been closed because it has not had recent activity. You could reopen it if you try to continue your work, and anyone who are interested in it are encouraged to continue work on this pull request.

@github-actions github-actions bot closed this Jul 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.2.2 backend bug Something isn't working Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] [Worker] Kill a remote-shell task, dolphin worker server will print error log
6 participants