
[KYUUBI #5795][K8S] Support to cleanup the spark driver pod periodically #5806

Closed
wants to merge 1 commit into master from liaoyt/master

Conversation

@liaoyt (Contributor) commented Dec 2, 2023

🔍 Description

Issue References 🔗

This pull request fixes #5795

Describe Your Solution 🔧

Create a single daemon thread to traverse cache map periodically, which will evict expired cache and trigger a pod clean up operation.
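The approach can be sketched as follows. This is a minimal, JDK-only illustration of the eviction loop; the PR itself drives a Guava cache with a removal listener, and the class, method, and thread names below are illustrative, not Kyuubi's actual identifiers:

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors, ScheduledExecutorService, TimeUnit}
import scala.collection.JavaConverters._

// Illustrative sketch: track terminated driver pods with a deadline and let a
// single daemon thread sweep the map, cleaning up entries whose retain period
// has expired.
class PodCleanupTrigger(retainMs: Long, cleanup: String => Unit) {
  private val deadlines = new ConcurrentHashMap[String, Long]()

  def markTerminated(podName: String): Unit =
    deadlines.put(podName, System.currentTimeMillis() + retainMs)

  // One sweep: evict expired entries and trigger pod deletion for each.
  def sweep(): Unit =
    deadlines.asScala.foreach { case (pod, deadline) =>
      if (System.currentTimeMillis() >= deadline && deadlines.remove(pod, deadline)) {
        cleanup(pod)
      }
    }

  // Single daemon thread; fixed delay between the end of one sweep and the next.
  def start(intervalMs: Long): ScheduledExecutorService = {
    val exec = Executors.newSingleThreadScheduledExecutor { (r: Runnable) =>
      val t = new Thread(r, "pod-cleanup-trigger-thread")
      t.setDaemon(true)
      t
    }
    exec.scheduleWithFixedDelay(() => sweep(), intervalMs, intervalMs, TimeUnit.MILLISECONDS)
    exec
  }
}
```

The sweep is cheap when nothing has expired, so a single shared daemon thread is enough for all sessions.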

Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

Test Plan 🧪

Behavior Without This Pull Request ⚰️

Behavior With This Pull Request 🎉

Related Unit Tests


Checklists

📝 Author Self Checklist

- [x] My code follows the style guidelines of this project
- [ ] I have performed a self-review
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] This patch was not authored or co-authored using Generative Tooling

📝 Committer Pre-Merge Checklist

- [x] Pull request title is okay.
- [x] No license issues.
- [x] Milestone correctly set?
- [ ] Test coverage is ok
- [x] Assignees are selected.
- [x] Minimum number of approvals
- [x] No changes are requested

Be nice. Be informative.

@liaoyt liaoyt changed the title [KYUUBI #5731][K8S] Support to cleanup the spark driver pod periodically [KYUUBI #5795][K8S] Support to cleanup the spark driver pod periodically Dec 2, 2023
@@ -147,6 +150,26 @@ class KubernetesApplicationOperation extends ApplicationOperation with Logging {
}
})
.build()
if (conf.get(KyuubiConf.KUBERNETES_SPARK_CLEANUP_TERMINATED_DRIVE_POD_IMMEDIATELY)) {
expireCleanUpTriggerCacheExecutor = Executors.newSingleThreadScheduledExecutor(
Review comment (Member):

Use ThreadUtils to create the executor
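For reference, the helper being requested likely mirrors Spark's ThreadUtils. A self-contained sketch of what such a factory does; the object name and signature here are assumptions, not Kyuubi's actual ThreadUtils API:

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadFactory}

// Sketch of a ThreadUtils-style factory: a named, daemonized,
// single-threaded scheduled executor created in one place, so callers
// don't hand-roll ThreadFactoryBuilder setups at every call site.
object ThreadUtilsSketch {
  def newDaemonSingleThreadScheduledExecutor(threadName: String): ScheduledExecutorService = {
    val factory: ThreadFactory = (r: Runnable) => {
      val t = new Thread(r, threadName)
      t.setDaemon(true)
      t
    }
    Executors.newSingleThreadScheduledExecutor(factory)
  }
}
```

Centralizing this also keeps thread naming consistent, which matters later in this thread when users try to find the cleanup thread in a dump.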

@@ -1241,6 +1241,15 @@ object KyuubiConf {
.stringConf
.createWithDefault(KubernetesCleanupDriverPodStrategy.NONE.toString)

val KUBERNETES_SPARK_CLEANUP_TERMINATED_DRIVE_POD_IMMEDIATELY: ConfigEntry[Boolean] =
buildConf("kyuubi.kubernetes.spark.cleanupTerminatedDriverPodImmediately")
Review comment (Member):

Sorry for making you go back and forth; I think the configuration name is confusing now. How about:

kyuubi.kubernetes.spark.cleanupTerminatedDriverPod.kind=[NONE|ALL|COMPLETED]
kyuubi.kubernetes.spark.cleanupTerminatedDriverPod.checkInterval=[PT1M]
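The proposal maps naturally onto two ConfigEntry definitions in KyuubiConf's existing style. This is a sketch only: the doc strings, version, and the checkInterval default below are assumptions, not the merged values.

```scala
val KUBERNETES_SPARK_CLEANUP_TERMINATED_DRIVER_POD_KIND: ConfigEntry[String] =
  buildConf("kyuubi.kubernetes.spark.cleanupTerminatedDriverPod.kind")
    .doc("Which terminated Spark driver pods to delete: NONE, ALL, or COMPLETED.")
    .version("1.8.1")
    .stringConf
    .createWithDefault(KubernetesCleanupDriverPodStrategy.NONE.toString)

val KUBERNETES_SPARK_CLEANUP_TERMINATED_DRIVER_POD_CHECK_INTERVAL: ConfigEntry[Long] =
  buildConf("kyuubi.kubernetes.spark.cleanupTerminatedDriverPod.checkInterval")
    .doc("Interval between sweeps of the cleanup trigger cache, e.g. PT1M.")
    .version("1.8.1")
    .timeConf
    .createWithDefault(Duration.ofMinutes(1).toMillis)
```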

Reply (Contributor, author):


SGTM

Comment on lines 160 to 164
val keys = cleanupTerminatedAppInfoTrigger.asMap().keySet().asScala
for (key <- keys) {
// do get to trigger cache eviction
cleanupTerminatedAppInfoTrigger.getIfPresent(key)
}
Member suggested change:

cleanupTerminatedAppInfoTrigger.asMap().asScala.foreach { case (key, _) =>
  // do get to trigger cache eviction
  cleanupTerminatedAppInfoTrigger.getIfPresent(key)
}

expireCleanUpTriggerCacheExecutor = Executors.newSingleThreadScheduledExecutor(
new ThreadFactoryBuilder().setDaemon(true).setNameFormat(
s"pod-clean-up-trigger-thread").build())
expireCleanUpTriggerCacheExecutor.scheduleAtFixedRate(
Review comment (Member):


prefer to use scheduleWithFixedDelay if possible

I have encountered a performance issue when the task execution time is longer than the scheduling interval, which causes high CPU usage.
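The difference matters after an overrun: with scheduleAtFixedRate, a run that exceeds the period causes the missed executions to fire back-to-back afterwards (the CPU-burn case described above), while scheduleWithFixedDelay always waits the full delay after each run finishes. A JDK-only sketch with illustrative timings:

```scala
import java.util.concurrent.atomic.AtomicInteger
import java.util.concurrent.{Executors, TimeUnit}

// Count executions in a 500ms window when the first run overruns a 10ms
// period by sleeping 300ms. Fixed-rate scheduling replays the ~30 missed
// periods back-to-back; fixed-delay just resumes its normal cadence.
def runsAfterOneSlowTask(fixedRate: Boolean): Int = {
  val exec = Executors.newSingleThreadScheduledExecutor()
  val runs = new AtomicInteger(0)
  val task: Runnable = () => {
    if (runs.getAndIncrement() == 0) Thread.sleep(300) // only the first run is slow
  }
  if (fixedRate) exec.scheduleAtFixedRate(task, 0, 10, TimeUnit.MILLISECONDS)
  else exec.scheduleWithFixedDelay(task, 0, 10, TimeUnit.MILLISECONDS)
  Thread.sleep(500)
  exec.shutdownNow()
  runs.get()
}
```

With fixed delay, a slow sweep simply pushes the next sweep later instead of queuing a burst of catch-up runs.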

@codecov-commenter commented:

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (5b3a78d) 61.27% compared to head (75c2b68) 61.25%.
Report is 2 commits behind head on master.

Files Patch % Lines
...kyuubi/engine/KubernetesApplicationOperation.scala 75.00% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #5806      +/-   ##
============================================
- Coverage     61.27%   61.25%   -0.03%     
  Complexity       23       23              
============================================
  Files           608      608              
  Lines         36027    36047      +20     
  Branches       4951     4951              
============================================
+ Hits          22076    22080       +4     
- Misses        11556    11573      +17     
+ Partials       2395     2394       -1     


@pan3793 pan3793 added this to the v1.8.1 milestone Dec 7, 2023
@pan3793 (Member) commented Dec 7, 2023

Merging to master/1.8

@pan3793 pan3793 closed this in 27ad102 Dec 7, 2023
pan3793 pushed a commit that referenced this pull request Dec 7, 2023

Closes #5806 from liaoyt/master.

Closes #5795

75c2b68 [yeatsliao] cleanup driver pod periodically

Authored-by: yeatsliao <[email protected]>
Signed-off-by: Cheng Pan <[email protected]>
(cherry picked from commit 27ad102)
Signed-off-by: Cheng Pan <[email protected]>
@sudohainguyen (Contributor) commented Apr 16, 2024

I'm using v1.9 but this change doesn't seem to be working 🤔
Are there any logs to check whether the daemon is still active? @liaoyt

@liaoyt (Contributor, author) commented Apr 22, 2024

> I'm using v1.9 but this change seems not working 🤔 is there any logs to check if the daemon is still active? @liaoyt

I'm sorry, there are currently no logs to check whether the daemon is still active, but you can check whether there is a thread named 'pod-cleanup-trigger-thread'.

@sudohainguyen (Contributor) commented Apr 22, 2024

just checked, there's no thread named "pod-cleanup-trigger-thread"
found it

@sudohainguyen (Contributor) commented:

but it still doesn't delete the pods

@A-little-bit-of-data commented:

> pod-cleanup-trigger-thread

I use v1.9.1 and configure the following two parameters:
kyuubi.kubernetes.spark.cleanupTerminatedDriverPod.kind=ALL
kyuubi.kubernetes.terminatedApplicationRetainPeriod=PT1M
After the retain period the executors are deleted, but the driver pod is left in the Completed state and still has to be deleted manually. I also checked the pod log; there's no thread named "pod-cleanup-trigger-thread". @liaoyt

Successfully merging this pull request may close these issues.

[TASK][EASY] Add a scheduled thread to trigger k8s spark driver pod clean up periodically
5 participants