diff --git a/docs/installation.md b/docs/installation.md index 07ed87cafb..12b297200e 100644 --- a/docs/installation.md +++ b/docs/installation.md @@ -23,7 +23,7 @@ $ helm repo add openkruise https://openkruise.github.io/charts/ $ helm repo update # Install the latest version. -$ helm install kruise openkruise/kruise --version 1.7.1 +$ helm install kruise openkruise/kruise --version 1.7.2 ``` **Note:** [Changelog](https://github.com/openkruise/kruise/blob/master/CHANGELOG.md). @@ -37,7 +37,7 @@ $ helm repo add openkruise https://openkruise.github.io/charts/ $ helm repo update # Upgrade to the latest version. -$ helm upgrade kruise openkruise/kruise --version 1.7.1 [--force] +$ helm upgrade kruise openkruise/kruise --version 1.7.2 [--force] ``` Note that: @@ -83,7 +83,7 @@ The following table lists the configurable parameters of the chart and their def | `manager.log.level` | Log level that kruise-manager printed | `4` | | `manager.replicas` | Replicas of kruise-controller-manager deployment | `2` | | `manager.image.repository` | Repository for kruise-manager image | `openkruise/kruise-manager` | -| `manager.image.tag` | Tag for kruise-manager image | `v1.7.1` | +| `manager.image.tag` | Tag for kruise-manager image | `v1.7.2` | | `manager.resources.limits.cpu` | CPU resource limit of kruise-manager container | `200m` | | `manager.resources.limits.memory` | Memory resource limit of kruise-manager container | `512Mi` | | `manager.resources.requests.cpu` | CPU resource request of kruise-manager container | `100m` | diff --git a/docs/user-manuals/advancedstatefulset.md b/docs/user-manuals/advancedstatefulset.md index d6ba530237..9ed3157b85 100644 --- a/docs/user-manuals/advancedstatefulset.md +++ b/docs/user-manuals/advancedstatefulset.md @@ -76,10 +76,10 @@ spec: image: nginx:alpine ``` -### User Stories +#### User Stories The main motivation of this feature is to support a more flexible StatefulSet, a building block in an ecosystem where Stateful applications can be 
migrated across Kubernetes clusters with more automation. As follows: -#### Story 1 +##### Story 1 **Migrating across namespaces**: Many organizations use namespaces for team isolation. Consider a team that is migrating a `StatefulSet` to a new namespace in a cluster. Migration could be motivated by a branding change, or a requirement to move out of a shared namespace. Consider the StatefulSet `my-app` with `replicas: 5`, running in a shared namespace. @@ -108,15 +108,48 @@ ordinals.start: 0 ordinals.start: 3 The `replicasStatefulSet` and `replicas` fields should be updated jointly, depending on the requirements of the migration. -#### Story 2 +##### Story 2 **Migrating across clusters**: Organizations taking a multi cluster approach may need to move workloads across clusters due to capacity constraints, infrastructure constraints, or for better application isolation. Similar to namespace migration, the application operator should manage network connectivity, volumes and slice orchestration. -#### Story 3 +##### Story 3 **Non-Zero Based Indexing:** A user may want to number their StatefulSet starting from ordinal `1`, rather than ordinal `0`. Using `1` based numbering may be easier to reason about and conceptualize (eg: ordinal `k` is the `k`'th replica, not the `k+1`'th replica). +## Scale features + +### PersistentVolumeClaim retention + +**FEATURE STATE:** Kruise v1.1.0 + +If you have enabled the `StatefulSetAutoDeletePVC` feature-gate during [Kruise installation or upgrade](../installation#optional-feature-gate), +you can use the `.spec.persistentVolumeClaimRetentionPolicy` field to control whether and how PVCs are deleted during the lifecycle of a StatefulSet. + +This is the same as the upstream StatefulSet feature (K8s >= 1.23 [alpha]); please refer to [the upstream documentation](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#persistentvolumeclaim-retention). 
+ +### Scaling with rate limiting + +**FEATURE STATE:** Kruise v0.10.0 + +To avoid creating a large number of failing pods at once when a new Advanced StatefulSet is applied, a `maxUnavailable` field has been added to the scale strategy since Kruise `v0.10.0`. + +```yaml +apiVersion: apps.kruise.io/v1beta1 +kind: StatefulSet +spec: + # ... + replicas: 100 + scaleStrategy: + maxUnavailable: 10% # percentage or absolute number +``` + +When this field is set, Advanced StatefulSet creates pods with the guarantee that the number of unavailable pods during scaling does not exceed this value. + +For example, the StatefulSet above first creates 10 pods. After that, it creates one more pod each time a created pod becomes running and ready. + +Note that this field only works with the `Parallel` podManagementPolicy. + ### Ordinals reserve(skip) Since Advanced StatefulSet `v1beta1` (Kruise >= v0.7.0), it supports ordinals reserve. @@ -138,45 +171,35 @@ spec: For an Advanced StatefulSet with `replicas=4, reserveOrdinals=[1]`, the ordinals of running Pods will be `[0,2,3,4]`. - If you want to migrate Pod-3 and reserve this ordinal, just append `3` into `reserveOrdinals` list. -Then controller will delete Pod-3 and create Pod-5 (existing Pods will be `[0,2,4,5]`). + Then the controller will delete Pod-3 and create Pod-5 (existing Pods will be `[0,2,4,5]`). - If you just want to delete Pod-3, you should append `3` into `reserveOrdinals` list and set `replicas` to `3`. -Then controller will delete Pod-3 (existing Pods will be `[0,2,4]`). + Then the controller will delete Pod-3 (existing Pods will be `[0,2,4]`). -## MaxUnavailable +### Specified Pod Deletion -Advanced StatefulSet adds a `maxUnavailable` capability in the `RollingUpdateStatefulSetStrategy` to allow parallel Pod -updates with the guarantee that the number of unavailable pods during the update cannot exceed this value. -It is only allowed to use when the podManagementPolicy is `Parallel`. 
+**FEATURE STATE:** Kruise v1.5.5, Kruise v1.6.4, Kruise v1.7.2+ -This feature achieves similar update efficiency like Deployment for cases where the order of -update is not critical to the workload. Without this feature, the native `StatefulSet` controller can only -update Pods one by one even if the podManagementPolicy is `Parallel`. +Compared to deleting a Pod manually, deleting it by labeling the Pod with `apps.kruise.io/specified-delete: true` is protected by the `maxUnavailable` of the Advanced StatefulSet, +and it triggers the `PreparingDelete` lifecycle hook (see below). ```yaml -apiVersion: apps.kruise.io/v1beta1 -kind: StatefulSet +apiVersion: v1 +kind: Pod +metadata: + labels: + # ... + apps.kruise.io/specified-delete: "true" spec: + containers: + - name: main # ... - podManagementPolicy: Parallel - updateStrategy: - type: RollingUpdate - rollingUpdate: - maxUnavailable: 20% ``` -For example, assuming an Advanced StatefulSet has five Pods named P0 to P4, and the application can -tolerate losing three replicas temporally. If we want to update the StatefulSet Pod spec from v1 to -v2, we can perform the following steps using the `MaxUnavailable` feature for fast update. - -1. Set `MaxUnavailable` to 3 to allow three unavailable Pods maximally. -2. Optionally, Set `Partition` to 4 in case canary update is needed. Partition means all Pods with an ordinal that is - greater than or equal to the partition will be updated. In this case P4 will be updated even though `MaxUnavailable` - is 3. -3. After P4 finish update, change `Partition` to 0. The controller will update P1,P2 and P3 concurrently. - Note that with default StatefulSet, the Pods will be updated sequentially in the order of P3, P2, P1. -4. Once one of P1, P2 and P3 finishes update, P0 will be updated immediately. 
+When the controller receives the above Pod update, it triggers the deletion process for the pod carrying the specified-delete label and ensures that the `maxUnavailable` limit is not exceeded. +The pod will be re-created by the workload if its ordinal is not reserved. -## In-Place Update +## Update features +### In-Place Update Advanced StatefulSet adds a `podUpdatePolicy` field in `spec.updateStrategy.rollingUpdate` which controls recreate or in-place update for Pods. @@ -244,14 +267,38 @@ spec: maxUnavailable: 2 ``` -## Update sequence +### Pre-download image for in-place update + +**FEATURE STATE:** Kruise v0.10.0 + +If you have enabled the `PreDownloadImageForInPlaceUpdate` feature-gate during [Kruise installation or upgrade](../installation#optional-feature-gate), +the Advanced StatefulSet controller will automatically pre-download the image you want to update to onto the nodes of all old Pods. +This helps accelerate application upgrades. + +By default, the parallelism of each new image pre-download by Advanced StatefulSet is `1`, which means the image is downloaded on nodes one by one. +You can change the parallelism with the `apps.kruise.io/image-predownload-parallelism` annotation on the Advanced StatefulSet according to the capability of the image registry; +for registries with more bandwidth and P2P image downloading ability, a larger parallelism can speed up the pre-download process. + +Since Kruise v1.1.0, you can use `apps.kruise.io/image-predownload-min-updated-ready-pods` to make sure the new image starts pre-downloading only after a few new Pods have been updated and are ready. Its value can be an absolute number or a percentage. 
+ +```yaml +apiVersion: apps.kruise.io/v1beta1 +kind: StatefulSet +metadata: + annotations: + apps.kruise.io/image-predownload-parallelism: "10" + apps.kruise.io/image-predownload-min-updated-ready-pods: "3" +``` + +Note that, to avoid unnecessary image downloads, the controller currently only pre-downloads images for Advanced StatefulSets with replicas > `3`. + +### Update sequence Advanced StatefulSet adds a `unorderedUpdate` field in `spec.updateStrategy.rollingUpdate`, which contains strategies for non-ordered update. If `unorderedUpdate` is not nil, pods will be updated with non-ordered sequence. Noted that UnorderedUpdate can only be allowed to work with Parallel podManagementPolicy. Currently `unorderedUpdate` only contains one field: `priorityStrategy`. - -### Priority strategy +#### Priority strategy This strategy defines rules for calculating the priority of updating pods. All update candidates will be applied with the priority terms. @@ -291,79 +338,57 @@ spec: unorderedUpdate: priorityStrategy: orderPriority: - - orderedKey: some-label-key + - orderedKey: some-label-key ``` -## Paused update +### MaxUnavailable -`paused` indicates that Pods updating is paused, controller will not update Pods but just maintain the number of replicas. +Advanced StatefulSet adds a `maxUnavailable` capability in the `RollingUpdateStatefulSetStrategy` to allow parallel Pod +updates with the guarantee that the number of unavailable pods during the update cannot exceed this value. +It can only be used when the podManagementPolicy is `Parallel`. + +This feature achieves update efficiency similar to Deployment for cases where the order of +update is not critical to the workload. Without this feature, the native `StatefulSet` controller can only +update Pods one by one even if the podManagementPolicy is `Parallel`. ```yaml apiVersion: apps.kruise.io/v1beta1 kind: StatefulSet spec: # ... 
+ podManagementPolicy: Parallel updateStrategy: + type: RollingUpdate rollingUpdate: - paused: true -``` - -## Pre-download image for in-place update - -**FEATURE STATE:** Kruise v0.10.0 - -If you have enabled the `PreDownloadImageForInPlaceUpdate` feature-gate during [Kruise installation or upgrade](../installation#optional-feature-gate), -Advanced StatefulSet controller will automatically pre-download the image you want to update to the nodes of all old Pods. -It is quite useful to accelerate the progress of applications upgrade. - -The parallelism of each new image pre-downloading by Advanced StatefulSet is `1`, which means the image is downloaded on nodes one by one. -You can change the parallelism using `apps.kruise.io/image-predownload-parallelism` annotation on Advanced StatefulSet according to the capability of image registry, -for registries with more bandwidth and P2P image downloading ability, a larger parallelism can speed up the pre-download process. - -Since Kruise v1.1.0, you can use `apps.kruise.io/image-predownload-min-updated-ready-pods` to make sure the new image starting pre-download after a few new Pods have been updated ready. Its value can be absolute number or percentage. - -```yaml -apiVersion: apps.kruise.io/v1beta1 -kind: StatefulSet -metadata: - annotations: - apps.kruise.io/image-predownload-parallelism: "10" - apps.kruise.io/image-predownload-min-updated-ready-pods: "3" + maxUnavailable: 20% ``` -Note that to avoid most unnecessary image downloading, now controller will only pre-download images for Advanced StatefulSet with replicas > `3`. +For example, assume an Advanced StatefulSet has five Pods named P0 to P4, and the application can +tolerate losing three replicas temporarily. If we want to update the StatefulSet Pod spec from v1 to +v2, we can perform the following steps using the `MaxUnavailable` feature for a fast update. -## Scaling with rate limiting +1. Set `MaxUnavailable` to 3 to allow at most three unavailable Pods. +2. 
Optionally, set `Partition` to 4 if a canary update is needed. Partition means all Pods with an ordinal that is + greater than or equal to the partition will be updated. In this case only P4 will be updated, even though `MaxUnavailable` + is 3. +3. After P4 finishes updating, change `Partition` to 0. The controller will update P1, P2 and P3 concurrently. + Note that with the default StatefulSet, the Pods would be updated sequentially in the order of P3, P2, P1. +4. Once one of P1, P2 and P3 finishes updating, P0 will be updated immediately. -**FEATURE STATE:** Kruise v0.10.0 +### Paused update -To avoid creating all failure pods at once when a new CloneSet applied, a `maxUnavailable` field for scale strategy has been added since Kruise `v0.10.0`. +`paused` indicates that Pod updating is paused: the controller will not update Pods but will still maintain the number of replicas. ```yaml apiVersion: apps.kruise.io/v1beta1 kind: StatefulSet spec: # ... - replicas: 100 - scaleStrategy: - maxUnavailable: 10% # percentage or absolute number + updateStrategy: + rollingUpdate: + paused: true ``` -When this field has been set, Advanced StatefulSet will create pods with the guarantee that the number of unavailable pods during the update cannot exceed this value. - -For example, the StatefulSet will firstly create 10 pods. After that, it will create one more pod only if one pod created has been running and ready. - -Note that it can just be allowed to work with Parallel podManagementPolicy. - -## PersistentVolumeClaim retention - -**FEATURE STATE:** Kruise v1.1.0 - -If you have enabled the `StatefulSetAutoDeletePVC` feature-gate during [Kruise installation or upgrade](../installation#optional-feature-gate), -you can use `.spec.persistentVolumeClaimRetentionPolicy` field to control if and how PVCs are deleted during the lifecycle of a StatefulSet. 
- -This is same to the upstream StatefulSet (K8s >= 1.23 [alpha]), please refer to [the upstream document for it](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#persistentvolumeclaim-retention). - ## Lifecycle hook **FEATURE STATE:** Kruise v0.8.0 diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/installation.md b/i18n/zh/docusaurus-plugin-content-docs/current/installation.md index a19063f924..515d325436 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/installation.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/installation.md @@ -23,7 +23,7 @@ $ helm repo add openkruise https://openkruise.github.io/charts/ $ helm repo update # Install the latest version. -$ helm install kruise openkruise/kruise --version 1.7.1 +$ helm install kruise openkruise/kruise --version 1.7.2 ``` **注意:** [Changelog](https://github.com/openkruise/kruise/blob/master/CHANGELOG.md)。 ## 通过 helm 升级 @@ -36,7 +36,7 @@ $ helm repo add openkruise https://openkruise.github.io/charts/ $ helm repo update # Upgrade to the latest version. 
-$ helm upgrade kruise openkruise/kruise --version 1.7.1 [--force] +$ helm upgrade kruise openkruise/kruise --version 1.7.2 [--force] ``` 注意: @@ -72,24 +72,24 @@ $ helm install/upgrade kruise /PATH/TO/CHART | `imagePullSecrets` | kruise 镜像用的 imagePullSecrets 列表 | `false` | #### manager参数 -| Parameter | Description | Default | -| ----------------------------------------- | ------------------------------------------------------------ | ----------------------------- | -| `manager.log.level` | kruise-manager 日志输出级别 | `4` | -| `manager.replicas` | kruise-manager 的期望副本数 | `2` | -| `manager.image.repository` | kruise-manager/kruise-daemon 镜像仓库 | `openkruise/kruise-manager` | -| `manager.image.tag` | kruise-manager/kruise-daemon 镜像版本 | `1.7.1` | -| `manager.resources.limits.cpu` | kruise-manager 的 limit CPU 资源 | `200m` | -| `manager.resources.limits.memory` | kruise-manager 的 limit memory 资源 | `512Mi` | -| `manager.resources.requests.cpu` | kruise-manager 的 request CPU 资源 | `100m` | -| `manager.resources.requests.memory` | kruise-manager 的 request memory 资源 | `256Mi` | -| `manager.metrics.port` | metrics 服务的监听端口 | `8080` | -| `manager.webhook.port` | webhook 服务的监听端口 | `9443` | -| `manager.nodeAffinity` | kruise-manager 部署的 node affinity 亲和性 | `{}` | -| `manager.nodeSelector` | kruise-manager 部署的 node selector 亲和性 | `{}` | -| `manager.tolerations` | kruise-manager 部署的 tolerations | `[]` | -| `manager.resyncPeriod` | kruise-manager 中 informer 的 resync 周期,默认不做 resync | `0` | -| `manager.hostNetwork` | kruise-manager pod 是否采用 hostnetwork 网络 | `false` | -| `manager.loggingFormat` | 结构化日志,有效的format包括:` `(plain text)、`json` | ` ` | +| Parameter | Description | Default | +| ----------------------------------------- | ------------------------------------------------------------ |-----------------------------| +| `manager.log.level` | kruise-manager 日志输出级别 | `4` | +| `manager.replicas` | kruise-manager 的期望副本数 | `2` | +| `manager.image.repository` | kruise-manager/kruise-daemon 镜像仓库 
| `openkruise/kruise-manager` | +| `manager.image.tag` | kruise-manager/kruise-daemon 镜像版本 | `1.7.2` | +| `manager.resources.limits.cpu` | kruise-manager 的 limit CPU 资源 | `200m` | +| `manager.resources.limits.memory` | kruise-manager 的 limit memory 资源 | `512Mi` | +| `manager.resources.requests.cpu` | kruise-manager 的 request CPU 资源 | `100m` | +| `manager.resources.requests.memory` | kruise-manager 的 request memory 资源 | `256Mi` | +| `manager.metrics.port` | metrics 服务的监听端口 | `8080` | +| `manager.webhook.port` | webhook 服务的监听端口 | `9443` | +| `manager.nodeAffinity` | kruise-manager 部署的 node affinity 亲和性 | `{}` | +| `manager.nodeSelector` | kruise-manager 部署的 node selector 亲和性 | `{}` | +| `manager.tolerations` | kruise-manager 部署的 tolerations | `[]` | +| `manager.resyncPeriod` | kruise-manager 中 informer 的 resync 周期,默认不做 resync | `0` | +| `manager.hostNetwork` | kruise-manager pod 是否采用 hostnetwork 网络 | `false` | +| `manager.loggingFormat` | 结构化日志,有效的format包括:` `(plain text)、`json` | ` ` | #### daemon参数 | Parameter | Description | Default | diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/user-manuals/advancedstatefulset.md b/i18n/zh/docusaurus-plugin-content-docs/current/user-manuals/advancedstatefulset.md index e89a0d255b..28a27953a7 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/user-manuals/advancedstatefulset.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/user-manuals/advancedstatefulset.md @@ -16,11 +16,11 @@ title: Advanced StatefulSet ```yaml - apiVersion: apps/v1 + apiVersion: apps.kruise.io/v1beta1 - kind: StatefulSet - metadata: - name: sample - spec: - #... + kind: StatefulSet + metadata: + name: sample + spec: + #... 
``` 注意从 Kruise 0.7.0 开始,Advanced StatefulSet 版本升级到了 `v1beta1`,并与 `v1alpha1` 兼容。对于低于 v0.7.0 版本的 Kruise,只能使用 `v1alpha1`。 @@ -49,7 +49,7 @@ metadata: Pod 起始序号默认都是从 0 开始的,此外,你也可以通过设置 **.spec.ordinals.start** 字段来设置 Pod 起始序号。使用该能力,你需要开启 FeatureGate **StatefulSetStartOrdinal=true**。 - spec.ordinals.start:如果 .spec.ordinals.start 字段被设置,则 Pod 将被分配从 .spec.ordinals.start 到 .spec.ordinals.start + .spec.replicas - 1 的序号。 -比如:replicas=5、ordinals.start=3,Pod 序号 = [3, 7]。 + 比如:replicas=5、ordinals.start=3,Pod 序号 = [3, 7]。 ``` apiVersion: apps.kruise.io/v1beta1 @@ -74,11 +74,11 @@ spec: image: nginx:alpine ``` -### User Stories +#### User Stories 起始序号能力主要是为了使 StatefulSet 更加灵活,基于该能力有状态应用可以自动化的方式在 Kubernetes 集群间迁移。如下: -#### Story 1 +##### Story 1 **Migrating across namespaces**: 许多公司使用命名空间进行隔离,考虑到用户正在将 StatefulSet 迁移到集群中的新命名空间。 迁移的原因可能是组织架构变动,也可能是要求迁出共享命名空间。如下,有 **replicas:5** 在共享命名空间中运行: @@ -102,14 +102,47 @@ ordinals.start: 0 ordinals.start: 3 [ nginx-0, nginx-1, nginx-2 ] [ nginx-3, nginx-4 ] ``` -#### Story 2 +##### Story 2 **Migrating across clusters**: 由于容量限制、基础设施限制或为了更好地隔离应用程序,采用多集群的方式可能需要在集群间移动工作负载。 -#### Story 3 +##### Story 3 **Non-Zero Based Indexing:** 用户可能希望从序号 “1 ”而不是序号 “0 ” 开始对其 StatefulSet 进行编号。使用 ”1 “ 的编号可能更容易推理和概念化(例如:序号 ”k “ 是第 ”k “ 个副本,而不是第 ”k+1 “ 个副本)。 +## 扩缩容功能 + +### PersistentVolumeClaim 保留 + +**FEATURE STATE:** Kruise v1.1.0 + +如果你在[安装或升级 Kruise](../installation##optional-feature-gate) 的时候启用了 `StatefulSetAutoDeletePVC` feature-gate, +你可以使用 `.spec.persistentVolumeClaimRetentionPolicy` 字段来控制在StatefulSet生命周期中是否以及何时删除它所创建的PVC。 + +这个功能与上游 StatefulSet (K8s >= 1.23 [alpha]) 提供的相同,可以参考[上游文档](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#persistentvolumeclaim-retention)。 + +### 流式扩容 + +**FEATURE STATE:** Kruise v0.10.0 + +为了避免在一个新 Advanced StatefulSet 创建后有大量失败的 pod 被创建出来,从 Kruise `v0.10.0` 版本开始引入了在 scale strategy 中的 `maxUnavailable` 策略。 + +```yaml +apiVersion: apps.kruise.io/v1beta1 +kind: StatefulSet +spec: + # ... 
+ replicas: 100 + scaleStrategy: + maxUnavailable: 10% # percentage or absolute number +``` + +当这个字段被设置之后,Advanced StatefulSet 会保证创建 pod 之后不可用 pod 数量不超过这个限制值。 + +比如说,上面这个 StatefulSet 一开始只会一次性创建 10 个 pod。在此之后,每当一个 pod 变为 running、ready 状态后,才会再创建一个新 pod 出来。 + +注意,这个功能只允许在 podManagementPolicy 是 `Parallel` 的 StatefulSet 中使用。 + ### 序号保留(跳过) 从 Advanced StatefulSet 的 v1beta1 版本开始(Kruise >= v0.7.0),支持序号保留功能。 @@ -132,34 +165,30 @@ spec: - 如果要把 Pod-3 做迁移并保留序号,则把 `3` 追加到 `reserveOrdinals` 列表中。控制器会把 Pod-3 删除并创建 Pod-5(此时运行中 Pod 为 `[0,2,4,5]`)。 - 如果只想删除 Pod-3,则把 `3` 追加到 `reserveOrdinals` 列表并同时把 `replicas` 减一修改为 `3`。控制器会把 Pod-3 删除(此时运行中 Pod 为 `[0,2,4]`)。 -## MaxUnavailable 最大不可用 +### 指定 Pod 删除 -Advanced StatefulSet 在 `RollingUpdateStatefulSetStrategy` 中新增了 `maxUnavailable` 策略来支持并行 Pod 发布,它会保证发布过程中最多有多少个 Pod 处于不可用状态。注意,`maxUnavailable` 只能配合 podManagementPolicy 为 `Parallel` 来使用。 +**FEATURE STATE:** Kruise v1.5.5, Kruise v1.6.4, Kruise v1.7.2+ -这个策略的效果和 `Deployment` 中的类似,但是可能会导致发布过程中的 order 顺序不能严格保证。 -如果不配置 `maxUnavailable`,它的默认值为 1,也就是和原生 `StatefulSet` 一样只能 one by one 串行发布 Pod,即使把 podManagementPolicy 配置为 `Parallel` 也是这样。 +相比于手动直接删除 Pod,使用 `apps.kruise.io/specified-delete: true` 指定 Pod 删除方式会有 Advanced StatefulSet 的 `maxUnavailable` 来保护删除, 并且会触发 `PreparingDelete` 生命周期 hook (见下文)。 ```yaml -apiVersion: apps.kruise.io/v1beta1 -kind: StatefulSet +apiVersion: v1 +kind: Pod +metadata: + labels: + # ... + apps.kruise.io/specified-delete: true spec: + containers: + - name: main # ... - podManagementPolicy: Parallel - updateStrategy: - type: RollingUpdate - rollingUpdate: - maxUnavailable: 20% ``` -比如说,一个 Advanced StatefulSet 下面有 P0 到 P4 五个 Pod,并且应用能容忍 3 个副本不可用。 -当我们把 StatefulSet 里的 Pod 升级版本的时候,可以通过以下步骤来做: +当控制器收到上面这个 Pod 更新之后,会优先处理存在指定删除标签的 pod 的删除流程,并保证不突破 `maxUnavailable` 的限制。 -1. 设置 `maxUnavailable=3` -2. (可选) 如果需要灰度升级,设置 `partition=4`。Partition 默认的意思是 order 大于等于这个数值的 Pod 才会更新,在这里就只会更新 P4,即使我们设置了 `maxUnavailable=3`。 -3. 
在 P4 升级完成后,把 `partition` 调整为 0。此时,控制器会同时升级 P1、P2、P3 三个 Pod。注意,如果是原生 `StatefulSet`,只能串行升级 P3、P2、P1。 -4. 一旦这三个 Pod 中有一个升级完成了,控制器会立即开始升级 P0。 +## 升级功能 -## 原地升级 +### 原地升级 Advanced StatefulSet 增加了 `podUpdatePolicy` 来允许用户指定重建升级还是原地升级。 @@ -222,14 +251,37 @@ spec: maxUnavailable: 2 ``` -## 升级顺序 +### 原地升级自动预热 + +**FEATURE STATE:** Kruise v0.10.0 + +如果你在[安装或升级 Kruise](../installation##optional-feature-gate) 的时候启用了 `PreDownloadImageForInPlaceUpdate` feature-gate, +Advanced StatefulSet 控制器会自动在所有旧版本 pod 所在 node 节点上预热你正在灰度发布的新版本镜像。 这对于应用发布加速很有帮助。 + +默认情况下 Advanced StatefulSet 每个新镜像预热时的并发度都是 `1`,也就是一个个节点拉镜像。 +如果需要调整,你可以通过 `apps.kruise.io/image-predownload-parallelism` annotation 来设置并发度。 + +另外从 Kruise v1.1.0 开始,你可以使用 `apps.kruise.io/image-predownload-min-updated-ready-pods` 来控制在少量新版本 Pod 已经升级成功之后再执行镜像预热。它的值可能是绝对值数字或是百分比。 + +```yaml +apiVersion: apps.kruise.io/v1beta1 +kind: StatefulSet +metadata: + annotations: + apps.kruise.io/image-predownload-parallelism: "10" + apps.kruise.io/image-predownload-min-updated-ready-pods: "3" +``` + +注意,为了避免大部分不必要的镜像拉取,目前只针对 replicas > 3 的 Advanced StatefulSet 做自动预热。 + +### 升级顺序 Advanced StatefulSet 在 `spec.updateStrategy.rollingUpdate` 下面新增了 `unorderedUpdate` 结构,提供给不按 order 顺序的升级策略。 如果 `unorderedUpdate` 不为空,所有 Pod 的发布顺序就不一定会按照 order 顺序了。注意,`unorderedUpdate` 只能配合 Parallel podManagementPolicy 使用。 目前,`unorderedUpdate` 下面只包含 `priorityStrategy` 一个优先级策略。 -### 优先级策略 +#### 优先级策略 这个策略定义了控制器计算 Pod 发布优先级的规则,所有需要更新的 Pod 都会通过这个优先级规则计算后排序。 目前 `priority` 可以通过 weight(权重) 和 order(序号) 两种方式来指定。 @@ -268,76 +320,50 @@ spec: unorderedUpdate: priorityStrategy: orderPriority: - - orderedKey: some-label-key + - orderedKey: some-label-key ``` -## 发布暂停 +### MaxUnavailable 最大不可用 -用户可以通过设置 paused 为 true 暂停发布,不过控制器还是会做 replicas 数量管理: +Advanced StatefulSet 在 `RollingUpdateStatefulSetStrategy` 中新增了 `maxUnavailable` 策略来支持并行 Pod 发布,它会保证发布过程中最多有多少个 Pod 处于不可用状态。注意,`maxUnavailable` 只能配合 podManagementPolicy 为 `Parallel` 来使用。 + +这个策略的效果和 `Deployment` 中的类似,但是可能会导致发布过程中的 order 顺序不能严格保证。 
+如果不配置 `maxUnavailable`,它的默认值为 1,也就是和原生 `StatefulSet` 一样只能 one by one 串行发布 Pod,即使把 podManagementPolicy 配置为 `Parallel` 也是这样。 ```yaml apiVersion: apps.kruise.io/v1beta1 kind: StatefulSet spec: # ... + podManagementPolicy: Parallel updateStrategy: + type: RollingUpdate rollingUpdate: - paused: true -``` - -## 原地升级自动预热 - -**FEATURE STATE:** Kruise v0.10.0 - -如果你在[安装或升级 Kruise](../installation##optional-feature-gate) 的时候启用了 `PreDownloadImageForInPlaceUpdate` feature-gate, -Advanced StatefulSet 控制器会自动在所有旧版本 pod 所在 node 节点上预热你正在灰度发布的新版本镜像。 这对于应用发布加速很有帮助。 - -默认情况下 Advanced StatefulSet 每个新镜像预热时的并发度都是 `1`,也就是一个个节点拉镜像。 -如果需要调整,你可以通过 `apps.kruise.io/image-predownload-parallelism` annotation 来设置并发度。 - -另外从 Kruise v1.1.0 开始,你可以使用 `apps.kruise.io/image-predownload-min-updated-ready-pods` 来控制在少量新版本 Pod 已经升级成功之后再执行镜像预热。它的值可能是绝对值数字或是百分比。 - -```yaml -apiVersion: apps.kruise.io/v1beta1 -kind: StatefulSet -metadata: - annotations: - apps.kruise.io/image-predownload-parallelism: "10" - apps.kruise.io/image-predownload-min-updated-ready-pods: "3" + maxUnavailable: 20% ``` -注意,为了避免大部分不必要的镜像拉取,目前只针对 replicas > 3 的 Advanced StatefulSet 做自动预热。 +比如说,一个 Advanced StatefulSet 下面有 P0 到 P4 五个 Pod,并且应用能容忍 3 个副本不可用。 +当我们把 StatefulSet 里的 Pod 升级版本的时候,可以通过以下步骤来做: -## 流式扩容 +1. 设置 `maxUnavailable=3` +2. (可选) 如果需要灰度升级,设置 `partition=4`。Partition 默认的意思是 order 大于等于这个数值的 Pod 才会更新,在这里就只会更新 P4,即使我们设置了 `maxUnavailable=3`。 +3. 在 P4 升级完成后,把 `partition` 调整为 0。此时,控制器会同时升级 P1、P2、P3 三个 Pod。注意,如果是原生 `StatefulSet`,只能串行升级 P3、P2、P1。 +4. 一旦这三个 Pod 中有一个升级完成了,控制器会立即开始升级 P0。 -**FEATURE STATE:** Kruise v0.10.0 +### 发布暂停 -为了避免在一个新 Advanced StatefulSet 创建后有大量失败的 pod 被创建出来,从 Kruise `v0.10.0` 版本开始引入了在 scale strategy 中的 `maxUnavailable` 策略。 +用户可以通过设置 paused 为 true 暂停发布,不过控制器还是会做 replicas 数量管理: ```yaml apiVersion: apps.kruise.io/v1beta1 kind: StatefulSet spec: # ... 
- replicas: 100 - scaleStrategy: - maxUnavailable: 10% # percentage or absolute number + updateStrategy: + rollingUpdate: + paused: true ``` -当这个字段被设置之后,Advanced StatefulSet 会保证创建 pod 之后不可用 pod 数量不超过这个限制值。 - -比如说,上面这个 StatefulSet 一开始只会一次性创建 10 个 pod。在此之后,每当一个 pod 变为 running、ready 状态后,才会再创建一个新 pod 出来。 - -注意,这个功能只允许在 podManagementPolicy 是 `Parallel` 的 StatefulSet 中使用。 - -## PersistentVolumeClaim 保留 - -**FEATURE STATE:** Kruise v1.1.0 - -如果你在[安装或升级 Kruise](../installation##optional-feature-gate) 的时候启用了 `StatefulSetAutoDeletePVC` feature-gate, -你可以使用 `.spec.persistentVolumeClaimRetentionPolicy` 字段来控制在StatefulSet生命周期中是否以及何时删除它所创建的PVC。 - -这个功能与上游 StatefulSet (K8s >= 1.23 [alpha]) 提供的相同,可以参考[上游文档](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#persistentvolumeclaim-retention)。 ## 生命周期钩子 diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v0.10/user-manuals/advancedstatefulset.md b/i18n/zh/docusaurus-plugin-content-docs/version-v0.10/user-manuals/advancedstatefulset.md index c460a7ce83..87e412dfc7 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-v0.10/user-manuals/advancedstatefulset.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v0.10/user-manuals/advancedstatefulset.md @@ -158,7 +158,7 @@ spec: unorderedUpdate: priorityStrategy: orderPriority: - - orderedKey: some-label-key + - orderedKey: some-label-key ``` ## 发布暂停 diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.0/user-manuals/advancedstatefulset.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.0/user-manuals/advancedstatefulset.md index 1bb8f8556d..1b9be20b1c 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.0/user-manuals/advancedstatefulset.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.0/user-manuals/advancedstatefulset.md @@ -157,7 +157,7 @@ spec: unorderedUpdate: priorityStrategy: orderPriority: - - orderedKey: some-label-key + - orderedKey: some-label-key ``` ## 发布暂停 diff --git 
a/i18n/zh/docusaurus-plugin-content-docs/version-v1.1/user-manuals/advancedstatefulset.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.1/user-manuals/advancedstatefulset.md index 46fda36c91..e90900dd86 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.1/user-manuals/advancedstatefulset.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.1/user-manuals/advancedstatefulset.md @@ -161,7 +161,7 @@ spec: unorderedUpdate: priorityStrategy: orderPriority: - - orderedKey: some-label-key + - orderedKey: some-label-key ``` ## 发布暂停 diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.2/user-manuals/advancedstatefulset.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.2/user-manuals/advancedstatefulset.md index ca624b7bca..049b60439b 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.2/user-manuals/advancedstatefulset.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.2/user-manuals/advancedstatefulset.md @@ -161,7 +161,7 @@ spec: unorderedUpdate: priorityStrategy: orderPriority: - - orderedKey: some-label-key + - orderedKey: some-label-key ``` ## 发布暂停 diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3/user-manuals/advancedstatefulset.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3/user-manuals/advancedstatefulset.md index ca624b7bca..049b60439b 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3/user-manuals/advancedstatefulset.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3/user-manuals/advancedstatefulset.md @@ -161,7 +161,7 @@ spec: unorderedUpdate: priorityStrategy: orderPriority: - - orderedKey: some-label-key + - orderedKey: some-label-key ``` ## 发布暂停 diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.4/user-manuals/advancedstatefulset.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.4/user-manuals/advancedstatefulset.md index ca624b7bca..049b60439b 100644 --- 
a/i18n/zh/docusaurus-plugin-content-docs/version-v1.4/user-manuals/advancedstatefulset.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.4/user-manuals/advancedstatefulset.md @@ -161,7 +161,7 @@ spec: unorderedUpdate: priorityStrategy: orderPriority: - - orderedKey: some-label-key + - orderedKey: some-label-key ``` ## 发布暂停 diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.5/installation.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.5/installation.md index 4fb53ef3b6..080e86633e 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.5/installation.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.5/installation.md @@ -16,7 +16,7 @@ $ helm repo add openkruise https://openkruise.github.io/charts/ $ helm repo update # Install the latest version. -$ helm install kruise openkruise/kruise --version 1.5.4 +$ helm install kruise openkruise/kruise --version 1.5.5 ``` **注意:** [Changelog](https://github.com/openkruise/kruise/blob/master/CHANGELOG.md)。 ## 通过 helm 升级 @@ -29,7 +29,7 @@ $ helm repo add openkruise https://openkruise.github.io/charts/ $ helm repo update # Upgrade to the latest version. 
-$ helm upgrade kruise openkruise/kruise --version 1.5.4 [--force] +$ helm upgrade kruise openkruise/kruise --version 1.5.5 [--force] ``` 注意: @@ -62,7 +62,7 @@ $ helm install/upgrade kruise /PATH/TO/CHART | `manager.log.level` | kruise-manager 日志输出级别 | `4` | | `manager.replicas` | kruise-manager 的期望副本数 | `2` | | `manager.image.repository` | kruise-manager/kruise-daemon 镜像仓库 | `openkruise/kruise-manager` | -| `manager.image.tag` | kruise-manager/kruise-daemon 镜像版本 | `1.5.4` | +| `manager.image.tag` | kruise-manager/kruise-daemon 镜像版本 | `1.5.5` | | `manager.resources.limits.cpu` | kruise-manager 的 limit CPU 资源 | `200m` | | `manager.resources.limits.memory` | kruise-manager 的 limit memory 资源 | `512Mi` | | `manager.resources.requests.cpu` | kruise-manager 的 request CPU 资源 | `100m` | diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.5/user-manuals/advancedstatefulset.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.5/user-manuals/advancedstatefulset.md index ca624b7bca..7a163ae59e 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.5/user-manuals/advancedstatefulset.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.5/user-manuals/advancedstatefulset.md @@ -161,7 +161,7 @@ spec: unorderedUpdate: priorityStrategy: orderPriority: - - orderedKey: some-label-key + - orderedKey: some-label-key ``` ## 发布暂停 @@ -223,6 +223,27 @@ spec: - 如果要把 Pod-3 做迁移并保留序号,则把 `3` 追加到 `reserveOrdinals` 列表中。控制器会把 Pod-3 删除并创建 Pod-5(此时运行中 Pod 为 `[0,2,4,5]`)。 - 如果只想删除 Pod-3,则把 `3` 追加到 `reserveOrdinals` 列表并同时把 `replicas` 减一修改为 `3`。控制器会把 Pod-3 删除(此时运行中 Pod 为 `[0,2,4]`)。 +## 指定 Pod 删除 + +**FEATURE STATE:** Kruise v1.5.5, Kruise v1.6.4, Kruise v1.7.2+ + +相比于手动直接删除 Pod,使用 `apps.kruise.io/specified-delete: true` 指定 Pod 删除方式会有 Advanced StatefulSet 的 `maxUnavailable` 来保护删除, 并且会触发 `PreparingDelete` 生命周期 hook (见下文)。 + +```yaml +apiVersion: v1 +kind: Pod +metadata: + labels: + # ... + apps.kruise.io/specified-delete: true +spec: + containers: + - name: main + # ... 
+``` + +当控制器收到上面这个 Pod 更新之后,会优先处理存在指定删除标签的 pod 的删除流程,并保证不突破 `maxUnavailable` 的限制。 + ## 流式扩容 **FEATURE STATE:** Kruise v0.10.0 diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.6/installation.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.6/installation.md index 8925b8113e..597e5ef3ac 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.6/installation.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.6/installation.md @@ -20,7 +20,7 @@ $ helm repo add openkruise https://openkruise.github.io/charts/ $ helm repo update # Install the latest version. -$ helm install kruise openkruise/kruise --version 1.6.3 +$ helm install kruise openkruise/kruise --version 1.6.4 ``` **注意:** [Changelog](https://github.com/openkruise/kruise/blob/master/CHANGELOG.md)。 ## 通过 helm 升级 @@ -33,7 +33,7 @@ $ helm repo add openkruise https://openkruise.github.io/charts/ $ helm repo update # Upgrade to the latest version. -$ helm upgrade kruise openkruise/kruise --version 1.6.3 [--force] +$ helm upgrade kruise openkruise/kruise --version 1.6.4 [--force] ``` 注意: @@ -66,7 +66,7 @@ $ helm install/upgrade kruise /PATH/TO/CHART | `manager.log.level` | kruise-manager 日志输出级别 | `4` | | `manager.replicas` | kruise-manager 的期望副本数 | `2` | | `manager.image.repository` | kruise-manager/kruise-daemon 镜像仓库 | `openkruise/kruise-manager` | -| `manager.image.tag` | kruise-manager/kruise-daemon 镜像版本 | `1.6.3` | +| `manager.image.tag` | kruise-manager/kruise-daemon 镜像版本 | `1.6.4` | | `manager.resources.limits.cpu` | kruise-manager 的 limit CPU 资源 | `200m` | | `manager.resources.limits.memory` | kruise-manager 的 limit memory 资源 | `512Mi` | | `manager.resources.requests.cpu` | kruise-manager 的 request CPU 资源 | `100m` | diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.6/user-manuals/advancedstatefulset.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.6/user-manuals/advancedstatefulset.md index c5d41f2f02..390b5ccc19 100644 --- 
a/i18n/zh/docusaurus-plugin-content-docs/version-v1.6/user-manuals/advancedstatefulset.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.6/user-manuals/advancedstatefulset.md @@ -161,7 +161,7 @@ spec: unorderedUpdate: priorityStrategy: orderPriority: - - orderedKey: some-label-key + - orderedKey: some-label-key ``` ## 发布暂停 @@ -223,6 +223,27 @@ spec: - 如果要把 Pod-3 做迁移并保留序号,则把 `3` 追加到 `reserveOrdinals` 列表中。控制器会把 Pod-3 删除并创建 Pod-5(此时运行中 Pod 为 `[0,2,4,5]`)。 - 如果只想删除 Pod-3,则把 `3` 追加到 `reserveOrdinals` 列表并同时把 `replicas` 减一修改为 `3`。控制器会把 Pod-3 删除(此时运行中 Pod 为 `[0,2,4]`)。 +## 指定 Pod 删除 + +**FEATURE STATE:** Kruise v1.5.5, Kruise v1.6.4, Kruise v1.7.2+ + +相比于手动直接删除 Pod,使用 `apps.kruise.io/specified-delete: true` 指定 Pod 删除方式会有 Advanced StatefulSet 的 `maxUnavailable` 来保护删除, 并且会触发 `PreparingDelete` 生命周期 hook (见下文)。 + +```yaml +apiVersion: v1 +kind: Pod +metadata: + labels: + # ... + apps.kruise.io/specified-delete: true +spec: + containers: + - name: main + # ... +``` + +当控制器收到上面这个 Pod 更新之后,会优先处理存在指定删除标签的 pod 的删除流程,并保证不突破 `maxUnavailable` 的限制。 + ## 流式扩容 **FEATURE STATE:** Kruise v0.10.0 diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.7/installation.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.7/installation.md index a19063f924..4104fefa11 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.7/installation.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.7/installation.md @@ -23,7 +23,7 @@ $ helm repo add openkruise https://openkruise.github.io/charts/ $ helm repo update # Install the latest version. 
-$ helm install kruise openkruise/kruise --version 1.7.1 +$ helm install kruise openkruise/kruise --version 1.7.2 ``` **注意:** [Changelog](https://github.com/openkruise/kruise/blob/master/CHANGELOG.md)。 ## 通过 helm 升级 diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.7/user-manuals/advancedstatefulset.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.7/user-manuals/advancedstatefulset.md index e89a0d255b..c8d38b5212 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.7/user-manuals/advancedstatefulset.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.7/user-manuals/advancedstatefulset.md @@ -74,11 +74,11 @@ spec: image: nginx:alpine ``` -### User Stories +#### User Stories 起始序号能力主要是为了使 StatefulSet 更加灵活,基于该能力有状态应用可以自动化的方式在 Kubernetes 集群间迁移。如下: -#### Story 1 +##### Story 1 **Migrating across namespaces**: 许多公司使用命名空间进行隔离,考虑到用户正在将 StatefulSet 迁移到集群中的新命名空间。 迁移的原因可能是组织架构变动,也可能是要求迁出共享命名空间。如下,有 **replicas:5** 在共享命名空间中运行: @@ -102,14 +102,47 @@ ordinals.start: 0 ordinals.start: 3 [ nginx-0, nginx-1, nginx-2 ] [ nginx-3, nginx-4 ] ``` -#### Story 2 +##### Story 2 **Migrating across clusters**: 由于容量限制、基础设施限制或为了更好地隔离应用程序,采用多集群的方式可能需要在集群间移动工作负载。 -#### Story 3 +##### Story 3 **Non-Zero Based Indexing:** 用户可能希望从序号 “1 ”而不是序号 “0 ” 开始对其 StatefulSet 进行编号。使用 ”1 “ 的编号可能更容易推理和概念化(例如:序号 ”k “ 是第 ”k “ 个副本,而不是第 ”k+1 “ 个副本)。 +## 扩缩容功能 + +### PersistentVolumeClaim 保留 + +**FEATURE STATE:** Kruise v1.1.0 + +如果你在[安装或升级 Kruise](../installation##optional-feature-gate) 的时候启用了 `StatefulSetAutoDeletePVC` feature-gate, +你可以使用 `.spec.persistentVolumeClaimRetentionPolicy` 字段来控制在StatefulSet生命周期中是否以及何时删除它所创建的PVC。 + +这个功能与上游 StatefulSet (K8s >= 1.23 [alpha]) 提供的相同,可以参考[上游文档](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#persistentvolumeclaim-retention)。 + +### 流式扩容 + +**FEATURE STATE:** Kruise v0.10.0 + +为了避免在一个新 Advanced StatefulSet 创建后有大量失败的 pod 被创建出来,从 Kruise `v0.10.0` 版本开始引入了在 scale strategy 中的 `maxUnavailable` 策略。 + +```yaml +apiVersion: 
apps.kruise.io/v1beta1 +kind: StatefulSet +spec: + # ... + replicas: 100 + scaleStrategy: + maxUnavailable: 10% # percentage or absolute number +``` + +当这个字段被设置之后,Advanced StatefulSet 会保证创建 pod 之后不可用 pod 数量不超过这个限制值。 + +比如说,上面这个 StatefulSet 一开始只会一次性创建 10 个 pod。在此之后,每当一个 pod 变为 running、ready 状态后,才会再创建一个新 pod 出来。 + +注意,这个功能只允许在 podManagementPolicy 是 `Parallel` 的 StatefulSet 中使用。 + ### 序号保留(跳过) 从 Advanced StatefulSet 的 v1beta1 版本开始(Kruise >= v0.7.0),支持序号保留功能。 @@ -132,34 +165,30 @@ spec: - 如果要把 Pod-3 做迁移并保留序号,则把 `3` 追加到 `reserveOrdinals` 列表中。控制器会把 Pod-3 删除并创建 Pod-5(此时运行中 Pod 为 `[0,2,4,5]`)。 - 如果只想删除 Pod-3,则把 `3` 追加到 `reserveOrdinals` 列表并同时把 `replicas` 减一修改为 `3`。控制器会把 Pod-3 删除(此时运行中 Pod 为 `[0,2,4]`)。 -## MaxUnavailable 最大不可用 +### 指定 Pod 删除 -Advanced StatefulSet 在 `RollingUpdateStatefulSetStrategy` 中新增了 `maxUnavailable` 策略来支持并行 Pod 发布,它会保证发布过程中最多有多少个 Pod 处于不可用状态。注意,`maxUnavailable` 只能配合 podManagementPolicy 为 `Parallel` 来使用。 +**FEATURE STATE:** Kruise v1.5.5, Kruise v1.6.4, Kruise v1.7.2+ -这个策略的效果和 `Deployment` 中的类似,但是可能会导致发布过程中的 order 顺序不能严格保证。 -如果不配置 `maxUnavailable`,它的默认值为 1,也就是和原生 `StatefulSet` 一样只能 one by one 串行发布 Pod,即使把 podManagementPolicy 配置为 `Parallel` 也是这样。 +相比于手动直接删除 Pod,使用 `apps.kruise.io/specified-delete: true` 指定 Pod 删除方式会有 Advanced StatefulSet 的 `maxUnavailable` 来保护删除, 并且会触发 `PreparingDelete` 生命周期 hook (见下文)。 ```yaml -apiVersion: apps.kruise.io/v1beta1 -kind: StatefulSet +apiVersion: v1 +kind: Pod +metadata: + labels: + # ... + apps.kruise.io/specified-delete: true spec: + containers: + - name: main # ... - podManagementPolicy: Parallel - updateStrategy: - type: RollingUpdate - rollingUpdate: - maxUnavailable: 20% ``` -比如说,一个 Advanced StatefulSet 下面有 P0 到 P4 五个 Pod,并且应用能容忍 3 个副本不可用。 -当我们把 StatefulSet 里的 Pod 升级版本的时候,可以通过以下步骤来做: +当控制器收到上面这个 Pod 更新之后,会优先处理存在指定删除标签的 pod 的删除流程,并保证不突破 `maxUnavailable` 的限制。 -1. 设置 `maxUnavailable=3` -2. (可选) 如果需要灰度升级,设置 `partition=4`。Partition 默认的意思是 order 大于等于这个数值的 Pod 才会更新,在这里就只会更新 P4,即使我们设置了 `maxUnavailable=3`。 -3. 
在 P4 升级完成后,把 `partition` 调整为 0。此时,控制器会同时升级 P1、P2、P3 三个 Pod。注意,如果是原生 `StatefulSet`,只能串行升级 P3、P2、P1。 -4. 一旦这三个 Pod 中有一个升级完成了,控制器会立即开始升级 P0。 +## 升级功能 -## 原地升级 +### 原地升级 Advanced StatefulSet 增加了 `podUpdatePolicy` 来允许用户指定重建升级还是原地升级。 @@ -222,14 +251,37 @@ spec: maxUnavailable: 2 ``` -## 升级顺序 +### 原地升级自动预热 + +**FEATURE STATE:** Kruise v0.10.0 + +如果你在[安装或升级 Kruise](../installation##optional-feature-gate) 的时候启用了 `PreDownloadImageForInPlaceUpdate` feature-gate, +Advanced StatefulSet 控制器会自动在所有旧版本 pod 所在 node 节点上预热你正在灰度发布的新版本镜像。 这对于应用发布加速很有帮助。 + +默认情况下 Advanced StatefulSet 每个新镜像预热时的并发度都是 `1`,也就是一个个节点拉镜像。 +如果需要调整,你可以通过 `apps.kruise.io/image-predownload-parallelism` annotation 来设置并发度。 + +另外从 Kruise v1.1.0 开始,你可以使用 `apps.kruise.io/image-predownload-min-updated-ready-pods` 来控制在少量新版本 Pod 已经升级成功之后再执行镜像预热。它的值可能是绝对值数字或是百分比。 + +```yaml +apiVersion: apps.kruise.io/v1beta1 +kind: StatefulSet +metadata: + annotations: + apps.kruise.io/image-predownload-parallelism: "10" + apps.kruise.io/image-predownload-min-updated-ready-pods: "3" +``` + +注意,为了避免大部分不必要的镜像拉取,目前只针对 replicas > 3 的 Advanced StatefulSet 做自动预热。 + +### 升级顺序 Advanced StatefulSet 在 `spec.updateStrategy.rollingUpdate` 下面新增了 `unorderedUpdate` 结构,提供给不按 order 顺序的升级策略。 如果 `unorderedUpdate` 不为空,所有 Pod 的发布顺序就不一定会按照 order 顺序了。注意,`unorderedUpdate` 只能配合 Parallel podManagementPolicy 使用。 目前,`unorderedUpdate` 下面只包含 `priorityStrategy` 一个优先级策略。 -### 优先级策略 +#### 优先级策略 这个策略定义了控制器计算 Pod 发布优先级的规则,所有需要更新的 Pod 都会通过这个优先级规则计算后排序。 目前 `priority` 可以通过 weight(权重) 和 order(序号) 两种方式来指定。 @@ -268,76 +320,50 @@ spec: unorderedUpdate: priorityStrategy: orderPriority: - - orderedKey: some-label-key + - orderedKey: some-label-key ``` -## 发布暂停 +### MaxUnavailable 最大不可用 -用户可以通过设置 paused 为 true 暂停发布,不过控制器还是会做 replicas 数量管理: +Advanced StatefulSet 在 `RollingUpdateStatefulSetStrategy` 中新增了 `maxUnavailable` 策略来支持并行 Pod 发布,它会保证发布过程中最多有多少个 Pod 处于不可用状态。注意,`maxUnavailable` 只能配合 podManagementPolicy 为 `Parallel` 来使用。 + +这个策略的效果和 `Deployment` 中的类似,但是可能会导致发布过程中的 order 顺序不能严格保证。 
+如果不配置 `maxUnavailable`,它的默认值为 1,也就是和原生 `StatefulSet` 一样只能 one by one 串行发布 Pod,即使把 podManagementPolicy 配置为 `Parallel` 也是这样。 ```yaml apiVersion: apps.kruise.io/v1beta1 kind: StatefulSet spec: # ... + podManagementPolicy: Parallel updateStrategy: + type: RollingUpdate rollingUpdate: - paused: true -``` - -## 原地升级自动预热 - -**FEATURE STATE:** Kruise v0.10.0 - -如果你在[安装或升级 Kruise](../installation##optional-feature-gate) 的时候启用了 `PreDownloadImageForInPlaceUpdate` feature-gate, -Advanced StatefulSet 控制器会自动在所有旧版本 pod 所在 node 节点上预热你正在灰度发布的新版本镜像。 这对于应用发布加速很有帮助。 - -默认情况下 Advanced StatefulSet 每个新镜像预热时的并发度都是 `1`,也就是一个个节点拉镜像。 -如果需要调整,你可以通过 `apps.kruise.io/image-predownload-parallelism` annotation 来设置并发度。 - -另外从 Kruise v1.1.0 开始,你可以使用 `apps.kruise.io/image-predownload-min-updated-ready-pods` 来控制在少量新版本 Pod 已经升级成功之后再执行镜像预热。它的值可能是绝对值数字或是百分比。 - -```yaml -apiVersion: apps.kruise.io/v1beta1 -kind: StatefulSet -metadata: - annotations: - apps.kruise.io/image-predownload-parallelism: "10" - apps.kruise.io/image-predownload-min-updated-ready-pods: "3" + maxUnavailable: 20% ``` -注意,为了避免大部分不必要的镜像拉取,目前只针对 replicas > 3 的 Advanced StatefulSet 做自动预热。 +比如说,一个 Advanced StatefulSet 下面有 P0 到 P4 五个 Pod,并且应用能容忍 3 个副本不可用。 +当我们把 StatefulSet 里的 Pod 升级版本的时候,可以通过以下步骤来做: -## 流式扩容 +1. 设置 `maxUnavailable=3` +2. (可选) 如果需要灰度升级,设置 `partition=4`。Partition 默认的意思是 order 大于等于这个数值的 Pod 才会更新,在这里就只会更新 P4,即使我们设置了 `maxUnavailable=3`。 +3. 在 P4 升级完成后,把 `partition` 调整为 0。此时,控制器会同时升级 P1、P2、P3 三个 Pod。注意,如果是原生 `StatefulSet`,只能串行升级 P3、P2、P1。 +4. 一旦这三个 Pod 中有一个升级完成了,控制器会立即开始升级 P0。 -**FEATURE STATE:** Kruise v0.10.0 +### 发布暂停 -为了避免在一个新 Advanced StatefulSet 创建后有大量失败的 pod 被创建出来,从 Kruise `v0.10.0` 版本开始引入了在 scale strategy 中的 `maxUnavailable` 策略。 +用户可以通过设置 paused 为 true 暂停发布,不过控制器还是会做 replicas 数量管理: ```yaml apiVersion: apps.kruise.io/v1beta1 kind: StatefulSet spec: # ... 
- replicas: 100 - scaleStrategy: - maxUnavailable: 10% # percentage or absolute number + updateStrategy: + rollingUpdate: + paused: true ``` -当这个字段被设置之后,Advanced StatefulSet 会保证创建 pod 之后不可用 pod 数量不超过这个限制值。 - -比如说,上面这个 StatefulSet 一开始只会一次性创建 10 个 pod。在此之后,每当一个 pod 变为 running、ready 状态后,才会再创建一个新 pod 出来。 - -注意,这个功能只允许在 podManagementPolicy 是 `Parallel` 的 StatefulSet 中使用。 - -## PersistentVolumeClaim 保留 - -**FEATURE STATE:** Kruise v1.1.0 - -如果你在[安装或升级 Kruise](../installation##optional-feature-gate) 的时候启用了 `StatefulSetAutoDeletePVC` feature-gate, -你可以使用 `.spec.persistentVolumeClaimRetentionPolicy` 字段来控制在StatefulSet生命周期中是否以及何时删除它所创建的PVC。 - -这个功能与上游 StatefulSet (K8s >= 1.23 [alpha]) 提供的相同,可以参考[上游文档](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#persistentvolumeclaim-retention)。 ## 生命周期钩子 diff --git a/versioned_docs/version-v0.10/user-manuals/advancedstatefulset.md b/versioned_docs/version-v0.10/user-manuals/advancedstatefulset.md index 1a25a0d1d4..dadf6c8e0b 100644 --- a/versioned_docs/version-v0.10/user-manuals/advancedstatefulset.md +++ b/versioned_docs/version-v0.10/user-manuals/advancedstatefulset.md @@ -176,7 +176,7 @@ spec: unorderedUpdate: priorityStrategy: orderPriority: - - orderedKey: some-label-key + - orderedKey: some-label-key ``` ## Paused update diff --git a/versioned_docs/version-v1.0/user-manuals/advancedstatefulset.md b/versioned_docs/version-v1.0/user-manuals/advancedstatefulset.md index 57b025efd2..007e35c96f 100644 --- a/versioned_docs/version-v1.0/user-manuals/advancedstatefulset.md +++ b/versioned_docs/version-v1.0/user-manuals/advancedstatefulset.md @@ -173,7 +173,7 @@ spec: unorderedUpdate: priorityStrategy: orderPriority: - - orderedKey: some-label-key + - orderedKey: some-label-key ``` ## Paused update diff --git a/versioned_docs/version-v1.1/user-manuals/advancedstatefulset.md b/versioned_docs/version-v1.1/user-manuals/advancedstatefulset.md index 5d430f18e2..6db3f46617 100644 --- 
a/versioned_docs/version-v1.1/user-manuals/advancedstatefulset.md +++ b/versioned_docs/version-v1.1/user-manuals/advancedstatefulset.md @@ -177,7 +177,7 @@ spec: unorderedUpdate: priorityStrategy: orderPriority: - - orderedKey: some-label-key + - orderedKey: some-label-key ``` ## Paused update diff --git a/versioned_docs/version-v1.2/user-manuals/advancedstatefulset.md b/versioned_docs/version-v1.2/user-manuals/advancedstatefulset.md index a313e9de2c..c25dafaab4 100644 --- a/versioned_docs/version-v1.2/user-manuals/advancedstatefulset.md +++ b/versioned_docs/version-v1.2/user-manuals/advancedstatefulset.md @@ -177,7 +177,7 @@ spec: unorderedUpdate: priorityStrategy: orderPriority: - - orderedKey: some-label-key + - orderedKey: some-label-key ``` ## Paused update diff --git a/versioned_docs/version-v1.3/user-manuals/advancedstatefulset.md b/versioned_docs/version-v1.3/user-manuals/advancedstatefulset.md index a313e9de2c..c25dafaab4 100644 --- a/versioned_docs/version-v1.3/user-manuals/advancedstatefulset.md +++ b/versioned_docs/version-v1.3/user-manuals/advancedstatefulset.md @@ -177,7 +177,7 @@ spec: unorderedUpdate: priorityStrategy: orderPriority: - - orderedKey: some-label-key + - orderedKey: some-label-key ``` ## Paused update diff --git a/versioned_docs/version-v1.4/user-manuals/advancedstatefulset.md b/versioned_docs/version-v1.4/user-manuals/advancedstatefulset.md index a313e9de2c..c25dafaab4 100644 --- a/versioned_docs/version-v1.4/user-manuals/advancedstatefulset.md +++ b/versioned_docs/version-v1.4/user-manuals/advancedstatefulset.md @@ -177,7 +177,7 @@ spec: unorderedUpdate: priorityStrategy: orderPriority: - - orderedKey: some-label-key + - orderedKey: some-label-key ``` ## Paused update diff --git a/versioned_docs/version-v1.5/installation.md b/versioned_docs/version-v1.5/installation.md index f4f161c9bb..0500aa1baa 100644 --- a/versioned_docs/version-v1.5/installation.md +++ b/versioned_docs/version-v1.5/installation.md @@ -16,7 +16,7 @@ $ helm repo 
add openkruise https://openkruise.github.io/charts/ $ helm repo update # Install the latest version. -$ helm install kruise openkruise/kruise --version 1.5.4 +$ helm install kruise openkruise/kruise --version 1.5.5 ``` **Note:** [Changelog](https://github.com/openkruise/kruise/blob/master/CHANGELOG.md). @@ -30,7 +30,7 @@ $ helm repo add openkruise https://openkruise.github.io/charts/ $ helm repo update # Upgrade to the latest version. -$ helm upgrade kruise openkruise/kruise --version 1.5.4 [--force] +$ helm upgrade kruise openkruise/kruise --version 1.5.5 [--force] ``` Note that: @@ -68,7 +68,7 @@ The following table lists the configurable parameters of the chart and their def | `manager.log.level` | Log level that kruise-manager printed | `4` | | `manager.replicas` | Replicas of kruise-controller-manager deployment | `2` | | `manager.image.repository` | Repository for kruise-manager image | `openkruise/kruise-manager` | -| `manager.image.tag` | Tag for kruise-manager image | `v1.5.4` | +| `manager.image.tag` | Tag for kruise-manager image | `v1.5.5` | | `manager.resources.limits.cpu` | CPU resource limit of kruise-manager container | `200m` | | `manager.resources.limits.memory` | Memory resource limit of kruise-manager container | `512Mi` | | `manager.resources.requests.cpu` | CPU resource request of kruise-manager container | `100m` | diff --git a/versioned_docs/version-v1.5/user-manuals/advancedstatefulset.md b/versioned_docs/version-v1.5/user-manuals/advancedstatefulset.md index a313e9de2c..b8d7f51755 100644 --- a/versioned_docs/version-v1.5/user-manuals/advancedstatefulset.md +++ b/versioned_docs/version-v1.5/user-manuals/advancedstatefulset.md @@ -177,7 +177,7 @@ spec: unorderedUpdate: priorityStrategy: orderPriority: - - orderedKey: some-label-key + - orderedKey: some-label-key ``` ## Paused update @@ -244,6 +244,29 @@ For an Advanced StatefulSet with `replicas=4, reserveOrdinals=[1]`, the ordinals - If you just want to delete Pod-3, you should append `3` 
into `reserveOrdinals` list and set `replicas` to `3`. Then controller will delete Pod-3 (existing Pods will be `[0,2,4]`). +## Specified Pod Deletion + +**FEATURE STATE:** Kruise v1.5.5, Kruise v1.6.4, Kruise v1.7.2+ + +Compared to manually deleting a Pod directly, pod deletion by labeling pod with `apps.kruise.io/specified-delete: true` will be protected by the `maxUnavailable` of the Advanced StatefulSet during deletion, +and it will trigger the `PreparingDelete` lifecycle hook (see below). + +```yaml +apiVersion: v1 +kind: Pod +metadata: + labels: + # ... + apps.kruise.io/specified-delete: true +spec: + containers: + - name: main + # ... +``` + +When the controller receives the above Pod update, it will trigger the deletion process of the pod with specified deletion label and ensure that the `maxUnavailable` limit is not exceeded. +The pod will be re-built by the workload if the ordinal is not reserved. + ## Scaling with rate limiting **FEATURE STATE:** Kruise v0.10.0 diff --git a/versioned_docs/version-v1.6/installation.md b/versioned_docs/version-v1.6/installation.md index debfdc2cb1..d7be3f94dc 100644 --- a/versioned_docs/version-v1.6/installation.md +++ b/versioned_docs/version-v1.6/installation.md @@ -20,7 +20,7 @@ $ helm repo add openkruise https://openkruise.github.io/charts/ $ helm repo update # Install the latest version. -$ helm install kruise openkruise/kruise --version 1.6.3 +$ helm install kruise openkruise/kruise --version 1.6.4 ``` **Note:** [Changelog](https://github.com/openkruise/kruise/blob/master/CHANGELOG.md). @@ -34,7 +34,7 @@ $ helm repo add openkruise https://openkruise.github.io/charts/ $ helm repo update # Upgrade to the latest version. 
-$ helm upgrade kruise openkruise/kruise --version 1.6.3 [--force] +$ helm upgrade kruise openkruise/kruise --version 1.6.4 [--force] ``` Note that: @@ -72,7 +72,7 @@ The following table lists the configurable parameters of the chart and their def | `manager.log.level` | Log level that kruise-manager printed | `4` | | `manager.replicas` | Replicas of kruise-controller-manager deployment | `2` | | `manager.image.repository` | Repository for kruise-manager image | `openkruise/kruise-manager` | -| `manager.image.tag` | Tag for kruise-manager image | `v1.6.3` | +| `manager.image.tag` | Tag for kruise-manager image | `v1.6.4` | | `manager.resources.limits.cpu` | CPU resource limit of kruise-manager container | `200m` | | `manager.resources.limits.memory` | Memory resource limit of kruise-manager container | `512Mi` | | `manager.resources.requests.cpu` | CPU resource request of kruise-manager container | `100m` | diff --git a/versioned_docs/version-v1.6/user-manuals/advancedstatefulset.md b/versioned_docs/version-v1.6/user-manuals/advancedstatefulset.md index a313e9de2c..b8d7f51755 100644 --- a/versioned_docs/version-v1.6/user-manuals/advancedstatefulset.md +++ b/versioned_docs/version-v1.6/user-manuals/advancedstatefulset.md @@ -177,7 +177,7 @@ spec: unorderedUpdate: priorityStrategy: orderPriority: - - orderedKey: some-label-key + - orderedKey: some-label-key ``` ## Paused update @@ -244,6 +244,29 @@ For an Advanced StatefulSet with `replicas=4, reserveOrdinals=[1]`, the ordinals - If you just want to delete Pod-3, you should append `3` into `reserveOrdinals` list and set `replicas` to `3`. Then controller will delete Pod-3 (existing Pods will be `[0,2,4]`). 
+## Specified Pod Deletion + +**FEATURE STATE:** Kruise v1.5.5, Kruise v1.6.4, Kruise v1.7.2+ + +Compared to manually deleting a Pod directly, pod deletion by labeling pod with `apps.kruise.io/specified-delete: true` will be protected by the `maxUnavailable` of the Advanced StatefulSet during deletion, +and it will trigger the `PreparingDelete` lifecycle hook (see below). + +```yaml +apiVersion: v1 +kind: Pod +metadata: + labels: + # ... + apps.kruise.io/specified-delete: true +spec: + containers: + - name: main + # ... +``` + +When the controller receives the above Pod update, it will trigger the deletion process of the pod with specified deletion label and ensure that the `maxUnavailable` limit is not exceeded. +The pod will be re-built by the workload if the ordinal is not reserved. + ## Scaling with rate limiting **FEATURE STATE:** Kruise v0.10.0 diff --git a/versioned_docs/version-v1.7/installation.md b/versioned_docs/version-v1.7/installation.md index 7d5b909e19..49c4b3a496 100644 --- a/versioned_docs/version-v1.7/installation.md +++ b/versioned_docs/version-v1.7/installation.md @@ -84,7 +84,7 @@ The following table lists the configurable parameters of the chart and their def | `manager.log.level` | Log level that kruise-manager printed | `4` | | `manager.replicas` | Replicas of kruise-controller-manager deployment | `2` | | `manager.image.repository` | Repository for kruise-manager image | `openkruise/kruise-manager` | -| `manager.image.tag` | Tag for kruise-manager image | `v1.7.1` | +| `manager.image.tag` | Tag for kruise-manager image | `v1.7.2` | | `manager.resources.limits.cpu` | CPU resource limit of kruise-manager container | `200m` | | `manager.resources.limits.memory` | Memory resource limit of kruise-manager container | `512Mi` | | `manager.resources.requests.cpu` | CPU resource request of kruise-manager container | `100m` | diff --git a/versioned_docs/version-v1.7/user-manuals/advancedstatefulset.md 
b/versioned_docs/version-v1.7/user-manuals/advancedstatefulset.md index d6ba530237..792e33e86c 100644 --- a/versioned_docs/version-v1.7/user-manuals/advancedstatefulset.md +++ b/versioned_docs/version-v1.7/user-manuals/advancedstatefulset.md @@ -20,9 +20,9 @@ file from `apps/v1` to `apps.kruise.io/v1beta1` after installing Kruise manager. + apiVersion: apps.kruise.io/v1beta1 kind: StatefulSet metadata: - name: sample + name: sample spec: - #... + #... ``` Note that since Kruise v0.7.0, Advanced StatefulSet has been promoted to `v1beta1`, which is compatible with `v1alpha1`. @@ -51,7 +51,7 @@ metadata: Pod start ordinal numbers start at 0 by default, and you can also set the pod start ordinal number by setting the **.spec.ordinals.start** field. To use this capability, you need to enable FeatureGate **StatefulSetStartOrdinal=true**. - .spec.ordinals.start: If the .spec.ordinals.start field is set, Pods will be assigned ordinals from .spec.ordinals.start up through .spec.ordinals.start + .spec.replicas - 1. -For example: replicas=5, ordinals.start=3, Pod Range = [3, 7]. + For example: replicas=5, ordinals.start=3, Pod Range = [3, 7]. ``` apiVersion: apps.kruise.io/v1beta1 @@ -76,10 +76,10 @@ spec: image: nginx:alpine ``` -### User Stories +#### User Stories The main motivation of this feature is to support a more flexible StatefulSet, a building block in an ecosystem where Stateful applications can be migrated across Kubernetes clusters with more automation. As follows: -#### Story 1 +##### Story 1 **Migrating across namespaces**: Many organizations use namespaces for team isolation. Consider a team that is migrating a `StatefulSet` to a new namespace in a cluster. Migration could be motivated by a branding change, or a requirement to move out of a shared namespace. Consider the StatefulSet `my-app` with `replicas: 5`, running in a shared namespace. 
@@ -108,15 +108,48 @@ ordinals.start: 0 ordinals.start: 3 The `replicasStatefulSet` and `replicas` fields should be updated jointly, depending on the requirements of the migration. -#### Story 2 +##### Story 2 **Migrating across clusters**: Organizations taking a multi cluster approach may need to move workloads across clusters due to capacity constraints, infrastructure constraints, or for better application isolation. Similar to namespace migration, the application operator should manage network connectivity, volumes and slice orchestration. -#### Story 3 +##### Story 3 **Non-Zero Based Indexing:** A user may want to number their StatefulSet starting from ordinal `1`, rather than ordinal `0`. Using `1` based numbering may be easier to reason about and conceptualize (eg: ordinal `k` is the `k`'th replica, not the `k+1`'th replica). +## Scale features + +### PersistentVolumeClaim retention + +**FEATURE STATE:** Kruise v1.1.0 + +If you have enabled the `StatefulSetAutoDeletePVC` feature-gate during [Kruise installation or upgrade](../installation#optional-feature-gate), +you can use `.spec.persistentVolumeClaimRetentionPolicy` field to control if and how PVCs are deleted during the lifecycle of a StatefulSet. + +This is the same as the upstream StatefulSet (K8s >= 1.23 [alpha]), please refer to [the upstream document for it](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#persistentvolumeclaim-retention). + +### Scaling with rate limiting + +**FEATURE STATE:** Kruise v0.10.0 + +To avoid creating a large number of failing pods at once when a new Advanced StatefulSet is applied, a `maxUnavailable` field for scale strategy has been added since Kruise `v0.10.0`. + +```yaml +apiVersion: apps.kruise.io/v1beta1 +kind: StatefulSet +spec: + # ... 
+ replicas: 100 + scaleStrategy: + maxUnavailable: 10% # percentage or absolute number +``` + +When this field has been set, Advanced StatefulSet will create pods with the guarantee that the number of unavailable pods during scaling cannot exceed this value. + +For example, the StatefulSet above will first create 10 pods. After that, it will create one more pod only after a created pod becomes running and ready. + +Note that this only works with the `Parallel` podManagementPolicy. + ### Ordinals reserve(skip) Since Advanced StatefulSet `v1beta1` (Kruise >= v0.7.0), it supports ordinals reserve. @@ -138,45 +171,35 @@ spec: For an Advanced StatefulSet with `replicas=4, reserveOrdinals=[1]`, the ordinals of running Pods will be `[0,2,3,4]`. - If you want to migrate Pod-3 and reserve this ordinal, just append `3` into `reserveOrdinals` list. -Then controller will delete Pod-3 and create Pod-5 (existing Pods will be `[0,2,4,5]`). + Then controller will delete Pod-3 and create Pod-5 (existing Pods will be `[0,2,4,5]`). - If you just want to delete Pod-3, you should append `3` into `reserveOrdinals` list and set `replicas` to `3`. -Then controller will delete Pod-3 (existing Pods will be `[0,2,4]`). + Then controller will delete Pod-3 (existing Pods will be `[0,2,4]`). -## MaxUnavailable +### Specified Pod Deletion -Advanced StatefulSet adds a `maxUnavailable` capability in the `RollingUpdateStatefulSetStrategy` to allow parallel Pod -updates with the guarantee that the number of unavailable pods during the update cannot exceed this value. -It is only allowed to use when the podManagementPolicy is `Parallel`. +**FEATURE STATE:** Kruise v1.5.5, Kruise v1.6.4, Kruise v1.7.2+ -This feature achieves similar update efficiency like Deployment for cases where the order of -update is not critical to the workload. Without this feature, the native `StatefulSet` controller can only -update Pods one by one even if the podManagementPolicy is `Parallel`. 
+Compared to manually deleting a Pod directly, pod deletion by labeling pod with `apps.kruise.io/specified-delete: true` will be protected by the `maxUnavailable` of the Advanced StatefulSet during deletion, +and it will trigger the `PreparingDelete` lifecycle hook (see below). ```yaml -apiVersion: apps.kruise.io/v1beta1 -kind: StatefulSet +apiVersion: v1 +kind: Pod +metadata: + labels: + # ... + apps.kruise.io/specified-delete: true spec: + containers: + - name: main # ... - podManagementPolicy: Parallel - updateStrategy: - type: RollingUpdate - rollingUpdate: - maxUnavailable: 20% ``` -For example, assuming an Advanced StatefulSet has five Pods named P0 to P4, and the application can -tolerate losing three replicas temporally. If we want to update the StatefulSet Pod spec from v1 to -v2, we can perform the following steps using the `MaxUnavailable` feature for fast update. - -1. Set `MaxUnavailable` to 3 to allow three unavailable Pods maximally. -2. Optionally, Set `Partition` to 4 in case canary update is needed. Partition means all Pods with an ordinal that is - greater than or equal to the partition will be updated. In this case P4 will be updated even though `MaxUnavailable` - is 3. -3. After P4 finish update, change `Partition` to 0. The controller will update P1,P2 and P3 concurrently. - Note that with default StatefulSet, the Pods will be updated sequentially in the order of P3, P2, P1. -4. Once one of P1, P2 and P3 finishes update, P0 will be updated immediately. +When the controller receives the above Pod update, it will trigger the deletion process of the pod with specified deletion label and ensure that the `maxUnavailable` limit is not exceeded. +The pod will be re-built by the workload if the ordinal is not reserved. -## In-Place Update +## Update features +### In-Place Update Advanced StatefulSet adds a `podUpdatePolicy` field in `spec.updateStrategy.rollingUpdate` which controls recreate or in-place update for Pods. 
@@ -244,14 +267,38 @@ spec:
       maxUnavailable: 2
 ```

-## Update sequence
+### Pre-download image for in-place update
+
+**FEATURE STATE:** Kruise v0.10.0
+
+If you have enabled the `PreDownloadImageForInPlaceUpdate` feature-gate during [Kruise installation or upgrade](../installation#optional-feature-gate),
+Advanced StatefulSet controller will automatically pre-download the image you want to update to the nodes of all old Pods.
+It is quite useful to accelerate the progress of applications upgrade.
+
+The parallelism of each new image pre-downloading by Advanced StatefulSet is `1`, which means the image is downloaded on nodes one by one.
+You can change the parallelism using `apps.kruise.io/image-predownload-parallelism` annotation on Advanced StatefulSet according to the capability of image registry,
+for registries with more bandwidth and P2P image downloading ability, a larger parallelism can speed up the pre-download process.
+
+Since Kruise v1.1.0, you can use `apps.kruise.io/image-predownload-min-updated-ready-pods` to make sure the new image starting pre-download after a few new Pods have been updated ready. Its value can be absolute number or percentage.
+
+```yaml
+apiVersion: apps.kruise.io/v1beta1
+kind: StatefulSet
+metadata:
+  annotations:
+    apps.kruise.io/image-predownload-parallelism: "10"
+    apps.kruise.io/image-predownload-min-updated-ready-pods: "3"
+```
+
+Note that to avoid most unnecessary image downloading, now controller will only pre-download images for Advanced StatefulSet with replicas > `3`.
+
+### Update sequence

 Advanced StatefulSet adds a `unorderedUpdate` field in `spec.updateStrategy.rollingUpdate`, which contains strategies for non-ordered update.
 If `unorderedUpdate` is not nil, pods will be updated with non-ordered sequence. Noted that UnorderedUpdate can only be allowed to work with Parallel podManagementPolicy.

 Currently `unorderedUpdate` only contains one field: `priorityStrategy`.
-
-### Priority strategy
+#### Priority strategy

 This strategy defines rules for calculating the priority of updating pods.
 All update candidates will be applied with the priority terms.
@@ -291,79 +338,57 @@ spec:
       unorderedUpdate:
         priorityStrategy:
           orderPriority:
-            - orderedKey: some-label-key
+          - orderedKey: some-label-key
 ```

-## Paused update
+### MaxUnavailable

-`paused` indicates that Pods updating is paused, controller will not update Pods but just maintain the number of replicas.
+Advanced StatefulSet adds a `maxUnavailable` capability in the `RollingUpdateStatefulSetStrategy` to allow parallel Pod
+updates with the guarantee that the number of unavailable pods during the update cannot exceed this value.
+It is only allowed to use when the podManagementPolicy is `Parallel`.
+
+This feature achieves similar update efficiency like Deployment for cases where the order of
+update is not critical to the workload. Without this feature, the native `StatefulSet` controller can only
+update Pods one by one even if the podManagementPolicy is `Parallel`.

 ```yaml
 apiVersion: apps.kruise.io/v1beta1
 kind: StatefulSet
 spec:
   # ...
+  podManagementPolicy: Parallel
   updateStrategy:
+    type: RollingUpdate
     rollingUpdate:
-      paused: true
-```
-
-## Pre-download image for in-place update
-
-**FEATURE STATE:** Kruise v0.10.0
-
-If you have enabled the `PreDownloadImageForInPlaceUpdate` feature-gate during [Kruise installation or upgrade](../installation#optional-feature-gate),
-Advanced StatefulSet controller will automatically pre-download the image you want to update to the nodes of all old Pods.
-It is quite useful to accelerate the progress of applications upgrade.
-
-The parallelism of each new image pre-downloading by Advanced StatefulSet is `1`, which means the image is downloaded on nodes one by one.
-You can change the parallelism using `apps.kruise.io/image-predownload-parallelism` annotation on Advanced StatefulSet according to the capability of image registry,
-for registries with more bandwidth and P2P image downloading ability, a larger parallelism can speed up the pre-download process.
-
-Since Kruise v1.1.0, you can use `apps.kruise.io/image-predownload-min-updated-ready-pods` to make sure the new image starting pre-download after a few new Pods have been updated ready. Its value can be absolute number or percentage.
-
-```yaml
-apiVersion: apps.kruise.io/v1beta1
-kind: StatefulSet
-metadata:
-  annotations:
-    apps.kruise.io/image-predownload-parallelism: "10"
-    apps.kruise.io/image-predownload-min-updated-ready-pods: "3"
+      maxUnavailable: 20%
 ```

-Note that to avoid most unnecessary image downloading, now controller will only pre-download images for Advanced StatefulSet with replicas > `3`.
+For example, assuming an Advanced StatefulSet has five Pods named P0 to P4, and the application can
+tolerate losing three replicas temporally. If we want to update the StatefulSet Pod spec from v1 to
+v2, we can perform the following steps using the `MaxUnavailable` feature for fast update.

-## Scaling with rate limiting
+1. Set `MaxUnavailable` to 3 to allow three unavailable Pods maximally.
+2. Optionally, Set `Partition` to 4 in case canary update is needed. Partition means all Pods with an ordinal that is
+   greater than or equal to the partition will be updated. In this case P4 will be updated even though `MaxUnavailable`
+   is 3.
+3. After P4 finish update, change `Partition` to 0. The controller will update P1,P2 and P3 concurrently.
+   Note that with default StatefulSet, the Pods will be updated sequentially in the order of P3, P2, P1.
+4. Once one of P1, P2 and P3 finishes update, P0 will be updated immediately.

-**FEATURE STATE:** Kruise v0.10.0
+### Paused update

-To avoid creating all failure pods at once when a new CloneSet applied, a `maxUnavailable` field for scale strategy has been added since Kruise `v0.10.0`.
+`paused` indicates that Pods updating is paused, controller will not update Pods but just maintain the number of replicas.

 ```yaml
 apiVersion: apps.kruise.io/v1beta1
 kind: StatefulSet
 spec:
   # ...
-  replicas: 100
-  scaleStrategy:
-    maxUnavailable: 10%  # percentage or absolute number
+  updateStrategy:
+    rollingUpdate:
+      paused: true
 ```

-When this field has been set, Advanced StatefulSet will create pods with the guarantee that the number of unavailable pods during the update cannot exceed this value.
-
-For example, the StatefulSet will firstly create 10 pods. After that, it will create one more pod only if one pod created has been running and ready.
-
-Note that it can just be allowed to work with Parallel podManagementPolicy.
-
-## PersistentVolumeClaim retention
-
-**FEATURE STATE:** Kruise v1.1.0
-
-If you have enabled the `StatefulSetAutoDeletePVC` feature-gate during [Kruise installation or upgrade](../installation#optional-feature-gate),
-you can use `.spec.persistentVolumeClaimRetentionPolicy` field to control if and how PVCs are deleted during the lifecycle of a StatefulSet.
-
-This is same to the upstream StatefulSet (K8s >= 1.23 [alpha]), please refer to [the upstream document for it](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#persistentvolumeclaim-retention).
-
 ## Lifecycle hook

 **FEATURE STATE:** Kruise v0.8.0