Increase ResourceMemory limit #222

Conversation

@lpiwowar (Collaborator) commented Oct 8, 2024

We are hitting an issue where pods in jobs are being killed because of OOM errors. It seems like the ResourceMemory limit set with PR [1] is not enough for the test operator pods.

This patch increases the ResourceMemory limit to 8Gi. We should revisit this problem later and investigate whether the limit can be lowered and what exactly caused the OOM errors.

[1] #205
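
For context, here is a minimal sketch of what this change might look like in Go, assuming the operator builds the pod resources with k8s.io/api/core/v1; the package name, function, and field layout are illustrative, not taken from this repository:

```go
// Illustrative sketch only; the actual field layout and function in
// test-operator may differ.
package resources

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// testPodResources returns the resource constraints applied to the pods
// spawned for test jobs, with the memory limit raised to 8Gi.
func testPodResources() corev1.ResourceRequirements {
	return corev1.ResourceRequirements{
		Limits: corev1.ResourceList{
			corev1.ResourceMemory: resource.MustParse("8Gi"),
		},
	}
}
```

Note that because only Limits is set here, Kubernetes defaults the container's Requests to the same values, which is what later caused the scheduling problem discussed below.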

openshift-ci bot added the approved label Oct 8, 2024
lpiwowar requested review from eduolivares and removed the request for stuggi on October 8, 2024 14:38
openshift-ci bot commented Oct 8, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eduolivares, lpiwowar

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karelyatin (Contributor) commented:

This is causing issues in CI (openstack-k8s-operators/edpm-ansible#777): this change also increased the Request along with the limit, making the pod not schedulable at all.

lpiwowar added a commit to lpiwowar/test-operator that referenced this pull request Oct 9, 2024
This patch removes the resource limit for test operator pods as we
are hitting issues both when:

- the limit is too low
- the limit is too high (scheduler does not have enough resources)

This is a hotfix before we expose the resource limit values through
the test-operator CRs and find the correct default values.

Related PRs:
- openstack-k8s-operators#222
- openstack-k8s-operators#205
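
A sketch of the hotfix state described in this commit, under the assumption that the operator simply stops populating the Resources field (names again illustrative):

```go
// Continuing the illustrative sketch above: with an empty
// ResourceRequirements the pods get neither limits nor requests, so they are
// no longer OOM-killed by their own limit and always fit on a node, but they
// also fall back to BestEffort QoS.
package resources

import corev1 "k8s.io/api/core/v1"

func testPodResourcesHotfix() corev1.ResourceRequirements {
	return corev1.ResourceRequirements{}
}
```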
lpiwowar mentioned this pull request Oct 9, 2024
lpiwowar added a commit to lpiwowar/test-operator that referenced this pull request Oct 30, 2024
This patch reintroduces limits for the pods spawned by the
test-operator after they were increased and later removed by these
two PRs [1][2].

The problem with the previous two patches was that they only set the
Resources.Limits field and not the Resources.Requests field. When
Resources.Limits is set and Resources.Requests is empty, Requests
inherits the value from Resources.Limits.

Therefore, we first hit the OOM-killed issue when we set
Resources.Limits too low, and later, when we increased the value, we hit
the "Insufficient memory" error (due to the high value in the
Resources.Requests field).

This patch addresses the above-mentioned issue by:
  - setting sane default values for Resources.Limits,
  - setting sane default values for Resources.Requests, and
  - introducing a new parameter called .Spec.Resources which can be used
    to change the default values.

[1] openstack-k8s-operators#222
[2] openstack-k8s-operators#224
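
A hedged sketch of the approach described in this commit, assuming a CR field like .Spec.Resources of type corev1.ResourceRequirements; the default quantities and names below are assumptions for illustration, not the operator's actual values:

```go
// Illustrative sketch only; the concrete defaults and the shape of the
// .Spec.Resources field in test-operator may differ.
package resources

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// podResources returns the resources for a test pod: a user-supplied
// .Spec.Resources wins when set; otherwise sane defaults are applied, with
// Requests set explicitly so they no longer silently inherit the Limits.
func podResources(specResources *corev1.ResourceRequirements) corev1.ResourceRequirements {
	if specResources != nil {
		return *specResources
	}
	return corev1.ResourceRequirements{
		Limits: corev1.ResourceList{
			corev1.ResourceMemory: resource.MustParse("4Gi"), // assumed default, not the operator's actual value
		},
		Requests: corev1.ResourceList{
			corev1.ResourceMemory: resource.MustParse("2Gi"), // assumed default, not the operator's actual value
		},
	}
}
```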
lpiwowar mentioned this pull request Oct 30, 2024