Increase ResourceMemory limit #222

Conversation

@lpiwowar (Collaborator) commented Oct 8, 2024

We are hitting an issue where pods in jobs are being killed because of OOM errors. It seems like the ResourceMemory limit set with PR [1] is not enough for the test operator pods.

This patch increases the ResourceMemory limit to 8Gi. We should revisit this problem later and investigate whether the limit can be lowered and what exactly caused the OOM errors.

[1] #205
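
For context, here is a minimal sketch of what this change might look like in Go, assuming the operator builds the pod resources with k8s.io/api/core/v1; the package name, function, and field layout are illustrative, not taken from this repository:

```go
// Illustrative sketch only; the actual field layout and function in
// test-operator may differ.
package resources

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// testPodResources returns the resource constraints applied to the pods
// spawned for test jobs, with the memory limit raised to 8Gi.
func testPodResources() corev1.ResourceRequirements {
	return corev1.ResourceRequirements{
		Limits: corev1.ResourceList{
			corev1.ResourceMemory: resource.MustParse("8Gi"),
		},
	}
}
```

Note that because only Limits is set here, Kubernetes defaults the container's Requests to the same values, which is what later caused the scheduling problem discussed below.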

openshift-ci bot added the approved label Oct 8, 2024
lpiwowar requested review from eduolivares and removed the request for stuggi on October 8, 2024 14:38
openshift-ci bot commented Oct 8, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eduolivares, lpiwowar

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karelyatin (Contributor) commented:

This is causing issues in CI (openstack-k8s-operators/edpm-ansible#777): this change also increased the Request along with the limit, making the pod not schedulable at all.

lpiwowar added a commit to lpiwowar/test-operator that referenced this pull request Oct 9, 2024
This patch removes the resource limit for test operator pods as we
are hitting issues both when:

- the limit is too low
- the limit is too high (scheduler does not have enough resources)

This is a hotfix before we expose the resource limit values through
the test-operator CRs and find the correct default values.

Related PRs:
- openstack-k8s-operators#222
- openstack-k8s-operators#205
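
A sketch of the hotfix state described in this commit, under the assumption that the operator simply stops populating the Resources field (names again illustrative):

```go
// Continuing the illustrative sketch above: with an empty
// ResourceRequirements the pods get neither limits nor requests, so they are
// no longer OOM-killed by their own limit and always fit on a node, but they
// also fall back to BestEffort QoS.
package resources

import corev1 "k8s.io/api/core/v1"

func testPodResourcesHotfix() corev1.ResourceRequirements {
	return corev1.ResourceRequirements{}
}
```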
lpiwowar mentioned this pull request Oct 9, 2024
lpiwowar added a commit to lpiwowar/test-operator that referenced this pull request Oct 30, 2024
This patch reintroduces limits for the pods spawned by the
test-operator after they were increased and later removed by these
two PRs [1][2].

The problem with the previous two patches was that they only set the
Resources.Limits field and not the Resources.Requests field. When
Resources.Limits is set and Resources.Requests is empty, Requests
inherits the value from Resources.Limits.

Therefore, we first hit the OOM-killed issue when we set
Resources.Limits too low, and later, when we increased the value, we hit
the "Insufficient memory" error (due to the high value in the
Resources.Requests field).

This patch addresses the above-mentioned issue by:
  - setting sane default values for Resources.Limits,
  - setting sane default values for Resources.Requests, and
  - introducing a new parameter called .Spec.Resources which can be used
    to change the default values.

[1] openstack-k8s-operators#222
[2] openstack-k8s-operators#224
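
A hedged sketch of the approach described in this commit, assuming a CR field like .Spec.Resources of type corev1.ResourceRequirements; the default quantities and names below are assumptions for illustration, not the operator's actual values:

```go
// Illustrative sketch only; the concrete defaults and the shape of the
// .Spec.Resources field in test-operator may differ.
package resources

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// podResources returns the resources for a test pod: a user-supplied
// .Spec.Resources wins when set; otherwise sane defaults are applied, with
// Requests set explicitly so they no longer silently inherit the Limits.
func podResources(specResources *corev1.ResourceRequirements) corev1.ResourceRequirements {
	if specResources != nil {
		return *specResources
	}
	return corev1.ResourceRequirements{
		Limits: corev1.ResourceList{
			corev1.ResourceMemory: resource.MustParse("4Gi"), // assumed default, not the operator's actual value
		},
		Requests: corev1.ResourceList{
			corev1.ResourceMemory: resource.MustParse("2Gi"), // assumed default, not the operator's actual value
		},
	}
}
```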
lpiwowar mentioned this pull request Oct 30, 2024