Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fallback to Set Thread Num by Cgroups when Users Did Not Do So #188

Closed
wants to merge 1 commit into from

Conversation

PwzXxm
Copy link
Collaborator

@PwzXxm PwzXxm commented Nov 9, 2023

issue: #187
/kind enhancement

Read system cgroups to find out the CPU quota and period to get the correct limited CPU num under the K8s environment when running Knowhere alone, i.e., without Milvus.

For instance, if an instance has 64 CPUs, k8s limits it to 16c,

  • Before the change, Knowhere initializes the thread pool to 64 and uses 25% of each CPU.
  • After the change, Knowhere initializes the thread pool to 16 and uses 100% of each CPU.

If it fails to read from groups, fallback hardware_concurrency().
For systems other than Linux, fallback as well.

Support cgroups V1 only for now. Maybe support V2 later.

@sre-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: PwzXxm

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mergify mergify bot added the dco-passed label Nov 9, 2023
@PwzXxm PwzXxm changed the title Set Thread Num According to Cgroup Set Thread Num According to Cgroups Nov 9, 2023
@xiaofan-luan
Copy link

The easiest way is simply passed in from golang?

@xiaofan-luan
Copy link

as a library, I think it make sense to receive a params of concurrency?
If user don't pass in then we fallback to current logic

@PwzXxm
Copy link
Collaborator Author

PwzXxm commented Nov 9, 2023

The easiest way is simply passed in from golang?

Yes, if you run Milvus + Knowhere, Milvus will use a hook to pass it to Knowhere. This PR fixs when Knowhere is running alone, e.g., VectorDBBench now supports benchmark Knowhere/Cardinal alone and we encountered this issue.

@PwzXxm
Copy link
Collaborator Author

PwzXxm commented Nov 9, 2023

as a library, I think it make sense to receive a params of concurrency? If user don't pass in then we fallback to current logic

Right. This PR will not change this logic. Use the Milvus passed params first; if not received anything, fallback to read from cgroups; fails to read from cgroups, fallback to system thread num.

Copy link
Collaborator

@zhengbuqian zhengbuqian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall but need more comments to explain how it works so the code reader don't need to google.

include/knowhere/comp/thread_pool.h Outdated Show resolved Hide resolved
include/knowhere/comp/thread_pool.h Outdated Show resolved Hide resolved
include/knowhere/comp/thread_pool.h Outdated Show resolved Hide resolved
@mergify mergify bot added the ci-passed label Nov 9, 2023
@liliu-z
Copy link
Collaborator

liliu-z commented Nov 9, 2023

Why we cannot simply call the init method to set the number from lib caller side?

@PwzXxm
Copy link
Collaborator Author

PwzXxm commented Nov 9, 2023

Why we cannot simply call the init method to set the number from lib caller side?

This requires an additional pyknowhere interface and a manual setup every time. Then the question is, should Knowhere use ALL of the CPUs BY DEFAULT even if it was limited to a smaller number of them? Or should Knowhere care about the environment at all when the user didn't set any value?

@PwzXxm PwzXxm changed the title Set Thread Num According to Cgroups Fallback to Set Thread Num by Cgroups when Users Did Not Do So Nov 9, 2023
@mergify mergify bot removed the ci-passed label Nov 9, 2023
@alexanderguzhva
Copy link
Collaborator

Just to confirm. What is going to happen if I run under taskset or numactl?

@zhengbuqian
Copy link
Collaborator

Just to confirm. What is going to happen if I run under taskset or numactl?

taskset or numactl doesn't rely on cgroups on controlling the cpu usage of processes, they do that by setting the process's CPU affinity mask using sched_setaffinity syscall. So this won't detect the limitation set by taskset or numactl.

@PwzXxm
Copy link
Collaborator Author

PwzXxm commented Nov 10, 2023

Just to confirm. What is going to happen if I run under taskset or numactl?

Good question. I just tested both.

image image

Buqian is correct that they do not interact with each other. So for this PR, you can view it as an additional shield protection under k8s env. If you run taskset or numactl and bind to a single CPU, under an 8c K8s pod, on an instance with 64 CPUs:

  • Before the change, you run the process on a single CPU with 64 threads.
  • After this change, you run the process on a single CPU with 8 threads.

If it is not under K8s env, no limit on cgroups, and there is no change after this PR, it will create 64 threads.

In this case, it might be better to call the setters or we might create a separate PR for this.
This is an enchantment for the Knowhere default behavior and the initiative is to be less error-prone on testing.

@PwzXxm PwzXxm closed this Nov 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants