Doesn't work on ECS (nested cgroups) #66
Thank you for saving me the time lol
For anyone else who stumbles across this like I did: I also ran into this issue trying to auto-set GOMAXPROCS with automaxprocs in ECS. Unfortunately that won't work because, as @blampe mentioned, the container-level cgroups are unbounded. I did, however, manage to find a workaround. It ain't pretty, but as part of your app startup you can query the ECS metadata endpoint (its URL is exposed through an environment variable) to pull the container and task CPU limits, then use the CPU limit to set GOMAXPROCS. I've put together an example repo for anyone who is interested: rdforte/gomaxecs
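A rough sketch of that approach, assuming the task metadata v4 endpoint (whose base URL ECS injects via the `ECS_CONTAINER_METADATA_URI_V4` environment variable) exposes the task-level CPU limit under `Limits.CPU`; the struct fields and the ceiling-based rounding here are illustrative, not code taken from gomaxecs:

```go
package main

import (
	"encoding/json"
	"math"
	"net/http"
	"os"
	"runtime"
)

// taskMetadata models only the fields we need from the ECS task metadata
// (v4) response.
type taskMetadata struct {
	Limits struct {
		// Task-level CPU limit; on Fargate this is expressed in vCPUs.
		// Verify the unit for your launch type before relying on it.
		CPU float64 `json:"CPU"`
	} `json:"Limits"`
}

// setMaxProcsFromECS queries the ECS task metadata endpoint and uses the
// task's CPU limit to cap GOMAXPROCS. It silently does nothing when the
// metadata endpoint is unavailable (e.g. when running outside ECS).
func setMaxProcsFromECS() {
	base := os.Getenv("ECS_CONTAINER_METADATA_URI_V4")
	if base == "" {
		return // not running on ECS
	}
	resp, err := http.Get(base + "/task")
	if err != nil {
		return
	}
	defer resp.Body.Close()

	var meta taskMetadata
	if err := json.NewDecoder(resp.Body).Decode(&meta); err != nil || meta.Limits.CPU <= 0 {
		return
	}
	// Round up so a fractional vCPU limit still gets at least one P.
	runtime.GOMAXPROCS(int(math.Ceil(meta.Limits.CPU)))
}

func main() {
	setMaxProcsFromECS()
	// ... rest of application startup
}
```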
Is it solved?
@recrack you can try gomaxecs if you are looking to resolve the GOMAXPROCS issue in ECS. You can see how OTel has implemented it here, along with Uber's automaxprocs, if you want a solution for both k8s and ECS; otherwise the Quick Start for gomaxecs will suffice.
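For reference, the standard automaxprocs wiring looks like the snippet below; it works wherever the container's own cgroup carries the CFS quota (e.g. Kubernetes), but on ECS it finds no quota and leaves GOMAXPROCS at `runtime.NumCPU`, which is the whole point of this issue:

```go
package main

import (
	"log"

	"go.uber.org/automaxprocs/maxprocs"
)

func main() {
	// Set adjusts GOMAXPROCS to match the container's CPU quota and logs
	// what it decided. The returned function restores the previous value.
	undo, err := maxprocs.Set(maxprocs.Logger(log.Printf))
	if err != nil {
		log.Printf("failed to set GOMAXPROCS: %v", err)
	}
	defer undo()

	// ... rest of application startup
}
```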
Apparently, the cgroups are not correctly set on ECS and the CFS quota cannot be determined by automaxprocs. See uber-go/automaxprocs#66. Signed-off-by: Charith Ellawala <[email protected]>
Unlike Kubernetes, ECS only allows you to apply a CPU quota at the task (pod) level. Containers in the task are always unbounded.
For example, when `cpu: 1024` (1 vCPU) is provided in the task definition, the task's cgroup gets the expected quota. But providing `cpu: 1024` to a container inside the same task doesn't have the same effect. (The container's `cpu` value is only used for placement and CPU shares, but doesn't actually affect CPU scheduling; see aws/containers-roadmap#1862.)

If the container is using automaxprocs, it only sees a quota of `-1` and defaults to using all of `runtime.NumCPU`, even though the task's cgroup clamps it to 1 vCPU. (I'm using cgroups v1 as an example here, but the same is true with v2 as well, if you happen to be using an AL2023 AMI.)
It seems like the library could climb up the mount point to find quotas belonging to parents, but this is suboptimal if the task has more than one container.
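To make the rabbit hole concrete, here is a rough cgroups v1 sketch of both observations: the container's own cgroup reports `cpu.cfs_quota_us = -1`, and a real limit only turns up if you climb toward the controller root and the task-level parent directory happens to be visible from inside the container. The paths and the /proc/self/cgroup mapping are assumptions about a conventional mount, not code from automaxprocs:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// readInt reads a single integer value from a cgroup control file.
func readInt(path string) (int64, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseInt(strings.TrimSpace(string(b)), 10, 64)
}

// cfsLimit returns the CPU limit implied by the CFS quota in dir, or
// false when dir carries no quota (cpu.cfs_quota_us == -1) or the files
// are missing.
func cfsLimit(dir string) (float64, bool) {
	quota, qErr := readInt(filepath.Join(dir, "cpu.cfs_quota_us"))
	period, pErr := readInt(filepath.Join(dir, "cpu.cfs_period_us"))
	if qErr != nil || pErr != nil || quota <= 0 || period <= 0 {
		return 0, false
	}
	return float64(quota) / float64(period), true
}

// cgroupDir resolves this process's cpu-controller cgroup directory from
// /proc/self/cgroup. Inside a container the reported path may not map
// cleanly onto the mounted hierarchy, which is part of the problem.
func cgroupDir(root string) string {
	b, err := os.ReadFile("/proc/self/cgroup")
	if err != nil {
		return root
	}
	for _, line := range strings.Split(string(b), "\n") {
		parts := strings.SplitN(line, ":", 3)
		if len(parts) != 3 {
			continue
		}
		for _, ctrl := range strings.Split(parts[1], ",") {
			if ctrl == "cpu" {
				return filepath.Join(root, parts[2])
			}
		}
	}
	return root
}

func main() {
	const root = "/sys/fs/cgroup/cpu" // conventional cgroups v1 mount point

	// Start at the container's own cgroup and climb toward the root,
	// looking for the first directory that actually carries a quota.
	for dir := cgroupDir(root); ; dir = filepath.Dir(dir) {
		if limit, ok := cfsLimit(dir); ok {
			fmt.Printf("found CFS limit of %.2f vCPU at %s\n", limit, dir)
			return
		}
		if dir == root || dir == "/" || dir == "." {
			break
		}
	}
	fmt.Println("no CFS quota found; GOMAXPROCS would default to runtime.NumCPU")
}
```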
I'm mostly writing this down to help anyone else avoid this rabbit hole.