Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to use cgroup v2 prior to cgroup v1 in path resolver #1978

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 12 additions & 6 deletions src/shared/metadata/cgroup_path_resolver.cc
Original file line number Diff line number Diff line change
Expand Up @@ -28,11 +28,21 @@
#include "src/shared/metadata/cgroup_path_resolver.h"

DEFINE_bool(force_cgroup2_mode, true, "Flag to force assume cgroup2 fs for testing purposes");
DEFINE_bool(force_cgroup1_mode, false, "Flag to force use of cgroup v1 fs");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used to do this more, but using flags for testing is generally error prone, because they persist across tests. I'd recommend finding a way to avoid flags like this. This goes for the force_cgroup2_mode as well, if we're going to try to fix this.

Copy link
Member Author

@ddelnano ddelnano Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind this was if it's worth extensively testing the new logic ordering (v2 before v1), then this flag would be used to restore the old behavior in the event there was a distro/system that had issues. Since the tests expected that the check would happen in the previous ordering, it also needed to be used there as well.


namespace px {
namespace md {

StatusOr<std::string> CGroupBasePath(std::string_view sysfs_path) {
std::string cgv2_base_path = absl::StrCat(sysfs_path, "/cgroup");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to test extensively if we change this. Systems can have both cgroup and cgroup2 if I recall correctly, and this change has to potential to break things.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately the OVH case is one where cgroup v1 and v2 are both present, but our logic fails for it.

I agree that this needs to be tested very carefully. After thinking about this more, I think it will be difficult to be confident that all edge cases are covered even if time is invested in testing the known cases.

I'm going to think about this more and see if there is a safer option for handling the OVH case. If the cgroup v1 code doesn't short circuit and allows reaching the cgroup v2 logic, then testing this reordering can be avoided.

struct statfs info;
auto fs_status = statfs(cgv2_base_path.c_str(), &info);
bool cgroupv2 = (fs_status == 0) && (info.f_type == CGROUP2_SUPER_MAGIC);

if (cgroupv2 && !FLAGS_force_cgroup1_mode) {
return cgv2_base_path;
}

// Different hosts may mount different cgroup dirs. Try a couple for robustness.
const std::vector<std::string> cgroup_dirs = {"cpu,cpuacct", "cpu", "pids"};

Expand All @@ -47,15 +57,11 @@ StatusOr<std::string> CGroupBasePath(std::string_view sysfs_path) {
}
}

std::string cgv2_base_path = absl::StrCat(sysfs_path, "/cgroup");
struct statfs info;
auto fs_status = statfs(cgv2_base_path.c_str(), &info);
bool cgroupv2 = (fs_status == 0) && (info.f_type == CGROUP2_SUPER_MAGIC);

// TODO(ddelnano): this aids in testing the LegacyCGroupPathResolver. Determine what
// to do with this.
Comment on lines +60 to +61
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the default value of FLAGS_force_cgroup2_mode be false? I would think that makes more sense. Maybe that helps with cleaning up the tests.

But I also think using flags in tests is error prone, as mentioned above, so we should try to get away from that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was worried about the possibility of FLAGS_force_cgroup2_mode's default true value meant this code path could be used in production clusters. In theory, the caller should not parse a cgv2_base_path correctly (since the filesystem check didn't indicate cgroup v2). However, since both v1 and v2 can be active and this OVH issue shows that these cases are complicated, I'm a little worried about removing this.

if (cgroupv2 || FLAGS_force_cgroup2_mode) {
return cgv2_base_path;
}
Comment on lines +60 to 64
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate that this code branch needs to stay to support the LeagcyCGroupPathResolverTest.Cgroup2Format test. Was the intention to remove the legacy resolver at some point? Is that something that could be considered now?

// (TODO): This check for cgroup2FS is eventually to be moved above the cgroupv1 check.

return error::NotFound("Could not find CGroup base path");
}
Expand Down
1 change: 1 addition & 0 deletions src/shared/metadata/cgroup_path_resolver.h
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@
#include "src/shared/metadata/k8s_objects.h"

DECLARE_bool(force_cgroup2_mode);
DECLARE_bool(force_cgroup1_mode);

namespace px {
namespace md {
Expand Down
10 changes: 10 additions & 0 deletions src/shared/metadata/cgroup_path_resolver_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ constexpr std::string_view kContainerID =
"a7638fe3934b37419cc56bca73465a02b354ba6e98e10272542d84eb2014dd62";

TEST(CGroupPathResolver, GKEFormat) {
FLAGS_force_cgroup1_mode = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we must set flags in test, then use PX_SET_FOR_SCOPE to make sure it doesn't leak out.

std::string cgroup_kubepod_path =
"/sys/fs/cgroup/cpu,cpuacct/kubepods/pod8dbc5577-d0e2-4706-8787-57d52c03ddf2/"
"14011c7d92a9e513dfd69211da0413dbf319a5e45a02b354ba6e98e10272542d/cgroup.procs";
Expand All @@ -56,6 +57,7 @@ TEST(CGroupPathResolver, GKEFormat) {
}

TEST(CGroupPathResolver, GKEFormat2) {
FLAGS_force_cgroup1_mode = true;
std::string cgroup_kubepod_path =
"/sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/podc458de04-9784-4f7a-990e-cefe26b511f0/"
"01aa0bfe91e8a58da5f1f4db469fa999fe9263c702111e611445cde2b9cb0c1a/cgroup.procs";
Expand All @@ -80,6 +82,7 @@ TEST(CGroupPathResolver, GKEFormat2) {
}

TEST(CGroupPathResolver, StandardFormatDocker) {
FLAGS_force_cgroup1_mode = true;
std::string cgroup_kubepod_path =
"/sys/fs/cgroup/cpu,cpuacct/kubepods.slice/"
"kubepods-pod8dbc5577_d0e2_4706_8787_57d52c03ddf2.slice/"
Expand Down Expand Up @@ -111,6 +114,7 @@ TEST(CGroupPathResolver, StandardFormatDocker) {
}

TEST(CGroupPathResolver, StandardFormatCRIO) {
FLAGS_force_cgroup1_mode = true;
std::string cgroup_kubepod_path =
"/sys/fs/cgroup/cpu,cpuacct/kubepods.slice/"
"kubepods-pod8dbc5577_d0e2_4706_8787_57d52c03ddf2.slice/"
Expand Down Expand Up @@ -142,6 +146,7 @@ TEST(CGroupPathResolver, StandardFormatCRIO) {
}

TEST(CGroupPathResolver, OpenShiftFormat) {
FLAGS_force_cgroup1_mode = true;
std::string cgroup_kubepod_path =
"/sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-burstable.slice/"
"kubepods-burstable-pod9b7969b2_aad0_47d4_b11c_4acfd1ce018e.slice/"
Expand Down Expand Up @@ -181,6 +186,7 @@ TEST(CGroupPathResolver, OpenShiftFormat) {
}

TEST(CGroupPathResolver, BareMetalK8s_1_21) {
FLAGS_force_cgroup1_mode = true;
std::string cgroup_kubepod_path =
"/sys/fs/cgroup/cpu,cpuacct/system.slice/containerd.service/"
"kubepods-besteffort-pod1544eb37_e4f7_49eb_8cc4_3d01c41be77b.slice:cri-containerd:"
Expand Down Expand Up @@ -224,6 +230,7 @@ std::string GetSysFsPathFromTestDataFile(const std::string& fname,
} // namespace

TEST(LegacyCGroupPathResolverTest, GKEFormat) {
FLAGS_force_cgroup1_mode = true;
ASSERT_OK_AND_ASSIGN(
auto path_resolver,
LegacyCGroupPathResolver::Create(GetSysFsPathFromTestDataFile(
Expand All @@ -246,6 +253,7 @@ TEST(LegacyCGroupPathResolverTest, GKEFormat) {
}

TEST(LegacyCGroupPathResolverTest, StandardFormat) {
FLAGS_force_cgroup1_mode = true;
ASSERT_OK_AND_ASSIGN(
auto path_resolver,
LegacyCGroupPathResolver::Create(GetSysFsPathFromTestDataFile(
Expand Down Expand Up @@ -317,6 +325,7 @@ TEST(LegacyCGroupPathResolverTest, StandardFormat) {
}

TEST(LeagcyCGroupPathResolverTest, Cgroup2Format) {
FLAGS_force_cgroup1_mode = false;
ASSERT_OK_AND_ASSIGN(
auto path_resolver,
LegacyCGroupPathResolver::Create(GetSysFsPathFromTestDataFile(
Expand Down Expand Up @@ -349,6 +358,7 @@ TEST(LeagcyCGroupPathResolverTest, Cgroup2Format) {
}

TEST(CGroupPathResolver, Cgroup2Format) {
FLAGS_force_cgroup1_mode = false;
std::string cgroup_kubepod_path =
"/sys/fs/cgroup/kubepods.slice/"
"kubepods-pod8dbc5577_d0e2_4706_8787_57d52c03ddf2.slice/"
Expand Down
Loading