Adding containerd compatibility to oom_logger - COMPINFRA-3947 #3904

EmanElsaban · 2024-06-27T21:25:28Z

Adds containerd support to oom_logger in paasta_tools.

[Testing]

Echoed log lines matching the regexes to test oom_regex_kubernetes and oom_regex_kubernetes_structured, and saw the resulting log in the stream. This verifies that the containerd Get request works correctly and that we receive the environment variables passed to the container: https://fluffy.yelpcorp.com/i/CMPWkJ7Fczg6ZTKwQJg8BZBqwTF4BkM9.html
Looked at the error messages produced by spinning up a cits instance in kubestage on the containerd pool, exec'ing into the container to run python, and triggering an OOM. The error messages are at https://fluffy.yelpcorp.com/i/D8HgrgvMj9Zmt4h9qXmv5HzJVmkWdFSn.html and I added a regex for this message:

Jul  4 14:03:23 10-40-11-245-uswest1adevc kernel: : [ 7195.442928] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=cri-containerd-e216d2f1e6c625d363c71edb6b3cbab5a9e1b447641b61028d0b94b077adf27c.scope,mems_allowed=0,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod08768c36_163c_40e5_8e49_09cf42ff5046.slice,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod08768c36_163c_40e5_8e49_09cf42ff5046.slice/cri-containerd-e216d2f1e6c625d363c71edb6b3cbab5a9e1b447641b61028d0b94b077adf27c.scope,task=python3,pid=485850,uid=33

The new regex is called oom_regex_kubernetes_containerd_systemd_cgroup, and I tested that it matches:
https://fluffy.yelpcorp.com/i/zTVqjL76X4tQDRG4wbpMdcX83BcxP7R8.html
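The merged regex itself isn't shown in this thread, so the following is a hypothetical sketch (the pattern name and helper are made up for illustration) of how fields could be pulled out of the kernel line quoted above. It keys off task_memcg because cpuset names the same cri-containerd scope but isn't followed by task=, and it undoes the systemd cgroup driver's "-" to "_" escaping of the pod UID.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern for illustration only; it is NOT the
# oom_regex_kubernetes_containerd_systemd_cgroup regex merged in this PR.
CONTAINERD_OOM_SKETCH = re.compile(
    r"task_memcg=/kubepods\.slice/"
    r"(?:kubepods-[a-z]+\.slice/)?"  # optional QoS slice (burstable/besteffort)
    r"kubepods-(?:[a-z]+-)?pod(?P<pod_uid>[0-9a-f_]+)\.slice/"
    r"cri-containerd-(?P<container_id>[0-9a-f]+)\.scope,"
    r"task=(?P<process_name>[^,]+),pid=(?P<pid>\d+)"
)

# The kernel log line quoted above, split across string literals.
SAMPLE = (
    "Jul  4 14:03:23 10-40-11-245-uswest1adevc kernel: : [ 7195.442928] "
    "oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),"
    "cpuset=cri-containerd-e216d2f1e6c625d363c71edb6b3cbab5a9e1b447641b61028d0b94b077adf27c.scope,"
    "mems_allowed=0,oom_memcg=/kubepods.slice/kubepods-burstable.slice/"
    "kubepods-burstable-pod08768c36_163c_40e5_8e49_09cf42ff5046.slice,"
    "task_memcg=/kubepods.slice/kubepods-burstable.slice/"
    "kubepods-burstable-pod08768c36_163c_40e5_8e49_09cf42ff5046.slice/"
    "cri-containerd-e216d2f1e6c625d363c71edb6b3cbab5a9e1b447641b61028d0b94b077adf27c.scope,"
    "task=python3,pid=485850,uid=33"
)


def parse_containerd_oom(line: str) -> Optional[Dict[str, str]]:
    match = CONTAINERD_OOM_SKETCH.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The systemd cgroup driver writes the pod UID's dashes as underscores
    # in the slice name, so undo that to recover the Kubernetes pod UID.
    fields["pod_uid"] = fields["pod_uid"].replace("_", "-")
    return fields
```

On the sample, this yields process_name "python3", pid "485850", and pod_uid "08768c36-163c-40e5-8e49-09cf42ff5046". Guaranteed-QoS pods have no QoS slice, which is why that path segment is optional.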

NOTE

We need to merge this paasta PR first and have it rolled out everywhere before merging the puppet PR. Otherwise syslog won't send the logs: we saw this in testing, where we had to install the paasta Debian package before running the d-branch for syslog to send the logs to scribereader.

nemacysts · 2024-06-27T21:45:58Z

paasta_tools/monitoring/kill_orphaned_docker_containers.py

note: we'll want to remove any references to this from puppet if we haven't already

We'll also need to remove this from setup.py and debian/paasta-tools.links I believe.

That said, I don't think you'll need to do anything in puppet -- when I looked through for references to this script I found references to a script that had a similar name, but this specific paasta version was never actually referenced/used.

nemacysts · 2024-06-27T21:46:16Z

paasta_tools/oom_logger.py

+    parser.add_argument(
+        "--containerd",
+        action="store_true",
+        help="Use containerd to inspect containers",


nit:

Suggested change

help="Use containerd to inspect containers",

help="Use containerd to inspect containers, otherwise use docker",
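For reference, a minimal standalone version of the flag with the suggested help text. Only the --containerd argument comes from the diff; the parse_args signature taking an optional argv list is an assumption that makes the flag easy to exercise in isolation.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical standalone parser; only --containerd mirrors the diff.
    parser = argparse.ArgumentParser(description="oom_logger flag sketch")
    parser.add_argument(
        "--containerd",
        action="store_true",
        help="Use containerd to inspect containers, otherwise use docker",
    )
    return parser.parse_args(argv)
```

With action="store_true", omitting the flag leaves args.containerd as False, so docker remains the default.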

nemacysts · 2024-06-27T21:50:14Z

paasta_tools/oom_logger.py

@@ -136,11 +151,15 @@ def capture_oom_events_from_stdin():
                break


-def get_container_env_as_dict(docker_inspect):
+def get_container_env_as_dict(is_cri_containerd: bool, container_inspect):


nit:

Suggested change

def get_container_env_as_dict(is_cri_containerd: bool, container_inspect):

def get_container_env_as_dict(is_cri_containerd: bool, container_inspect: dict) -> Dict[str, str]:

comment: typing container_inspect is probably a PITA if the containerd/docker libraries don't have a good TypedDict/dataclass/etc that we can use - maybe once we're only using containerd we can have a non-total TypedDict with the bits that we do use tho :p

nemacysts · 2024-06-27T21:55:17Z

paasta_tools/oom_logger.py

+        if is_cri_containerd:
+            namespace = "k8s.io"
+        else:
+            namespace = "moby"


hmm, could we use the containerd library for both cases? if so, we could remove the conditional in main() :)

otherwise, we can just assume that if this method is being called we want the k8s namespace

yeah, I was also thinking of just assuming that if we call this function we're using containerd, but I wasn't sure whether we'd still need the moby ns
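If both cases stay, one way to keep the namespace choice in one place is a small helper. This is a sketch: the helper names are hypothetical, the two namespaces are the ones in the diff, and containerd-namespace is the gRPC metadata key containerd uses to select a namespace.

```python
from typing import Tuple

# The two namespaces from the diff: containerd's CRI plugin (used by the
# kubelet) creates containers in "k8s.io"; Docker's containers live in "moby".
CRI_NAMESPACE = "k8s.io"
DOCKER_NAMESPACE = "moby"


def containerd_namespace(is_cri_containerd: bool) -> str:
    return CRI_NAMESPACE if is_cri_containerd else DOCKER_NAMESPACE


def namespace_metadata(namespace: str) -> Tuple[Tuple[str, str], ...]:
    # containerd reads the target namespace from this gRPC metadata key.
    return (("containerd-namespace", namespace),)
```

If the client calls generated gRPC stubs directly, the tuple can be passed as metadata= on the call (e.g. a hypothetical containers_stub.Get(request, metadata=namespace_metadata(CRI_NAMESPACE))). Dropping the moby case later reduces this to a single constant.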

jfongatyelp · 2024-06-27T22:24:44Z

paasta_tools/monitoring/kill_orphaned_docker_containers.py

We'll also need to remove this from setup.py and debian/paasta-tools.links I believe.

That said, I don't think you'll need to do anything in puppet -- when I looked through for references to this script I found references to a script that had a similar name, but this specific paasta version was never actually referenced/used.

jfongatyelp · 2024-06-27T22:25:36Z

paasta_tools/oom_logger.py

 def capture_oom_events_from_stdin():
    process_name_regex = re.compile(
        r"^\d+\s[a-zA-Z0-9\-]+\s.*\]\s(.+)\sinvoked\soom-killer:"


I think we'll need to do some experimentation w/ an actual oomkilled pod on nodes w/ containerd to verify what format the logs look like, and if we need to update any of these regexes.
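One cheap way to do that experimentation is to feed captured lines through the existing regex. The sketch below copies process_name_regex from the diff context. The sample line is hypothetical, shaped as "<epoch> <hostname> <kernel message>" because that is what the pattern's leading \d+\s[a-zA-Z0-9\-]+\s expects. The tests' LogLine likewise carries an integer timestamp and a hostname.

```python
import re
from typing import Optional

# process_name_regex as it appears in the diff context above.
process_name_regex = re.compile(
    r"^\d+\s[a-zA-Z0-9\-]+\s.*\]\s(.+)\sinvoked\soom-killer:"
)


def extract_process_name(line: str) -> Optional[str]:
    match = process_name_regex.search(line)
    return match.group(1) if match else None


# Hypothetical "<epoch> <hostname> <kernel message>" line; swap in a line
# captured from a real containerd node when experimenting.
SAMPLE = (
    "1720128512 dev37-devc kernel: [ 7195.442928] "
    "python3 invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=999"
)
```

extract_process_name(SAMPLE) returns "python3". A raw syslog-formatted line starting with "Jul  4 ..." returns None, because the pattern requires leading digits.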


jfongatyelp

python dependencies are a pain 🫨

jfongatyelp · 2024-07-12T22:16:23Z

paasta_tools/oom_logger.py

+    if is_cri_containerd:
+        config = container_inspect.get("process")
+        env = config.get("env", [])
+    else:
+        config = container_inspect.get("Config")
        env = config.get("Env", [])
+    if config is not None:


Technically, this if config is not None check can never be false by the time we reach it: if config were None, we'd already have raised an exception when calling config.get(...) to fetch the env in either branch above.
Maybe this should be more like:

Suggested change

if is_cri_containerd:
    config = container_inspect.get("process")
    env = config.get("env", [])
else:
    config = container_inspect.get("Config")
    env = config.get("Env", [])
if config is not None:

if is_cri_containerd:
    config = container_inspect.get("process")
    env_key = "env"
else:
    config = container_inspect.get("Config")
    env_key = "Env"
if config is not None:
    env = config.get(env_key, [])
    ...
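A self-contained version of the suggestion, for trying it out in isolation. Everything after env = config.get(env_key, []) is elided in the suggestion, so the KEY=VALUE parsing here is an assumption, not the merged code. What it demonstrates is that a missing "process"/"Config" key now yields an empty dict instead of an AttributeError on None.

```python
from typing import Any, Dict, List


def get_container_env_as_dict(
    is_cri_containerd: bool, container_inspect: Dict[str, Any]
) -> Dict[str, str]:
    # containerd (OCI spec) nests env under "process"/"env";
    # docker inspect nests it under "Config"/"Env".
    if is_cri_containerd:
        config = container_inspect.get("process")
        env_key = "env"
    else:
        config = container_inspect.get("Config")
        env_key = "Env"
    env_vars: Dict[str, str] = {}
    if config is not None:
        env: List[str] = config.get(env_key, [])
        # Assumed parsing of "KEY=VALUE" entries; split on the first "=" only.
        for entry in env:
            key, sep, value = entry.partition("=")
            if sep:
                env_vars[key] = value
    return env_vars
```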

nemacysts

we should add some tests for the new functionality, but otherwise looks good :)

nemacysts · 2024-07-23T16:07:26Z

paasta_tools/oom_logger.py

@@ -115,6 +150,8 @@ def capture_oom_events_from_stdin():
        oom_regex_kubernetes,
        oom_regex_kubernetes_structured,
        oom_regex_kubernetes_systemd_cgroup,
+        oom_regex_kubernetes_containerd_systemd_cgroup,
+        oom_regex_kubernetes_containerd_systemd_cgroup_structured,
    ]


at some point we should probably audit all of these + remove unused ones + comment the remaining ones

from chatting in slack, it seems like we're not quite sure yet why we sometimes see only the structured oomkill loglines and other times only the unstructured ones - but it does seem a bit weird that we can't use a single regex for containerd pods (or that in the past we needed 3 regexes for dockershim pods!)

nemacysts · 2024-07-23T16:19:01Z

paasta_tools/oom_logger.py

@@ -136,11 +173,18 @@ def capture_oom_events_from_stdin():
                break


-def get_container_env_as_dict(docker_inspect):
+def get_container_env_as_dict(
+    is_cri_containerd: bool, container_inspect: dict


might be nice to have some TypedDicts for the container_inspect dict - but that'll be a little annoying since you'd need a Union of two TypedDicts and you'd need to add some casts in the code below :p
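A sketch of what the Union-of-TypedDicts approach could look like. The class names are hypothetical. The keys are the ones the diff reads: process.env from the containerd (OCI) spec and Config.Env from docker inspect. total=False makes every key optional, matching the earlier non-total TypedDict idea. The casts are needed because mypy can't narrow a Union from a separate bool flag.

```python
from typing import Dict, List, TypedDict, Union, cast


class ContainerdProcess(TypedDict, total=False):
    env: List[str]


class ContainerdSpec(TypedDict, total=False):
    # Subset of the OCI runtime spec read via container_inspect.get("process").
    process: ContainerdProcess


class DockerConfig(TypedDict, total=False):
    Env: List[str]


class DockerInspect(TypedDict, total=False):
    # Subset of `docker inspect` output.
    Config: DockerConfig


ContainerInspect = Union[ContainerdSpec, DockerInspect]


def get_container_env_as_dict(
    is_cri_containerd: bool, container_inspect: ContainerInspect
) -> Dict[str, str]:
    env: List[str] = []
    # A bool flag can't narrow the Union for mypy, hence the explicit casts.
    if is_cri_containerd:
        process = cast(ContainerdSpec, container_inspect).get("process")
        if process is not None:
            env = process.get("env", [])
    else:
        config = cast(DockerInspect, container_inspect).get("Config")
        if config is not None:
            env = config.get("Env", [])
    # Split "KEY=VALUE" on the first "=" only; skip malformed entries.
    return {
        key: value
        for key, sep, value in (entry.partition("=") for entry in env)
        if sep
    }
```

Once only containerd is in use, ContainerdSpec alone would do and the casts go away.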

nemacysts · 2024-07-23T16:22:20Z

tests/test_oom_logger.py

+@patch("paasta_tools.oom_logger.argparse", autospec=True)
+def test_parse_args(mock_argparse):
+    assert parse_args() == mock_argparse.ArgumentParser.return_value.parse_args()


i think it's fine to leave this untested since we're not really checking for much here :)


nemacysts · 2024-07-23T16:26:00Z

tox.ini

@@ -12,6 +12,7 @@ passenv = SSH_AUTH_SOCK PAASTA_ENV DOCKER_HOST CI
 setenv =
    TZ = UTC
 deps =
+    --only-binary=grpcio


i wonder if there's a way for us to set this in the requirements file so that we only need to specify it once - i was going to say that a comment explaining why this is here would be nice (i.e., we have wheels readily available internally but the same is not true on public pypi), but we'd need to copy-paste that quite a bit :p

oh nice, it is possible: https://pip.pypa.io/en/stable/reference/requirements-file-format/#global-options

if it doesn't break requirements-tools (i.e., check-requirements) to do this, that'd make this diff smaller (we wouldn't have to add this --only-binary option to all the toxenvs nor to the dockerfiles!) and it would let us add a comment explaining why we're singling out grpcio

(that said, also fine as a followup)

nemacysts · 2024-07-23T16:28:35Z

tox.ini

+setenv =
+    PIP_INDEX_URL = http://169.254.255.254:20641/simple/


we should probably refactor our env handling at some point so that this isn't required here, but that's probably not something we wanna tackle in this PR

nemacysts

minor nits, but looks good otherwise!

nemacysts · 2024-08-28T15:16:58Z

tests/test_oom_logger.py

+                "PAASTA_SERVICE=fake_service",
+                "PAASTA_INSTANCE=fake_instance",
+                "PAASTA_RESOURCE_MEM=512",
+                "MESOS_CONTAINER_NAME=mesos-a04c14a6-83ea-4047-a802-92b850b1624e",


a more representative test would probably be to remove this env var

Suggested change

"MESOS_CONTAINER_NAME=mesos-a04c14a6-83ea-4047-a802-92b850b1624e",

since we no longer actually set this :)

nemacysts · 2024-08-28T15:17:28Z

tests/test_oom_logger.py

+        service="fake_service",
+        instance="fake_instance",
+        process_name="python3",
+        mesos_container_id="mesos-a04c14a6-83ea-4047-a802-92b850b1624e",


same here, but

Suggested change

mesos_container_id="mesos-a04c14a6-83ea-4047-a802-92b850b1624e",

mesos_container_id="mesos-null",

instead of deleting this line

nemacysts · 2024-08-28T15:18:58Z

tests/test_oom_logger.py

+def log_line_containerd():
+    return LogLine(
+        timestamp=1720128512,
+        hostname="dev208-uswest1adevc",


just to make sure that nothing in this test depends on where we run it - can we re-use the hostname from previous tests? i.e.,

Suggested change

hostname="dev208-uswest1adevc",

hostname="dev37-devc",

EmanElsaban requested review from nemacysts, jfongatyelp, ankit864, ajayOO8 and ilkinmammadzada June 27, 2024 21:26

EmanElsaban force-pushed the u/emanelsabban/COMPINFRA-3947 branch from 39e71ad to cedce50 June 27, 2024 21:30

Adding containerd compatability to oom_logger

a978a3b

EmanElsaban force-pushed the u/emanelsabban/COMPINFRA-3947 branch from cedce50 to a978a3b June 27, 2024 21:32

nemacysts reviewed Jun 27, 2024


jfongatyelp reviewed Jun 27, 2024



Addressing reviews + adding new regex for containerd

070a761

EmanElsaban requested review from wilmer05, jfongatyelp and nemacysts July 4, 2024 20:17

EmanElsaban and others added 15 commits July 4, 2024 14:31

Adding in addition to the nerdctl regex a regex for capturing containerd-cri oom

ba53ddb

Updating packages for the k8s_itest to pass

52b62c8

use wheels + main internal pypi

16d092d

Revert "Updating packages for the k8s_itest to pass"

2173227

This reverts commit 52b62c8.

update addict

aa1df40

update argcomplete

8a9f0d1

Revert "update argcomplete"

e609b71

This reverts commit 8a9f0d1.

Revert "update addict"

07dde6d

This reverts commit aa1df40.

prefer binary

b79ffdf

prefer binary harder

3418ca2

prefer binary this way

cbd610f

maybe?

048a1a8

missed a spot

0ba8fba

upgrade??

b6efc9f

do we still need deadsnakes here

c72a4ed

nemacysts and others added 8 commits July 9, 2024 13:30

anotha one

c4ffc3f

more fixes

447f321

deadsnakes

0e2f1db

distutils is fun

47ed5a4

cleanup

3f6395b

Revert "Updating packages for the k8s_itest to pass"

0818402

This reverts commit 52b62c8.

Merge remote-tracking branch 'origin/luisp-u/emanelsabban/COMPINFRA-3947' into u/emanelsabban/COMPINFRA-3947

aa108c1

Removed the nerdctl regex and the adjusted ones for docker

a9299bd

jfongatyelp approved these changes Jul 12, 2024


Address getting from config when its none

96ce384

EmanElsaban requested a review from jfongatyelp July 15, 2024 19:40

Add a less structured regex for containerd

14a16bb

nemacysts reviewed Jul 23, 2024


EmanElsaban added 2 commits August 27, 2024 05:13

Added two tests to test the regex if its working

4132b65

Adding a unit test for testing main with containerd=true

f3c13bc

EmanElsaban requested a review from nemacysts August 27, 2024 15:37

nemacysts reviewed Aug 28, 2024


Addressing more reviews

416f9b9

EmanElsaban requested a review from nemacysts August 28, 2024 17:59

nemacysts approved these changes Aug 28, 2024


EmanElsaban merged commit f2cc8fc into master Aug 29, 2024
10 checks passed
