-
Notifications
You must be signed in to change notification settings - Fork 241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding containerd compatability to oom_logger - COMPINFRA-3947 #3904
Conversation
39e71ad
to
cedce50
Compare
cedce50
to
a978a3b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: we'll want to remove any references of this from puppet if we haven't already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll also need to remove this from setup.py and debian/paasta-tools.links I believe.
That said, I don't think you'll need to do anything in puppet -- when I looked through for references to this script I found references to a script that had a similar name, but this specific paasta version was never actually referenced/used.
paasta_tools/oom_logger.py
Outdated
parser.add_argument( | ||
"--containerd", | ||
action="store_true", | ||
help="Use containerd to inspect containers", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
help="Use containerd to inspect containers", | |
help="Use containerd to inspect containers, otherwise use docker", |
paasta_tools/oom_logger.py
Outdated
@@ -136,11 +151,15 @@ def capture_oom_events_from_stdin(): | |||
break | |||
|
|||
|
|||
def get_container_env_as_dict(docker_inspect): | |||
def get_container_env_as_dict(is_cri_containerd: bool, container_inspect): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
def get_container_env_as_dict(is_cri_containerd: bool, container_inspect): | |
def get_container_env_as_dict(is_cri_containerd: bool, container_inspect: dict) -> Dict[str, str]: |
comment: typing container_inspect
is probably a PITA if the containerd/docker libraries don't have a good TypedDict/dataclass/etc that we can use - maybe once we're only using containerd we can have a non-total TypedDict with the bits that we do use tho :p
paasta_tools/oom_logger.py
Outdated
if is_cri_containerd: | ||
namespace = "k8s.io" | ||
else: | ||
namespace = "moby" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, could we use the containerd library for both cases? if so, we could remove the conditional in main()
:)
otherwise, we can just assume that if this method is being called we want the k8s namespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, I was thinking of also just assuming if we call this function then we are using containerd but I wasn't sure if we would still need the moby ns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll also need to remove this from setup.py and debian/paasta-tools.links I believe.
That said, I don't think you'll need to do anything in puppet -- when I looked through for references to this script I found references to a script that had a similar name, but this specific paasta version was never actually referenced/used.
def capture_oom_events_from_stdin(): | ||
process_name_regex = re.compile( | ||
r"^\d+\s[a-zA-Z0-9\-]+\s.*\]\s(.+)\sinvoked\soom-killer:" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll need to do some experimentation w/ an actual oomkilled pod on nodes w/ containerd to verify what format the logs look like, and if we need to update any of these regexes.
This reverts commit 52b62c8.
…947' into u/emanelsabban/COMPINFRA-3947
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python dependencies are pain 🫨
paasta_tools/oom_logger.py
Outdated
if is_cri_containerd: | ||
config = container_inspect.get("process") | ||
env = config.get("env", []) | ||
else: | ||
config = container_inspect.get("Config") | ||
env = config.get("Env", []) | ||
if config is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically this if config is not None
should never be hit if we've got that far; if config
was none we'd run into an exception when reaching into it while trying to get the env
in each of the cases above.
Maybe this should be more like:
if is_cri_containerd: | |
config = container_inspect.get("process") | |
env = config.get("env", []) | |
else: | |
config = container_inspect.get("Config") | |
env = config.get("Env", []) | |
if config is not None: | |
if is_cri_containerd: | |
config = container_inspect.get("process") | |
env_key = "env" | |
else: | |
config = container_inspect.get("Config") | |
env_key = "Env" | |
if config is not None: | |
env = config.get(env_key, []) | |
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should add some tests for the new functionality, but otherwise looks good :)
@@ -115,6 +150,8 @@ def capture_oom_events_from_stdin(): | |||
oom_regex_kubernetes, | |||
oom_regex_kubernetes_structured, | |||
oom_regex_kubernetes_systemd_cgroup, | |||
oom_regex_kubernetes_containerd_systemd_cgroup, | |||
oom_regex_kubernetes_containerd_systemd_cgroup_structured, | |||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at some point we should probably audit all of these + remove unused ones + comment the remaining ones
from chatting in slack, it seems like we're not quite sure yet why sometimes we're only seeing the structured oomkill loglines and other times the unstructured one - but it does seem a bit weird that we can't use a single regex for containerd pods (or that in the past we needed 3 regexes for dockershim pods!)
@@ -136,11 +173,18 @@ def capture_oom_events_from_stdin(): | |||
break | |||
|
|||
|
|||
def get_container_env_as_dict(docker_inspect): | |||
def get_container_env_as_dict( | |||
is_cri_containerd: bool, container_inspect: dict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be nice to have some TypedDicts for the container_inspect dict - but that'll be a little annoying since you'd need a Union of two TypedDicts and you'd need to add some casts in the the code below :p
tests/test_oom_logger.py
Outdated
@patch("paasta_tools.oom_logger.argparse", autospec=True) | ||
def test_parse_args(mock_argparse): | ||
assert parse_args() == mock_argparse.ArgumentParser.return_value.parse_args() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it's fine to leave this untested since we're not really checking for much here :)
@@ -12,6 +12,7 @@ passenv = SSH_AUTH_SOCK PAASTA_ENV DOCKER_HOST CI | |||
setenv = | |||
TZ = UTC | |||
deps = | |||
--only-binary=grpcio |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if there's a way for us to set this in the requirements file so that we only need to specify this once - i was going to suggest adding a comment as to why this is here would be nice (i.e., we have wheels readily available internally but the same is not true in public pypi), but we'd need to copy-paste that quite a bit :p
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh nice, it is possible: https://pip.pypa.io/en/stable/reference/requirements-file-format/#global-options
if it doesn't break requirements-tools (i.e., check-requirements) to do this, that'd make this diff smaller (we wouldn't have to add this --only-binary option to all the toxenvs nor to the dockerfiles!) and it would let us add a comment explaining why we're singling out grpcio
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(that said, also fine as a followup)
setenv = | ||
PIP_INDEX_URL = http://169.254.255.254:20641/simple/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably refactor our env handling at some point so that this isn't required here, but that's probably not something we wanna tackle in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nits, but looks good otherwise!
tests/test_oom_logger.py
Outdated
"PAASTA_SERVICE=fake_service", | ||
"PAASTA_INSTANCE=fake_instance", | ||
"PAASTA_RESOURCE_MEM=512", | ||
"MESOS_CONTAINER_NAME=mesos-a04c14a6-83ea-4047-a802-92b850b1624e", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a more representative test would probably be to remove this env var
"MESOS_CONTAINER_NAME=mesos-a04c14a6-83ea-4047-a802-92b850b1624e", |
since we no longer actually set this :)
tests/test_oom_logger.py
Outdated
service="fake_service", | ||
instance="fake_instance", | ||
process_name="python3", | ||
mesos_container_id="mesos-a04c14a6-83ea-4047-a802-92b850b1624e", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, but
mesos_container_id="mesos-a04c14a6-83ea-4047-a802-92b850b1624e", | |
mesos_container_id="mesos-null", |
instead of deleting this line
tests/test_oom_logger.py
Outdated
def log_line_containerd(): | ||
return LogLine( | ||
timestamp=1720128512, | ||
hostname="dev208-uswest1adevc", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just to make sure that there's nothing in this test that depends on where we run it - can we re-use the hostname from previous tests? i.e.,,
hostname="dev208-uswest1adevc", | |
hostname="dev37-devc", |
Adds containerd support to oom_logger in paasta_tools.
[Testing]
oom_regex_kubernetes
andoom_regex_kubernetes_structured
. We see the log in the stream. This verifies that the Containerd Get request is doing the right thing and that we are getting the passed environment variables https://fluffy.yelpcorp.com/i/CMPWkJ7Fczg6ZTKwQJg8BZBqwTF4BkM9.htmlwhich is called
oom_regex_kubernetes_containerd_systemd_cgroup
and tested it matched:https://fluffy.yelpcorp.com/i/zTVqjL76X4tQDRG4wbpMdcX83BcxP7R8.html
NOTE
we need to make sure we merge the paasta pr first and have it rolled out everywhere before merging the puppet pr otherwise syslog wont send the logs as we have seen in testing where we had to first install the paasta debian package before running the d-branch to have syslog send the logs to scribereader