Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding containerd compatability to oom_logger - COMPINFRA-3947 #3904

Merged
merged 30 commits into from
Aug 29, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
a978a3b
Adding containerd compatability to oom_logger
EmanElsaban Jun 27, 2024
070a761
Addressing reviews + adding new regex for containerd
EmanElsaban Jul 4, 2024
ba53ddb
Adding in addition to the nerdctl regex a regex for capturing contain…
EmanElsaban Jul 4, 2024
52b62c8
Updating packages for the k8s_itest to pass
EmanElsaban Jul 9, 2024
16d092d
use wheels + main internal pypi
nemacysts Jul 9, 2024
2173227
Revert "Updating packages for the k8s_itest to pass"
nemacysts Jul 9, 2024
aa1df40
update addict
nemacysts Jul 9, 2024
8a9f0d1
update argcomplete
nemacysts Jul 9, 2024
e609b71
Revert "update argcomplete"
nemacysts Jul 9, 2024
07dde6d
Revert "update addict"
nemacysts Jul 9, 2024
b79ffdf
prefer binary
nemacysts Jul 9, 2024
3418ca2
prefer binary harder
nemacysts Jul 9, 2024
cbd610f
prefer binary this way
nemacysts Jul 9, 2024
048a1a8
maybe?
nemacysts Jul 9, 2024
0ba8fba
missed a spot
nemacysts Jul 9, 2024
b6efc9f
upgrade??
nemacysts Jul 9, 2024
c72a4ed
do we still need deadsnakes here
nemacysts Jul 9, 2024
c4ffc3f
anotha one
nemacysts Jul 9, 2024
447f321
more fixes
nemacysts Jul 9, 2024
0e2f1db
deadsnakes
nemacysts Jul 9, 2024
47ed5a4
distutils is fun
nemacysts Jul 9, 2024
3f6395b
cleanup
nemacysts Jul 9, 2024
0818402
Revert "Updating packages for the k8s_itest to pass"
EmanElsaban Jul 10, 2024
aa108c1
Merge remote-tracking branch 'origin/luisp-u/emanelsabban/COMPINFRA-3…
EmanElsaban Jul 10, 2024
a9299bd
Removed the nerdctl regex and the adjusted ones for docker
EmanElsaban Jul 10, 2024
96ce384
Address getting from config when its none
EmanElsaban Jul 15, 2024
14a16bb
Add a less structured regex for containerd
EmanElsaban Jul 16, 2024
4132b65
Added two tests to test the regex if its working
EmanElsaban Aug 27, 2024
f3c13bc
Adding a unit test for testing main with containerd=true
EmanElsaban Aug 27, 2024
416f9b9
Addressing more reviews
EmanElsaban Aug 28, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
72 changes: 0 additions & 72 deletions paasta_tools/monitoring/kill_orphaned_docker_containers.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: we'll want to remove any references of this from puppet if we haven't already

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll also need to remove this from setup.py and debian/paasta-tools.links I believe.

That said, I don't think you'll need to do anything in puppet -- when I looked through for references to this script I found references to a script that had a similar name, but this specific paasta version was never actually referenced/used.

This file was deleted.

60 changes: 50 additions & 10 deletions paasta_tools/oom_logger.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,10 +33,15 @@
destination(paasta_oom_logger);
};
"""
import argparse
import json
import re
import sys
from collections import namedtuple

import grpc
from containerd.services.containers.v1 import containers_pb2
from containerd.services.containers.v1 import containers_pb2_grpc
from docker.errors import APIError

from paasta_tools.cli.utils import get_instance_config
Expand Down Expand Up @@ -76,6 +81,16 @@
)


def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(description="paasta_oom_logger")
parser.add_argument(
"--containerd",
action="store_true",
help="Use containerd to inspect containers",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
help="Use containerd to inspect containers",
help="Use containerd to inspect containers, otherwise use docker",

)
return parser.parse_args()


def capture_oom_events_from_stdin():
process_name_regex = re.compile(
r"^\d+\s[a-zA-Z0-9\-]+\s.*\]\s(.+)\sinvoked\soom-killer:"
Comment on lines 95 to 97
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll need to do some experimentation w/ an actual oomkilled pod on nodes w/ containerd to verify what format the logs look like, and if we need to update any of these regexes.

Expand Down Expand Up @@ -136,11 +151,15 @@ def capture_oom_events_from_stdin():
break


def get_container_env_as_dict(docker_inspect):
def get_container_env_as_dict(is_cri_containerd: bool, container_inspect):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
def get_container_env_as_dict(is_cri_containerd: bool, container_inspect):
def get_container_env_as_dict(is_cri_containerd: bool, container_inspect: dict) -> Dict[str, str]:

comment: typing container_inspect is probably a PITA if the containerd/docker libraries don't have a good TypedDict/dataclass/etc that we can use - maybe once we're only using containerd we can have a non-total TypedDict with the bits that we do use tho :p

env_vars = {}
config = docker_inspect.get("Config")
if config is not None:
if is_cri_containerd:
config = container_inspect.get("process")
env = config.get("env", [])
else:
config = container_inspect.get("Config")
env = config.get("Env", [])
if config is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this if config is not None should never be hit if we've got that far; if config was none we'd run into an exception when reaching into it while trying to get the env in each of the cases above.
Maybe this should be more like:

Suggested change
if is_cri_containerd:
config = container_inspect.get("process")
env = config.get("env", [])
else:
config = container_inspect.get("Config")
env = config.get("Env", [])
if config is not None:
if is_cri_containerd:
config = container_inspect.get("process")
env_key = "env"
else:
config = container_inspect.get("Config")
env_key = "Env"
if config is not None:
env = config.get(env_key, [])
...

for i in env:
name, _, value = i.partition("=")
env_vars[name] = value
Expand Down Expand Up @@ -209,18 +228,32 @@ def send_sfx_event(service, instance, cluster):
counter.count()


def get_containerd_container(
is_cri_containerd: bool, container_id: str
) -> containers_pb2.Container:
with grpc.insecure_channel("unix:///run/containerd/containerd.sock") as channel:
containersv1 = containers_pb2_grpc.ContainersStub(channel)
if is_cri_containerd:
namespace = "k8s.io"
else:
namespace = "moby"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, could we use the containerd library for both cases? if so, we could remove the conditional in main() :)

otherwise, we can just assume that if this method is being called we want the k8s namespace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, I was thinking of also just assuming if we call this function then we are using containerd but I wasn't sure if we would still need the moby ns

return containersv1.Get(
containers_pb2.GetContainerRequest(id=container_id),
metadata=(("containerd-namespace", namespace),),
).container


def main():
if clog is None:
print("CLog logger unavailable, exiting.", file=sys.stderr)
sys.exit(1)

args = parse_args()
clog.config.configure(
scribe_host="169.254.255.254",
scribe_port=1463,
monk_disable=False,
scribe_disable=False,
)

cluster = load_system_paasta_config().get_cluster()
client = get_docker_client()
for (
Expand All @@ -229,11 +262,18 @@ def main():
container_id,
process_name,
) in capture_oom_events_from_stdin():
try:
docker_inspect = client.inspect_container(resource_id=container_id)
except (APIError):
continue
env_vars = get_container_env_as_dict(docker_inspect)
if args.containerd:
# then we're using containerd to inspect containers
container_info = get_containerd_container(args.containerd, container_id)
container_spec_raw = container_info.spec.value.decode("utf-8")
container_inspect = json.loads(container_spec_raw)
else:
# we're using docker to inspect containers
try:
container_inspect = client.inspect_container(resource_id=container_id)
except (APIError):
continue
jfongatyelp marked this conversation as resolved.
Show resolved Hide resolved
env_vars = get_container_env_as_dict(args.containerd, container_inspect)
service = env_vars.get("PAASTA_SERVICE", "unknown")
instance = env_vars.get("PAASTA_INSTANCE", "unknown")
mesos_container_id = env_vars.get("MESOS_CONTAINER_NAME", "mesos-null")
Expand Down
2 changes: 2 additions & 0 deletions requirements-minimal.txt
Original file line number Diff line number Diff line change
Expand Up @@ -8,12 +8,14 @@ botocore
bravado >= 10.2.0
certifi
choice >= 0.1
containerd
cookiecutter >= 1.4.0
croniter
docker
dulwich >= 0.17.3
ephemeral-port-reserve >= 1.0.1
graphviz
grpcio
gunicorn
humanfriendly
humanize >= 0.5.1
Expand Down
2 changes: 2 additions & 0 deletions requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ certifi==2017.11.5
chardet==3.0.4
choice==0.1
click==6.6
containerd==1.5.3
cookiecutter==1.4.0
croniter==1.3.4
decorator==4.1.2
Expand All @@ -27,6 +28,7 @@ ephemeral-port-reserve==1.1.0
future==0.16.0
google-auth==1.2.0
graphviz==0.8.2
grpcio==1.62.2
gunicorn==19.8.1
http-parser==0.9.0
humanfriendly==4.18
Expand Down
9 changes: 9 additions & 0 deletions tests/test_oom_logger.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
from paasta_tools.oom_logger import log_to_clog
from paasta_tools.oom_logger import LogLine
from paasta_tools.oom_logger import main
from paasta_tools.oom_logger import parse_args
from paasta_tools.oom_logger import send_sfx_event


Expand Down Expand Up @@ -344,14 +345,21 @@ def test_send_sfx_event(mock_get_instance_config):
assert mock_meteorite.create_counter.return_value.count.call_count == 1


@patch("paasta_tools.oom_logger.argparse", autospec=True)
def test_parse_args(mock_argparse):
assert parse_args() == mock_argparse.ArgumentParser.return_value.parse_args()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it's fine to leave this untested since we're not really checking for much here :)



@patch("paasta_tools.oom_logger.sys.stdin", autospec=True)
@patch("paasta_tools.oom_logger.clog", autospec=True)
@patch("paasta_tools.oom_logger.send_sfx_event", autospec=True)
@patch("paasta_tools.oom_logger.load_system_paasta_config", autospec=True)
@patch("paasta_tools.oom_logger.log_to_clog", autospec=True)
@patch("paasta_tools.oom_logger.log_to_paasta", autospec=True)
@patch("paasta_tools.oom_logger.get_docker_client", autospec=True)
@patch("paasta_tools.oom_logger.parse_args", autospec=True)
def test_main(
mock_parse_args,
mock_get_docker_client,
mock_log_to_paasta,
mock_log_to_clog,
Expand All @@ -365,6 +373,7 @@ def test_main(
):

mock_sys_stdin.readline.side_effect = sys_stdin
mock_parse_args.return_value.containerd = False
nemacysts marked this conversation as resolved.
Show resolved Hide resolved
docker_client = Mock(inspect_container=Mock(return_value=docker_inspect))
mock_get_docker_client.return_value = docker_client
mock_load_system_paasta_config.return_value.get_cluster.return_value = (
Expand Down
Loading