SS 1091 Fix handling of deleted pods in a release #4

alfredeen · 2024-08-23T11:06:55Z

This PR fixes a bug that incorrectly sets the app status to Deleted. This can occur in several scenarios, for example if the app image is changed to an invalid image and then back to a valid image.

It introduces a function that fetches the current status directly from k8s via the k8s API client. The function is used in special situations, when the k8s stream seems to indicate that a pod has been deleted, to confirm that this is actually the case.
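The double-check can be sketched roughly as below. This is not the function from the PR; it assumes the official `kubernetes` Python client, where `CoreV1Api.list_namespaced_pod(namespace, label_selector=...)` returns a pod list whose `.items` carry `metadata.deletion_timestamp`. The names `count_live_pods` and `confirm_release_deleted`, the `release=<name>` label selector, and the choice to ignore pods that are already terminating are all hypothetical. A small fake API stands in for a real cluster so the sketch runs on its own.

```python
from types import SimpleNamespace


def count_live_pods(api, namespace: str, release: str) -> int:
    """Count pods of a release that are not already being deleted.

    `api` is assumed to behave like kubernetes.client.CoreV1Api;
    the "release" label key is a hypothetical choice.
    """
    pods = api.list_namespaced_pod(namespace, label_selector=f"release={release}")
    return sum(1 for pod in pods.items if pod.metadata.deletion_timestamp is None)


def confirm_release_deleted(api, namespace: str, release: str) -> bool:
    """Trust a 'pod deleted' stream event only if k8s reports no live pods."""
    return count_live_pods(api, namespace, release) == 0


class FakeCoreV1Api:
    """Test double that mimics the shape of CoreV1Api.list_namespaced_pod."""

    def __init__(self, pods):
        self._pods = pods

    def list_namespaced_pod(self, namespace, label_selector=""):
        key, _, value = label_selector.partition("=")
        matching = [p for p in self._pods if p.metadata.labels.get(key) == value]
        return SimpleNamespace(items=matching)


def make_pod(release: str, deleting: bool = False):
    # A non-None deletion_timestamp marks a pod that is terminating.
    return SimpleNamespace(
        metadata=SimpleNamespace(
            labels={"release": release},
            deletion_timestamp="2024-08-23T11:00:00Z" if deleting else None,
        )
    )


# r1: one old pod terminating, one replacement running -> not deleted.
# r2: only a terminating pod; r3: no pods at all -> deleted.
api = FakeCoreV1Api([make_pod("r1", deleting=True), make_pod("r1"), make_pod("r2", deleting=True)])
```

Against a real cluster, `api` would instead be `kubernetes.client.CoreV1Api()` after `kubernetes.config.load_incluster_config()` or `load_kube_config()`. The r1 case mirrors the bug scenario: the old (invalid-image) pod is deleted while a replacement pod is still alive, so the release should not be marked Deleted.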

It also introduces a CI action and extends unit test coverage of the app status logic.


churnikov · 2024-08-23T11:32:56Z

Just a general thought about logging: we have correlation IDs enabled in serve. We could pass the correlation ID here as well, so that when we look through the logs we can trace the whole chain of events, including this part.
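One possible way to do this is sketched below. The PR does not show how serve propagates its correlation ID, so this is only an assumption-laden illustration: the ID is stored in a `contextvars.ContextVar` and a `logging.Filter` copies it onto every log record so the formatter can print it. The names `correlation_id_var`, `CorrelationIdFilter` and `make_logger` are hypothetical.

```python
import contextvars
import io
import logging

# Hypothetical context variable holding the correlation ID received from serve.
correlation_id_var = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Attach the current correlation ID to every record passing the handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id_var.get()
        return True  # never drop records, only enrich them


def make_logger(stream) -> logging.Logger:
    logger = logging.getLogger("event_listener_sketch")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.addFilter(CorrelationIdFilter())
    handler.setFormatter(logging.Formatter("[%(correlation_id)s] %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


buf = io.StringIO()
log = make_logger(buf)

token = correlation_id_var.set("abc-123")  # e.g. taken from the incoming request
try:
    log.info("release status updated")
finally:
    correlation_id_var.reset(token)

log.info("outside request")  # falls back to the default "-"
```

A context variable keeps the ID scoped to the current task or thread, so concurrent events do not overwrite each other's IDs.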

serve_event_listener/status_data.py

churnikov · 2024-08-23T11:36:03Z

tests/test_status_data.py

+        self.assertEqual(
+            self.status_data.status_data[release].get("status"),
+            "Running",
+            f"Release should be Running after delete of first pod, \
+                         ts pod deleted={self.pod.metadata.deletion_timestamp} vs \
+                         ts invalid_pod deleted={self.invalid_pod.metadata.deletion_timestamp} vs \
+                         ts valid_pod created={self.valid_pod.metadata.creation_timestamp}, {msg}",
+        )


Just curious, why did you switch to unittest-style asserts? A plain assert also accepts a failure message, if that was the reason.

This was the pattern Viktor had already used, and I chose to continue it.
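For comparison, a minimal sketch of both styles attaching a failure message. The class and test names are illustrative only. `assertEqual` takes the message as its third argument, while a bare `assert` takes it after a comma.

```python
import io
import unittest


class MessageStyles(unittest.TestCase):
    def test_unittest_style(self):
        actual, expected = "Running", "Running"
        # assertEqual(first, second, msg): the message is the third argument.
        self.assertEqual(actual, expected, "Release should be Running")

    def test_plain_assert_style(self):
        actual, expected = "Running", "Running"
        # Bare assert: the message follows the condition after a comma.
        assert actual == expected, "Release should be Running"


result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(MessageStyles)
)

# The message of a failing bare assert becomes the AssertionError text.
plain_msg = None
try:
    assert "Deleted" == "Running", "Release should be Running"
except AssertionError as exc:
    plain_msg = str(exc)
```

One practical difference: bare asserts are skipped entirely when Python runs with `-O`, whereas `self.assert*` methods always execute. pytest also rewrites bare asserts to show the compared values.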

churnikov · 2024-08-23T11:58:40Z

tests/test_status_data.py

@@ -74,19 +75,222 @@ def test_replica_scenario(self):
        self.new_pod.create(release)
        self.status_data.update({"object": self.new_pod})

+        time.sleep(0.01)


Does it actually create objects on the cluster? I see you've added these sleep calls here.

No objects are created on the cluster, only in-memory k8s client objects. Without these short pauses, the object timestamps were exactly equal, which would be exceptionally rare in the real world and caused problems for the implementation.
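Why equal timestamps are a problem can be illustrated with a hypothetical ordering check (not the PR's code). The assumption is that the status logic compares a pod's deletion timestamp with the creation timestamp of the newest pod, as the test message above does. When the two are identical, the order of events cannot be decided from timestamps alone. A 10 ms gap, as with `time.sleep(0.01)`, makes the order unambiguous.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def delete_is_stale(deleted_at: datetime, newest_created_at: datetime) -> Optional[bool]:
    """Hypothetical check: is a pod deletion older than the newest pod creation?

    True  -> a newer pod was created after the deletion, so the release is
             still alive and should not be marked Deleted.
    False -> the deletion is the latest event.
    None  -> the timestamps tie, so the order is undecidable.
    """
    if deleted_at == newest_created_at:
        return None
    return newest_created_at > deleted_at


t0 = datetime(2024, 8, 23, 11, 0, tzinfo=timezone.utc)
gap = timedelta(milliseconds=10)  # roughly what time.sleep(0.01) provides

stale = delete_is_stale(t0, t0 + gap)    # new pod created after the delete
current = delete_is_stale(t0 + gap, t0)  # delete happened last
tied = delete_is_stale(t0, t0)           # equal timestamps, as without sleep
```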

churnikov · 2024-08-23T12:03:31Z

tests/test_status_data.py

+    def test_waiting_container_reason_pending(self):
+        """
+        This scenario tests a k8s pod status object with a container with the following status attributes:
+        state=waiting, reason=PodInitializing
+        """
+        podstatus = PodStatus()
+        podstatus.add_container_status("waiting", "PodInitializing")
+        expected = ("Pending", "", "")
+        actual = StatusData.determine_status_from_k8s(podstatus)
+        self.assertEqual(actual, expected)


This test and the ones below are a perfect case for parameterized tests. I think that would reduce the test file size quite a bit.
But if you want to stick to unittest, the subTest context manager could be used for this.

I see your point, but the tests are not as similar as they might seem at first. There are different methods for init containers and non-init containers, and one test adds both kinds of objects. So parameterized tests could combine at most two or three tests into one.
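For the cases that do share a shape, the subTest approach might look like the sketch below. `toy_determine_status` is a hypothetical stand-in for `StatusData.determine_status_from_k8s`, and its two-entry mapping is a toy: only the `("waiting", "PodInitializing") -> ("Pending", "", "")` case comes from the test above, and the real mapping in `serve_event_listener/status_data.py` is not reproduced here.

```python
import io
import unittest

# Toy mapping; NOT the real translation logic from status_data.py.
_TOY_MAPPING = {
    ("waiting", "PodInitializing"): ("Pending", "", ""),
    ("running", ""): ("Running", "", ""),
}


def toy_determine_status(state: str, reason: str) -> tuple:
    return _TOY_MAPPING[(state, reason)]


class ContainerStatusCases(unittest.TestCase):
    CASES = [
        ("waiting", "PodInitializing", ("Pending", "", "")),
        ("running", "", ("Running", "", "")),
    ]

    def test_container_states(self):
        for state, reason, expected in self.CASES:
            # Each subTest is reported separately, so one failing case
            # does not hide the remaining ones.
            with self.subTest(state=state, reason=reason):
                self.assertEqual(toy_determine_status(state, reason), expected)


result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(ContainerStatusCases)
)
```

Tests that need extra setup, such as adding both an init container and a regular container status, would still be separate methods.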


alfredeen added 13 commits August 15, 2024 12:04

Adding CI action with linting and tests

c5a9cc7

Minor CI action improvements

c907ce5

Aligned CI black linter with this repos set line length

d8c72bf

Minor corrections to spelling

c196b01

More logging and corrected some comments

bedddba

Added a unit test for scenario of editing a release with valid image to invalid image back to valid image.

a9308c9

Refactoring status data to prepare for better unit test coverage.

efb0e3b

Began unit test of method to convert k8s container status to app status.

370cfd4

Added several unit test scenarios to verify the translation of k8s pod status objects to app status codes.

def58da

Added some delays between pod events to simulate more realistic behaviour

67f62b9

Limited python version to 3.12

af3d0d3

Added method to fetch status directly from k8s to be used at strategic events

20c8d09

Fixed return of tuple methods. Releases with no pods are now set to status Deleted.

1a60ab6

alfredeen self-assigned this Aug 23, 2024

alfredeen requested a review from a team August 23, 2024 11:11

alfredeen marked this pull request as ready for review August 23, 2024 11:11

churnikov reviewed Aug 23, 2024


Update serve_event_listener/status_data.py

2fe018a

Co-authored-by: Nikita Churikov <[email protected]>

churnikov approved these changes Aug 26, 2024


alfredeen merged commit dea097e into main Aug 26, 2024
3 checks passed
