From 2191cf338605692bbb2ec074dca8f0a64e76cb02 Mon Sep 17 00:00:00 2001
From: Poornima Krishnasamy
Date: Tue, 28 May 2024 18:51:23 +0100
Subject: [PATCH 1/9] Add Support squad duties to How we work runbook

---
 runbooks/source/how-we-work.html.md.erb | 40 +++++++++++++++----------
 1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/runbooks/source/how-we-work.html.md.erb b/runbooks/source/how-we-work.html.md.erb
index 92cd905c..c41ee90a 100644
--- a/runbooks/source/how-we-work.html.md.erb
+++ b/runbooks/source/how-we-work.html.md.erb
@@ -1,7 +1,7 @@
 ---
 title: How We Work
 weight: 10
-last_reviewed_on: 2024-02-14
+last_reviewed_on: 2024-05-28
 review_in: 3 months
 ---
 
@@ -49,7 +49,7 @@ Stories are estimated with story points based on **complexity** during planning
 
 ## The Board / Tickets
 
-We use a kanban process to manage our backlog of work on this [zenhub board], which aggregates GitHub Issues from the various CP team repositories.
+We use a kanban process to manage our backlog of work on this [Github Project board], which aggregates GitHub Issues from the various CP team repositories.
 
 During the sprint, the process of getting work done should look like this:
 
@@ -81,7 +81,24 @@ Please read these [technical guidelines](https://ministryofjustice.github.io/tec
 * Please be pro-active about reviewing other team members' PRs
 * When reviewing a PR, please add a "reaction" emoji to the corresponding slack message, so that other team members know you're doing so. This avoids duplicated effort. We tend to use 👀 to show we're reviewing a PR, and/or ✔ when we've approved it.
 
-## The 🔨 Hammer of Justice
+## Support Squad
+We have a support squad to manage the support requests and alerts that come in. The support squad is responsible for the below in order of priority:
+
+- Acknowledging and invoking the team to high-priority alerts in the `#high-priority-alarms` slack channel during support hours
+- The 🔨 Hammer of Justice
+  - Responding to queries in the `#ask-cloud-platform` slack channel
+  - Reviewing PRs raised by users of the Cloud Platform against the [environments repository]
+- Acknowledging and responding to alerts in the `#lower-priority-alarms` slack channel which include
+  - Alerts related to Platform - lower priority alarms which is triggered from Prometheus, AWS, and Pingdom
+  - Alerts from concourse pipelines related to Integration tests, infrastructure and divergence
+  - Alerts from concourse pipelines related to [environments repository] i.e apply-namespace, apply-live
+  - Any other alerts from concourse pipelines
+- Support tickets raised by users
+- Actions from the How out of date are we? report i.e. (e.g. reviewing documentation pages, or (carefully) destroying orphaned AWS resources)
+- Dependabot PRs on the cloud-platform-* repositories
+- Any issues from [link checker report]
+
+### The 🔨 Hammer of Justice
 
 The origin of the name is lost, but it sounds a lot more fun than "support manager" 😏
 
@@ -99,18 +116,10 @@ It **is** the Hammer's job to ensure that all queries are handled, and that PRs
 
 > Anyone can (and should) respond to queries in `#ask-cloud-platform`, and review PRs. You don't have to be the Hammer to help.
 
-### Backlog Tickets
+#### Backlog Tickets
 
-Working on tickets in the backlog when you're the Hammer is not advised. The constant context switching makes it hard to get significant work done, and there is also the risk that questions go unanswered and PRs get blocked waiting for review because you're head down in a problem and don't notice them.
-
-Instead, when not answering queries and reviewing PRs, the Hammer should work on fixing "squeaky wheels" - the minor alerts and problems that crop up which don't necessarily result in backlog tickets, or where such tickets never become high-priority enough to get selected during sprint planning.
-
-"Squeaky wheels" could include things like:
-
-* Todo items reported by [How out of date are we?] - (e.g. reviewing documentation pages, or (carefully) destroying orphaned AWS resources)
-* Intermittent alerts in the `#lower-priority-alarms` slack channel
-* Improving our integration tests
-* Fixing open issues from Link Checker Report
+Working on tickets in the backlog when you're the Hammer is not advised. The constant context switching makes it hard to get significant work done,
+and there is also the risk that questions go unanswered and PRs get blocked waiting for review because you're head down in a problem and don't notice them.
 
 ## Documentation
 
@@ -123,8 +132,9 @@ It is important to keep all of this up to date as the underlying code changes, s
 
 [This page](https://reports.cloud-platform.service.justice.gov.uk/documentation) hosts a list of documents which are overdue for review. Please feel free to review any of the documents listed, and raise a PR making any updates (including updating the `last_reviewed_on` date).
 
-[zenhub board]: https://app.zenhub.com/workspaces/cloud-platform-team-5ccb0b8a81f66118c983c189/board
+[Github Project board]: https://github.com/orgs/ministryofjustice/projects/65
 [environments repository]: https://github.com/ministryofjustice/cloud-platform-environments
 [user guide]: https://user-guide.cloud-platform.service.justice.gov.uk
 [runbooks]: https://runbooks.cloud-platform.service.justice.gov.uk
 [How out of date are we?]: https://reports.cloud-platform.service.justice.gov.uk/dashboard
+[link checker report]: https://github.com/ministryofjustice/cloud-platform/issues?q=is%3Aissue+is%3Aopen+Link+Checker+Report

From bf5468fa33d9e6f8fba4190b5f3af68fb8f2a41e Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Tue, 28 May 2024 17:52:09 +0000
Subject: [PATCH 2/9] Commit changes made by code formatters

---
 runbooks/source/how-we-work.html.md.erb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/runbooks/source/how-we-work.html.md.erb b/runbooks/source/how-we-work.html.md.erb
index c41ee90a..8678a234 100644
--- a/runbooks/source/how-we-work.html.md.erb
+++ b/runbooks/source/how-we-work.html.md.erb
@@ -118,7 +118,7 @@ It **is** the Hammer's job to ensure that all queries are handled, and that PRs
 
 #### Backlog Tickets
 
-Working on tickets in the backlog when you're the Hammer is not advised. The constant context switching makes it hard to get significant work done,
+Working on tickets in the backlog when you're the Hammer is not advised. The constant context switching makes it hard to get significant work done,
 and there is also the risk that questions go unanswered and PRs get blocked waiting for review because you're head down in a problem and don't notice them.
 
 ## Documentation

From 41ab81aa3778c171fe76e7bd9c933c2798162194 Mon Sep 17 00:00:00 2001
From: Poornima Krishnasamy
Date: Thu, 6 Jun 2024 18:28:33 +0100
Subject: [PATCH 3/9] Add support ticket section

---
 runbooks/source/how-we-work.html.md.erb | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/runbooks/source/how-we-work.html.md.erb b/runbooks/source/how-we-work.html.md.erb
index 8678a234..eb93b00e 100644
--- a/runbooks/source/how-we-work.html.md.erb
+++ b/runbooks/source/how-we-work.html.md.erb
@@ -1,7 +1,7 @@
 ---
 title: How We Work
 weight: 10
-last_reviewed_on: 2024-05-28
+last_reviewed_on: 2024-06-06
 review_in: 3 months
 ---
 
@@ -86,7 +86,7 @@ We have a support squad to manage the support requests and alerts that come in.
 
 - Acknowledging and invoking the team to high-priority alerts in the `#high-priority-alarms` slack channel during support hours
 - The 🔨 Hammer of Justice
-  - Responding to queries in the `#ask-cloud-platform` slack channel
+  - Ensuring queries in the `#ask-cloud-platform` slack channel are answered
   - Reviewing PRs raised by users of the Cloud Platform against the [environments repository]
 - Acknowledging and responding to alerts in the `#lower-priority-alarms` slack channel which include
   - Alerts related to Platform - lower priority alarms which is triggered from Prometheus, AWS, and Pingdom
@@ -121,6 +121,20 @@ It **is** the Hammer's job to ensure that all queries are handled, and that PRs
 Working on tickets in the backlog when you're the Hammer is not advised. The constant context switching makes it hard to get significant work done,
 and there is also the risk that questions go unanswered and PRs get blocked waiting for review because you're head down in a problem and don't notice them.
 
+### Support Tickets
+
+Support tickets are created by users of Cloud Platform for various reasons. These can be anything from
+- a request for help with a technical problem,
+- a request for a new feature or service
+- setting up Alertmanager Receiver
+- setting up pingdom integration
+
+Support tickets are triaged by support squad. If the support ticket is a quick change e.g. for setting Alertmanager receiver, the ticket is assigned to
+a member of support team and should be finished in a day or two. If the ticket involves some investigation work, then this can be assigned to support squad member in the same sprint or discussed in
+backlog refinement and added to the following sprint.
+
+When working on support ticket, ensure that the ticket is updated with the progress and the user is informed.
+
 ## Documentation
 
 Most of our user-facing documentation is in the [user guide], and documentation for the team is in the [runbooks] site.

From 360d5d591355632900557fdde8cb786dbc58f1d1 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Thu, 6 Jun 2024 17:29:14 +0000
Subject: [PATCH 4/9] Commit changes made by code formatters

---
 runbooks/source/how-we-work.html.md.erb | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/runbooks/source/how-we-work.html.md.erb b/runbooks/source/how-we-work.html.md.erb
index eb93b00e..5f787e83 100644
--- a/runbooks/source/how-we-work.html.md.erb
+++ b/runbooks/source/how-we-work.html.md.erb
@@ -123,14 +123,14 @@ and there is also the risk that questions go unanswered and PRs get blocked wait
 
 ### Support Tickets
 
-Support tickets are created by users of Cloud Platform for various reasons. These can be anything from
-- a request for help with a technical problem,
+Support tickets are created by users of Cloud Platform for various reasons. These can be anything from
+- a request for help with a technical problem,
 - a request for a new feature or service
 - setting up Alertmanager Receiver
 - setting up pingdom integration
 
-Support tickets are triaged by support squad. If the support ticket is a quick change e.g. for setting Alertmanager receiver, the ticket is assigned to
-a member of support team and should be finished in a day or two. If the ticket involves some investigation work, then this can be assigned to support squad member in the same sprint or discussed in
+Support tickets are triaged by support squad. If the support ticket is a quick change e.g. for setting Alertmanager receiver, the ticket is assigned to
+a member of support team and should be finished in a day or two. If the ticket involves some investigation work, then this can be assigned to support squad member in the same sprint or discussed in
 backlog refinement and added to the following sprint.
 
 When working on support ticket, ensure that the ticket is updated with the progress and the user is informed.

From 2bb86019faede92fc3066ddc03c8693eae2b703a Mon Sep 17 00:00:00 2001
From: Poornima Krishnasamy
Date: Fri, 7 Jun 2024 11:40:50 +0100
Subject: [PATCH 5/9] Update runbooks/source/how-we-work.html.md.erb

Co-authored-by: Steve Williams <105657964+sj-williams@users.noreply.github.com>
---
 runbooks/source/how-we-work.html.md.erb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/runbooks/source/how-we-work.html.md.erb b/runbooks/source/how-we-work.html.md.erb
index 5f787e83..55ffe12c 100644
--- a/runbooks/source/how-we-work.html.md.erb
+++ b/runbooks/source/how-we-work.html.md.erb
@@ -89,7 +89,7 @@ We have a support squad to manage the support requests and alerts that come in.
   - Ensuring queries in the `#ask-cloud-platform` slack channel are answered
   - Reviewing PRs raised by users of the Cloud Platform against the [environments repository]
 - Acknowledging and responding to alerts in the `#lower-priority-alarms` slack channel which include
-  - Alerts related to Platform - lower priority alarms which is triggered from Prometheus, AWS, and Pingdom
+  - Alerts related to the platform - lower priority alarms which are triggered from Prometheus, AWS, and Pingdom
   - Alerts from concourse pipelines related to Integration tests, infrastructure and divergence
   - Alerts from concourse pipelines related to [environments repository] i.e apply-namespace, apply-live
   - Any other alerts from concourse pipelines

From 8e9dd15aa172ad71606d0fe2ad12749d049c7fee Mon Sep 17 00:00:00 2001
From: Poornima Krishnasamy
Date: Fri, 7 Jun 2024 11:41:05 +0100
Subject: [PATCH 6/9] Update runbooks/source/how-we-work.html.md.erb

Co-authored-by: Steve Williams <105657964+sj-williams@users.noreply.github.com>
---
 runbooks/source/how-we-work.html.md.erb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/runbooks/source/how-we-work.html.md.erb b/runbooks/source/how-we-work.html.md.erb
index 55ffe12c..6209aa7c 100644
--- a/runbooks/source/how-we-work.html.md.erb
+++ b/runbooks/source/how-we-work.html.md.erb
@@ -94,7 +94,7 @@ We have a support squad to manage the support requests and alerts that come in.
   - Alerts from concourse pipelines related to [environments repository] i.e apply-namespace, apply-live
   - Any other alerts from concourse pipelines
 - Support tickets raised by users
-- Actions from the How out of date are we? report i.e. (e.g. reviewing documentation pages, or (carefully) destroying orphaned AWS resources)
+- Actions from the How out of date are we? report i.e. (e.g. reviewing documentation pages, or __carefully__ destroying orphaned AWS resources)
 - Dependabot PRs on the cloud-platform-* repositories
 - Any issues from [link checker report]
 

From ac72c70270ca521a6e2b5434ad38e81e9fe55cae Mon Sep 17 00:00:00 2001
From: Poornima Krishnasamy
Date: Fri, 7 Jun 2024 11:41:16 +0100
Subject: [PATCH 7/9] Update runbooks/source/how-we-work.html.md.erb

Co-authored-by: Steve Williams <105657964+sj-williams@users.noreply.github.com>
---
 runbooks/source/how-we-work.html.md.erb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/runbooks/source/how-we-work.html.md.erb b/runbooks/source/how-we-work.html.md.erb
index 6209aa7c..b3565595 100644
--- a/runbooks/source/how-we-work.html.md.erb
+++ b/runbooks/source/how-we-work.html.md.erb
@@ -130,7 +130,7 @@ Support tickets are created by users of Cloud Platform for various reasons. Thes
 - setting up pingdom integration
 
 Support tickets are triaged by support squad. If the support ticket is a quick change e.g. for setting Alertmanager receiver, the ticket is assigned to
-a member of support team and should be finished in a day or two. If the ticket involves some investigation work, then this can be assigned to support squad member in the same sprint or discussed in
+a member of the support team and should be finished in a day or two. If the ticket involves some investigation work, then this can be assigned to a support squad member in the same sprint, or discussed in
 backlog refinement and added to the following sprint.
 
 When working on support ticket, ensure that the ticket is updated with the progress and the user is informed.

From cf42f44c3c5d091b74616a120ccfdac283d8c4de Mon Sep 17 00:00:00 2001
From: Poornima Krishnasamy
Date: Fri, 7 Jun 2024 11:41:25 +0100
Subject: [PATCH 8/9] Update runbooks/source/how-we-work.html.md.erb

Co-authored-by: Steve Williams <105657964+sj-williams@users.noreply.github.com>
---
 runbooks/source/how-we-work.html.md.erb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/runbooks/source/how-we-work.html.md.erb b/runbooks/source/how-we-work.html.md.erb
index b3565595..a5bdd8ed 100644
--- a/runbooks/source/how-we-work.html.md.erb
+++ b/runbooks/source/how-we-work.html.md.erb
@@ -129,7 +129,7 @@ Support tickets are created by users of Cloud Platform for various reasons. Thes
 - setting up Alertmanager Receiver
 - setting up pingdom integration
 
-Support tickets are triaged by support squad. If the support ticket is a quick change e.g. for setting Alertmanager receiver, the ticket is assigned to
+Support tickets are triaged by the support squad. If the support ticket is a quick change e.g. for setting an Alertmanager receiver, the ticket should be assigned to
 a member of the support team and should be finished in a day or two. If the ticket involves some investigation work, then this can be assigned to a support squad member in the same sprint, or discussed in
 backlog refinement and added to the following sprint.
 

From 63a0a17619ded82b08f106efd62270ea8aed671d Mon Sep 17 00:00:00 2001
From: Poornima Krishnasamy
Date: Fri, 7 Jun 2024 11:41:32 +0100
Subject: [PATCH 9/9] Update runbooks/source/how-we-work.html.md.erb

Co-authored-by: Steve Williams <105657964+sj-williams@users.noreply.github.com>
---
 runbooks/source/how-we-work.html.md.erb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/runbooks/source/how-we-work.html.md.erb b/runbooks/source/how-we-work.html.md.erb
index a5bdd8ed..da621083 100644
--- a/runbooks/source/how-we-work.html.md.erb
+++ b/runbooks/source/how-we-work.html.md.erb
@@ -95,7 +95,7 @@ We have a support squad to manage the support requests and alerts that come in.
   - Any other alerts from concourse pipelines
 - Support tickets raised by users
 - Actions from the How out of date are we? report i.e. (e.g. reviewing documentation pages, or __carefully__ destroying orphaned AWS resources)
-- Dependabot PRs on the cloud-platform-* repositories
+- Open Dependabot PRs raised against the `cloud-platform` repositories, which are managed in our GitHub Project [here](https://github.com/orgs/ministryofjustice/projects/65/views/16)
 - Any issues from [link checker report]
 
 ### The 🔨 Hammer of Justice