Stabilize Uptime of Editor Experience #19397

Open
14 tasks
gracekretschmer-metrostar opened this issue Oct 3, 2024 · 0 comments
Labels: CMS Team (CMS Product team that manages both editor exp and devops); Initiative (Initiatives are collections of epics that drive toward a common goal defined within Crew's Objective)


gracekretschmer-metrostar commented Oct 3, 2024

Status

Update each sprint until completed

| Date | Status | Launch Date (see above) | Notes |
| --- | --- | --- | --- |
| 10/29/2024 | in-progress | on-track | working on roll-back job now and building up a support rotation |

Problem Statement

The editor experience (prod CMS) was unavailable 4 times (source) during the 3 months from July through September 2024. As a result, prod CMS uptime fell to 99.786% for that period, below the industry standard of 99.9%. To meet that standard and maintain trust with editors, we need preventative measures that improve the uptime of the editor experience.
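To put the two percentages in concrete terms, the sketch below converts them into minutes of downtime. It assumes uptime is measured around the clock over the full calendar window of July 1 through September 30, 2024; the issue does not state the measurement window or method, so treat the function name `downtime_minutes` and the resulting figures as illustrative only.

```python
from datetime import date


def downtime_minutes(uptime_percent: float, period_minutes: float) -> float:
    """Minutes of downtime implied by an uptime percentage over a period."""
    return period_minutes * (100.0 - uptime_percent) / 100.0


# Assumption: the reporting window is every minute of July-September 2024.
period_days = (date(2024, 10, 1) - date(2024, 7, 1)).days
period_minutes = period_days * 24 * 60

observed = downtime_minutes(99.786, period_minutes)  # uptime reported in the issue
budget = downtime_minutes(99.9, period_minutes)      # target named in the issue

print(f"Window: {period_days} days ({period_minutes} minutes)")
print(f"Downtime implied by 99.786%: {observed:.1f} minutes")
print(f"Downtime allowed at 99.9%:   {budget:.1f} minutes")
```

Under that assumption, 99.786% corresponds to about 283.5 minutes of downtime over the quarter, while a 99.9% target allows about 132.5 minutes, so total downtime would need to drop by a little more than half.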

Business Outcomes / OKR

  • Objective: Our platforms are the best way to deliver products at VA
  • Key Result: Our platforms hit the "elite" level (as defined by DORA) on Deployment Frequency, Lead Time for Changes, Change Failure Rate, and Time to Restore Service

Hypothesis or Bet

  • Add preventative measures to mitigate the missing IP addresses
  • Add an automated rollback job to Jenkins that performs the needed rollback step when prod CMS goes offline.

We will know we're done when... ("Definition of Done")

  • The editor experience uptime increases to 99.9% over a three-month period.
  • Measures are in place that reduce the time it takes to bring prod CMS back online after it goes offline.

Known Blockers/Dependencies

List any blockers or dependencies for this work to be completed

Projected Launch Date

End of Q4, 12/31/2024.

Launch Checklist

Guidance (delete before posting)

Is this service / tool / feature...

... tested?

  • Usability test (TODO: link) has been performed, to validate that new changes enable users to do what was intended and that these changes don't worsen quality elsewhere. If usability test isn't relevant for this change, document the reason for skipping it.
    • ... and issues discovered in usability testing have been addressed.
    • Note on skipping: metrics that show the impact of before/after can be a substitute for usability testing.
  • End-to-end manual QA or UAT is complete, to validate there are no high-severity issues before launching
  • (if applicable) New functionality has thorough, automated tests running in CI/CD

... documented?

... measurable

  • We will measure prod CMS uptime against the goal of 99.9%.
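One way this measurement could be computed is sketched below: sum the outage durations that fall inside a reporting window and compare the resulting percentage to the 99.9% goal. The function `uptime_percent`, the Q4 window, and the two outage records are hypothetical and are not taken from the actual incident history or monitoring setup; the sketch also assumes outages do not overlap one another.

```python
from datetime import datetime

TARGET_UPTIME_PERCENT = 99.9  # goal stated in this issue


def uptime_percent(outages, period_start, period_end):
    """Uptime over [period_start, period_end) given (start, end) outage pairs.

    Assumes outages do not overlap; each one is clipped to the window.
    """
    period_seconds = (period_end - period_start).total_seconds()
    down_seconds = 0.0
    for start, end in outages:
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            down_seconds += (clipped_end - clipped_start).total_seconds()
    return 100.0 * (1.0 - down_seconds / period_seconds)


# Hypothetical example data: a Q4 2024 window with two made-up outages
# of 60 and 45 minutes.
window_start = datetime(2024, 10, 1)
window_end = datetime(2025, 1, 1)
example_outages = [
    (datetime(2024, 10, 14, 9, 0), datetime(2024, 10, 14, 10, 0)),
    (datetime(2024, 11, 20, 15, 30), datetime(2024, 11, 20, 16, 15)),
]

result = uptime_percent(example_outages, window_start, window_end)
meets_target = result >= TARGET_UPTIME_PERCENT
print(f"Uptime: {result:.3f}% (meets target: {meets_target})")
```

Clipping each outage to the window keeps an incident that straddles a quarter boundary from being counted in full against either quarter.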

When you're ready to launch...

Required Artifacts

Documentation

  • PRODUCT_NAME: directory name used for your product documentation
  • Product Outline: link to Product Outline
  • User Guide: link to User Guide

Testing

  • Usability test: link to GitHub issue, or provide reason for skipping
  • Manual QA: link to GitHub issue or documented results
  • Automated tests: link to tests, or "N/A"

Measurement

gracekretschmer-metrostar added the CMS Team, Initiative, and Needs refining labels on Oct 3, 2024
gracekretschmer-metrostar changed the title from "Stabilize Uptime of Editor Experience" to "CMS - Stabilize Uptime of Editor Experience" on Oct 17, 2024
gracekretschmer-metrostar changed the title from "CMS - Stabilize Uptime of Editor Experience" to "Stabilize Uptime of Editor Experience" on Oct 29, 2024