ADR for deployment slots #1289

Merged
merged 4 commits into from
Sep 4, 2024
46 changes: 46 additions & 0 deletions adr/023-deployment-slots.md
@@ -0,0 +1,46 @@
# 23. Deployment Slots for Zero Downtime Deploys
Contributor: I'd like to see a reason why we are using deployment slots even though there are some significant negative impacts and risks.

Contributor Author: Basically the other options were even worse - Kubernetes, roll our own, or switch to AWS (or abandon zero-downtime deploys). We can add a mention of the alternatives.


Date: 2024-09-03

## Decision
We will use Azure Web App Deployment Slots to facilitate zero-downtime deploys of the TI app.

## Status

Accepted.

## Context
Because TI is driven by web traffic from ReportStream, we can receive HTTP calls at any time.
If TI fails to respond, ReportStream will have to try sending the data again later, causing delays.
By implementing zero-downtime deploys, our service can remain available to any incoming calls.

Even though there are some significant downsides to Deployment Slots, they're Azure's recommended
approach to zero-downtime deploys (ZDD), and they're lower effort and lower risk than the alternatives.
Other options to achieve ZDD are Kubernetes (significantly more complexity and effort), creating
our own custom deploy system (significantly more complexity, effort, and risk), or switching to
a cloud service provider that makes this easier, like AWS (not currently in scope as an option).

## Impact
### Positive
- **Zero-downtime deploys**: Zero-downtime deploys keep us from dropping incoming calls during deployment.
- **Easy rollback**: Deployment slots make it easy to roll back to the previous version of the
app if we find errors after deploy.
- **Consistency**: Deployment Slots are an Azure feature specifically designed to enable
zero-downtime deployment. We use deployment slots in all TI environments and
in the SFTP Ingestion Service.
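
The rollback point above can be illustrated with a small model. This is a minimal sketch, not the TI deployment code: it assumes rollback is done by swapping the same two slots a second time (the ADR does not spell out the mechanism), and the slot names `staging` and `production` and the version labels are hypothetical.

```python
def swap_slots(slots, source="staging", target="production"):
    """Return a new slot mapping with the contents of source and target exchanged."""
    swapped = dict(slots)
    swapped[source], swapped[target] = slots[target], slots[source]
    return swapped


# Hypothetical starting state: v1 is live, v2 has been deployed to the staging slot.
slots = {"production": "v1", "staging": "v2"}

after_deploy = swap_slots(slots)           # v2 goes live, v1 is parked in staging
after_rollback = swap_slots(after_deploy)  # the same operation again restores v1

print(after_deploy)    # {'production': 'v2', 'staging': 'v1'}
print(after_rollback)  # {'production': 'v1', 'staging': 'v2'}
```

Because the previous version stays in the other slot after a swap, rolling back under this assumption needs no rebuild or redeploy.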

### Negative
- **Incomplete support for Linux**: The auto-swap feature is not available for Linux-based web apps like ours,
so we had to include an explicit swapping step in our updated deployment process.
- **Opaque responses from `az webapp deployment slot swap` CLI**: When there are issues swapping slots, the CLI doesn't
return any details about the issue. The swapping operation can also take as much as 20 minutes
to time out if there's a silent failure, which slows down deploy and validation.
- **Steep learning curve**: Most of the official docs and unofficial resources
(such as blogs and tutorials) for deployment slots are written for people using Windows
servers and Microsoft-published programming languages. This lack of support for other platforms
and languages means a lot more trial and error is involved.
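
One way to soften the opaque-response and long-timeout problems is to wrap the explicit swap step in a script that enforces its own deadline and surfaces whatever the CLI does print. The following is a hedged sketch, not the project's actual deploy step: the function names, the 600-second default, and the resource group, app, and slot names are all assumptions for illustration. Only the `az webapp deployment slot swap` command and its `--resource-group`, `--name`, `--slot`, and `--target-slot` flags come from the Azure CLI.

```python
import subprocess


def build_swap_command(resource_group, app_name,
                       source_slot="staging", target_slot="production"):
    """Build the argument list for an explicit slot swap (names are placeholders)."""
    return [
        "az", "webapp", "deployment", "slot", "swap",
        "--resource-group", resource_group,
        "--name", app_name,
        "--slot", source_slot,
        "--target-slot", target_slot,
    ]


def run_swap(command, timeout_seconds=600, runner=subprocess.run):
    """Run the swap with our own deadline; return (succeeded, message)."""
    try:
        result = runner(command, capture_output=True, text=True,
                        timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # Fail after our own deadline instead of waiting on a silent failure.
        return False, f"swap did not finish within {timeout_seconds}s"
    if result.returncode != 0:
        # The CLI may print nothing useful, so fall back to a fixed message.
        return False, result.stderr.strip() or "swap failed with no error output"
    return True, result.stdout.strip()


# Example call (requires a logged-in Azure CLI; names are hypothetical):
# ok, message = run_swap(build_swap_command("my-resource-group", "my-ti-app"))
```

The `runner` parameter exists only so the wrapper can be exercised without calling Azure. A shorter deadline does not make the swap itself faster; it only bounds how long a deploy job waits before reporting a problem.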

### Risks
- Because of the incomplete support for and documentation of our use case, we may not have
chosen the optimal implementation of this feature. It may also be time-consuming to
troubleshoot if we run into future issues.