Deploy filestore backups to all GCP clusters #4683
Comments
@sgibson91 and I have decided to defer this to October based on current engineering capacity.
List of GCP clusters to enable
Trying to deploy this to the 2i2c-uk cluster generated a lot of weird errors from helm:
OK, this error is cropping up for awi-ciroh too, so I suspect it will for all of them.
Aha! This line is present in the 2i2c cluster's support config, but not in the docs:
Adding that produces a successful helm rollout, but the pod is now erroring. I will fix the docs and investigate the crashing pod.
Cause of the pod erroring:
It has the correct service account, but that account may not have the correct permissions. It should, though, if the terraform is correct.
Fixed! I had defined the zone incorrectly in the config.
All GCP clusters now have gcpFilestoreBackups enabled with a retention period of 2 days. I will monitor over the next couple of days to check that i) backups are being created, and ii) the correct number of backups is being retained.
I have verified via the GCP console that all clusters have had a backup created. I will check again on Monday that we are only retaining 2 days' worth of backups.
The backups are not being cleaned out. On 2i2c-uk, we have 4 backups instead of 2. Investigating.
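For reference, the retention behaviour expected here can be sketched as below. This is an illustrative sketch only, not the actual code in 2i2c-org/gcp-filestore-backups: the function name `backups_to_delete`, the `(name, create_time)` tuple shape, and the timestamps are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def backups_to_delete(backups, retention_days, now=None):
    """Return the names of backups older than the retention period.

    `backups` is a hypothetical list of (name, create_time) tuples with
    timezone-aware datetimes; the real script may represent these differently.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [name for name, created in backups if created < cutoff]


# Arbitrary reference time, chosen only to make the example deterministic.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)

# Four daily backups, each taken one hour before the same time of day.
backups = [(f"backup-{i}", now - timedelta(days=i, hours=1)) for i in range(4)]

expired = backups_to_delete(backups, retention_days=2, now=now)
print(expired)
```

With a 2-day retention period, the two oldest of the four backups fall before the cutoff, which matches the expectation of ending up with 2 backups rather than 4.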
I opened 2i2c-org/gcp-filestore-backups#26, which I believe will help track down the root cause, and I will pick that up as unplanned work next.
Running describe on the deployment, I see the retention period variable is not being passed to the script via helm config, despite being set
I'm going to tweak some things, make a new release of the chart, and see if that resolves the issue.
What's going on here is that the helm templating is mangling all of my input arguments to the python script, causing errors because the values are not being paired with the right parameters. I'll continue battling.
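To illustrate this class of failure, here is a small sketch of how misaligned arguments surface in a Python CLI. The flag names and values are hypothetical and are not taken from the real script; the point is only that passing each parameter as a single `--flag=value` token keeps the value bound to its flag, whereas a dropped or shifted token in a space-separated list gets rejected or mis-assigned.

```python
import argparse


def build_parser():
    # Hypothetical CLI; the real backup script's flags may differ.
    parser = argparse.ArgumentParser(description="Illustrative backup CLI")
    parser.add_argument("--filestore-name", required=True)
    parser.add_argument("--project-id", required=True)
    parser.add_argument("--zone", required=True)
    parser.add_argument("--retention-days", type=int, required=True)
    return parser


# Each flag and its value travel together in one token (placeholder values).
good = [
    "--filestore-name=homedirs",
    "--project-id=my-project",
    "--zone=europe-west2-b",
    "--retention-days=2",
]
args = build_parser().parse_args(good)
print(args.retention_days)

# Simulate templating that dropped a value from a space-separated list:
# --filestore-name is now followed directly by another flag.
mangled = [
    "--filestore-name",
    "--project-id", "my-project",
    "--zone=europe-west2-b",
    "--retention-days=2",
]
try:
    build_parser().parse_args(mangled)
    mangled_rejected = False
except SystemExit:
    # argparse prints a usage error and exits when a flag has no value.
    mangled_rejected = True
print(mangled_rejected)
```

Rendering the argument list in the `--flag=value` form from the chart is one way to make this kind of mismatch fail loudly instead of silently shifting values between parameters.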
Got down to this being the error, but I know I've fixed that typo. For some reason the latest helm chart release hasn't picked it up yet. Tomorrow, I'll force another release.
To get here, I had to drop support for backing up multiple filestores and drastically simplify the helm templating and the argument input into the python script. Since I imagine we will want to migrate everyone to a per-user storage solution when that is ready, I don't think the loss of this feature is that critical.
Forcing a new release of the helm chart worked!
All the projects now only have two backups 🎉
Enable the GCP filestore backups project on all GCP clusters, keeping backups for a maximum of 2 days to minimise costs. Documentation for enabling this feature is here: https://infrastructure.2i2c.org/howto/filesystem-management/filesystem-backups/enable-backups/#gcp
Definition of Done
gcpFilestoreBackups is enabled on all GCP clusters (see list in comment below)