Autoscale vtgate replicas using HPA #3945
Conversation
Force-pushed from 5298ae5 to 7288c60
we'll also probably want to add a schema for the vitessclusters files at some point (https://github.com/Yelp/paasta/tree/master/paasta_tools/cli/schemas)
if crd.file_prefix in INSTANCE_TYPE_TO_RELATED_OBJECTS_UPDATER:
    INSTANCE_TYPE_TO_RELATED_OBJECTS_UPDATER[crd.file_prefix](
        service=service,
        instance=inst,
        cluster=cluster,
        kube_client=kube_client,
        soa_dir=DEFAULT_SOA_DIR,
I think we'd want a log message right before this, as in paasta/paasta_tools/setup_kubernetes_job.py (Line 323 in d078a30):

log.info(f"Ensuring related API objects for {app} are in sync")
I think I also need to take a closer look at what {setup,cleanup}_kubernetes_cr() currently do and see if there's anything we need to worry about there re: a CRD that's not created with create_custom_resource().

@EvanKrall @Rob-Johnson I'm also a little unsure about the best way to do this - I'm not sure if we'd want this sort of conditional logic based on the instance type, or if we'd want to do something in the base class that all these instance types use, so that we can update CRs that aren't directly tied to a CRD that paasta discovers.
    soa_dir: str = DEFAULT_SOA_DIR,
) -> VitessDeploymentConfigDict:
    vitess_service_instance_configs = load_vitess_instance_config(
        service, instance, cluster, soa_dir=soa_dir
-   ).get_vitess_config()
+   ).get_vitess_config(kube_client=kube_client)
afaik, we want to avoid requiring kubernetes permissions/clients to load configs - folks without k8s admin permissions can write scripts that read all instance configs, and this change would break such scripts.

(also, in general, creating the config dicts should be doable without talking to kubernetes: it should be a pretty pure operation, i.e. the only input is soaconfigs and there are no mutations or other side effects)
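For example, a read-only script of the kind described here should be able to build the config dict from soaconfigs alone. A rough sketch (load_vitess_instance_config and get_vitess_config are the functions from this diff; the wrapper and import path are assumptions):

```python
# Hypothetical read-only script: no kube client or cluster credentials needed.
from paasta_tools.vitesscluster_tools import load_vitess_instance_config  # import path assumed


def dump_vitess_config(service: str, instance: str, cluster: str, soa_dir: str) -> dict:
    # Pure: the only input is soaconfigs on disk, and nothing is mutated.
    return load_vitess_instance_config(
        service, instance, cluster, soa_dir=soa_dir
    ).get_vitess_config()
```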
for cell in vitess_config["cells"]:
    if not self.is_vtgate_autoscaling_enabled(cell):
        continue

    name = self.get_vtgate_hpa_name(cell["name"])
    scalablevtgate_cr = None
    try:
        scalablevtgate_cr = kube_client.custom.get_namespaced_custom_object(
            group=SCALABLEVTGATE_CRD["group"],
            version=SCALABLEVTGATE_CRD["version"],
            namespace=self.get_namespace(),
            plural=SCALABLEVTGATE_CRD["plural"],
            name=name,
        )
    except ApiException as e:
        if e.status != 404:
            raise e
        else:
            log.error(f"{SCALABLEVTGATE_CRD['kind']} {name} not found")

    if not scalablevtgate_cr:
        log.error(f"{SCALABLEVTGATE_CRD['kind']} {name} not found")
        continue

    autoscaled_replicas = scalablevtgate_cr["spec"].get("replicas")
    if autoscaled_replicas is not None:
        cell["gateway"]["replicas"] = autoscaled_replicas
hmm, i'm not quite sure where we wanna put this right now such that we can still construct the instance config in the absence of a kube client, but still run this logic when actually talking to k8s to create resources
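One possible shape, purely as a sketch (the function name and call site are hypothetical, not from this PR; SCALABLEVTGATE_CRD and ApiException are the names from the diff above): keep get_vitess_config() free of API calls, and overlay the autoscaled replica counts in a separate step that only runs from code paths that already hold a kube client (e.g. the related-objects updater):

```python
from typing import Any, Dict

from kubernetes.client.rest import ApiException


def overlay_autoscaled_vtgate_replicas(
    vitess_config: Dict[str, Any],
    kube_client: Any,
    namespace: str,
) -> Dict[str, Any]:
    # Hypothetical post-processing step: runs only where we already talk to k8s,
    # so plain config loading stays a pure soaconfigs -> dict operation.
    for cell in vitess_config.get("cells", []):
        name = f"{vitess_config['name']}-cell-{cell['name']}"  # naming scheme assumed
        try:
            cr = kube_client.custom.get_namespaced_custom_object(
                group=SCALABLEVTGATE_CRD["group"],
                version=SCALABLEVTGATE_CRD["version"],
                namespace=namespace,
                plural=SCALABLEVTGATE_CRD["plural"],
                name=name,
            )
        except ApiException as e:
            if e.status != 404:
                raise
            continue
        replicas = cr["spec"].get("replicas")
        if replicas is not None:
            cell["gateway"]["replicas"] = replicas
    return vitess_config
```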
    return

def get_vtgate_hpa_name(self, cell_name: str) -> str:
    return f"{sanitised_cr_name(self.service, self.instance)}-cell-{cell_name}"
do we need to sanitize cell_name with sanitise_kubernetes_name()? additionally, how long can these strings be? do we need to truncate them to ensure that we stay below the k8s length limits?
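A rough sketch of both concerns (sanitise_kubernetes_name is the existing paasta helper mentioned above, though the import path is assumed; the 63-character cap assumes the name has to fit a DNS-1123 label or label value, and the right limit depends on where the string ends up):

```python
from paasta_tools.kubernetes_tools import sanitise_kubernetes_name  # import path assumed

MAX_K8S_LABEL_LENGTH = 63  # DNS-1123 label / label-value limit; plain object names can be longer


def get_vtgate_hpa_name(self, cell_name: str) -> str:
    name = (
        f"{sanitised_cr_name(self.service, self.instance)}"
        f"-cell-{sanitise_kubernetes_name(cell_name)}"
    )
    # Blind truncation can cause collisions; a hash suffix would be safer.
    return name[:MAX_K8S_LABEL_LENGTH]
```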
    > 0
)

if should_exist:
we probably want to add some log messages here to make it clear when/what changes are happening
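For example, assuming this branch creates/updates the ScalableVtGate CR and the other branch removes it (that's a guess from the surrounding diff), something like:

```python
if should_exist:
    # Hypothetical: log before creating/patching so changes show up in the sync logs.
    log.info(f"Ensuring {SCALABLEVTGATE_CRD['kind']} {name} matches the configured vtgate autoscaling")
    # ... create or patch the CR ...
else:
    log.info(f"Removing {SCALABLEVTGATE_CRD['kind']} {name}: vtgate autoscaling is disabled")
    # ... delete the CR if it exists ...
```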
        ],
    },
    "spec": {
        "selector": ",".join([f"{k}={v}" for k, v in pod_selector.items()]),
just a note: if any of the label values these selectors are filtering on are run through sanitise_kubernetes_name(), we'll need to ensure that the values in the selector dict are also passed through sanitise_kubernetes_name()
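To illustrate the mismatch risk (the label keys and values here are illustrative, not taken from this diff): whatever sanitisation the pod labels go through has to be applied to the selector values too, or the selector will silently match nothing.

```python
# If pod labels are written with sanitised values, build the selector from the same values.
pod_selector = {
    "paasta.yelp.com/service": sanitise_kubernetes_name(service),
    "paasta.yelp.com/instance": sanitise_kubernetes_name(instance),
}
selector = ",".join(f"{k}={v}" for k, v in pod_selector.items())
```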
Problem or Feature
Note: This PR will be unnecessary if the following one is merged: planetscale/vitess-operator#598
We want to auto-scale vtgate using HPA. However, there are 2 problems:
Solution
Therefore, this PR is doing the following:
Context