-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only update cluster-ca-key-generation from CaReconciler #10749
base: main
Are you sure you want to change the base?
Conversation
When rolling pods to trust new CA cert signed by new CA key only update the cluster-ca-key-generation annotation. Do not update cluster-ca-cert-generation or clients-ca-cert-generation. Signed-off-by: Katherine Stanley <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Regression tests should be run once the CI queues are empty.
Skip restarting pods that have an up-to-date CA key annotation if no other restart reasons present. Signed-off-by: Katherine Stanley <[email protected]>
LOGGER.debugCr(reconciliation, "Rolling Pod {} due to {}", pod.getMetadata().getName(), podRollReasons.getReasons()); | ||
return podRollReasons; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to wrap this into else
branch to have return validation.
@@ -575,6 +575,15 @@ Future<Void> rollingUpdateForNewCaKey() { | |||
false, | |||
eventPublisher | |||
).rollingRestart(pod -> { | |||
if (podRollReasons.getReasons().size() == 1 && podRollReasons.contains(RestartReason.CLUSTER_CA_CERT_KEY_REPLACED)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed? Why do we even need the rolling update here for Clients CA key replacement?
- It is not used internally, so we do not need to establish trust
- It should be enough to roll this in the regular rolling update where it would anyway roll to change the annotation, or?
I'm investigating the failure, I've worked out what's causing it, just trying to determine how best to address it |
* Add explicit else clauses. * Update KubernetesRestartEvent tests to use the force-replace annotation to trigger CA key replacement. Signed-off-by: Katherine Stanley <[email protected]>
43d8730
to
b7c43ba
Compare
if (podRollReasons.getReasons().size() == 1 && podRollReasons.contains(RestartReason.CLUSTER_CA_CERT_KEY_REPLACED)) { | ||
// Check if this pod has been rolled previously | ||
int clusterCaKeyGeneration = clusterCa.keyGeneration(); | ||
int podClusterCaKeyGeneration = Annotations.intAnnotation(pod, Ca.ANNO_STRIMZI_IO_CLUSTER_CA_KEY_GENERATION, clusterCaKeyGeneration); | ||
if (clusterCaKeyGeneration == podClusterCaKeyGeneration) { | ||
LOGGER.debugCr(reconciliation, "Not rolling Pod {} since the CA cert key generation is correct.", pod.getMetadata().getName()); | ||
return RestartReasons.empty(); | ||
} else { | ||
LOGGER.debugCr(reconciliation, "Rolling Pod {} due to {}", pod.getMetadata().getName(), podRollReasons.getReasons()); | ||
return podRollReasons; | ||
} | ||
} else { | ||
LOGGER.debugCr(reconciliation, "Rolling Pod {} due to {}", pod.getMetadata().getName(), podRollReasons.getReasons()); | ||
return podRollReasons; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, does this need to be updated to handle both reasons = both annotations? Otherwise, why are we patching the clients CA generation in the PodSet?
Signed-off-by: Katherine Stanley <[email protected]>
10b951e
to
ee009e1
Compare
Update the CaReconciler to only roll pods when the cluster CA key is replaced, not when the clients CA key is replaced Signed-off-by: Katherine Stanley <[email protected]>
ee009e1
to
65debe5
Compare
@scholzj @ppatierno I've pushed some updates to only roll when the cluster CA key is replaced, not the clients CA key (as we can leave that to the KafkaRoller) and only use the annotations, not the |
/** | ||
* Clients CA private key was replaced | ||
*/ | ||
CLIENT_CA_CERT_KEY_REPLACED("Trust new clients CA certificate signed by new key"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this should be removed from docs as well (search for ClientCaCertKeyReplaced
).
|
||
// cluster CA needs to be fully trusted | ||
// it is coming from a cluster CA key replacement which didn't end successfully (i.e. CO stopped) and we need to continue from there | ||
if (clusterCa.keyReplaced() || isClusterCaNeedFullTrust) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope we are 100% sure the clusterCa.keyReplaced()
is not needed :-o
/azp run regression |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run zookeeper-regression |
Azure Pipelines successfully started running 1 pipeline(s). |
@katheris Looks like the tests related to CA renewal failed: |
Type of change
Select the type of your PR
Description
When rolling pods to trust new CA cert signed by new CA key only update the cluster-ca-key-generation
annotation. Do not update cluster-ca-cert-generation or clients-ca-cert-generation.
In PR #10711 I added logic to update the cluster-ca-cert-gen , clients-ca-cert-gen and cluster-ca-key-gen annotations before rolling the pods. However, the pods are being rolled to trust a new CA cert that is signed by a new CA key. They are still presenting certificates that were issued by the old CA cert.
This means that if the reconcile fails after rolling the Kafka pods as part of the CaReconciler rolling update, but before it runs the KafkaReconciler logic, when it comes back in it would incorrectly think that the new CA cert was used everywhere and remove the old one from the ca-cert Secret.
We should only update the cluster-ca-key-gen annotation before doing the rolling update
Checklist
Please go through this checklist and make sure all applicable tasks have been done