Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding logic to Patch ETCD apiserver client crt secret on rollout #56

Merged
merged 1 commit into from
Dec 5, 2023

Conversation

pokearu
Copy link
Member

@pokearu pokearu commented Dec 1, 2023

Description of changes:
When ETCD machines are rolled out, we did not update the ETCD API Server client cert secret on the cluster. Namely

<cluster-name>-apiserver-etcd-client

Hence CAPI was not updating the etcd client cert on the CP nodes, which is found in path
/etc/kubernetes/pki/apiserver-etcd-client.crt
This prevented the certs on the CP from being renewed during node rollouts. This meant that once the certificates expired, the cluster would no longer be in a healthy state.

The changes modify the ETCD nodes rollout workflow to have a condition to patch the ETCD API server client cert secret on the cluster. This happens once after the new ETCD nodes have been rolled out and before the first CP node comes up.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@vignesh-goutham
Copy link
Member

LGTM, I'll wait for one more person to review this.

@pokearu pokearu force-pushed the etcd-client-crt-renew branch from 8d1370a to c16c9ec Compare December 4, 2023 19:29
@@ -87,7 +88,16 @@ func (r *EtcdadmClusterReconciler) generateCAandClientCertSecrets(ctx context.Co
Generated: true,
}
s := apiServerClientCertKeyPair.AsSecret(client.ObjectKey{Name: cluster.Name, Namespace: cluster.Namespace}, *metav1.NewControllerRef(etcdCluster, etcdv1.GroupVersion.WithKind("EtcdadmCluster")))
if err := r.Client.Create(ctx, s); err != nil && !apierrors.IsAlreadyExists(err) {
secretToPatch := s.DeepCopy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont quite understand why we need to have a deep copy of the secret

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So basically its for how CreateOrPatch works. If the object is found, its expected that the changes needed should be redone in a call back func.

So on line 98 s.Data = secretToPatch.Data i am updating the secret data again using the deep copied object.

// The default case is hit right after ScaleUp of ETCD nodes is completed and before the first CP comes up.
// If EtcdCertificatesAvailable is False, this means we need to update the certs.
// EtcdCertificatesAvailable is set to True once the certs are updated.
if conditions.Has(ep.EC, etcdv1.EtcdCertificatesAvailableCondition) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need conditions.Has(ep.EC, etcdv1.EtcdCertificatesAvailableCondition)? or conditions.IsFalse(ep.EC, etcdv1.EtcdCertificatesAvailableCondition) is sufficient?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um that might work. I followed the pattern on top line 322 where we checked if conditions.Has(ep.EC, etcdv1.EtcdMachinesSpecUpToDateCondition) before setting to True.

But it might work with just IsFalse since it returns False if the condition is not found.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea it will return false if condition is nil so i dont think you need the Has check

if conditions.Has(ep.EC, etcdv1.EtcdCertificatesAvailableCondition) &&
conditions.IsFalse(ep.EC, etcdv1.EtcdCertificatesAvailableCondition) {
log.Info("Updating Etcd client certs")
if err := r.generateCAandClientCertSecrets(ctx, cluster, etcdCluster); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the actual external etcd CA cert + key pair are not regenerated after the etcd machines are rolled out right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the ETCD certs on the ETCD nodes do get updated but thats with etcdadm, its not related to what we do here.

That was working, the problem was only that the secret on the cluster was not updated and thereby the cert on the CP was not updated.

// But if an object is found, the function expects to reconcile the fields we want patched in a callback func.
// Hence we keep a copy of the newly generated secret and update the secret Data field in a callback func.
// Ex; https://github.com/kubernetes-sigs/controller-runtime/blob/v0.14.5/pkg/controller/controllerutil/example_test.go
if _, err := controllerutil.CreateOrPatch(ctx, r.Client, s, func() error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is another secret create in

if err := r.Client.Create(ctx, s); err != nil && !apierrors.IsAlreadyExists(err) {

what does it do and why we do not need to make similar change like this api-server one?

@pokearu pokearu force-pushed the etcd-client-crt-renew branch from c16c9ec to 5faa2b2 Compare December 5, 2023 18:31
@jiayiwang7
Copy link
Member

/lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants