Only wait for essential pods in cert recovery

The certificate recovery role triggers a restart of every pod in the
k8s cluster so that they pick up the latest certificate information.
After the pods restart, the procedure waits for every pod to recover
and become READY.

This change modifies that behaviour to wait only for essential pods to
recover, namely those in the core namespaces armada, cert-manager,
flux-helm and kube-system.

Test case:
PASS: Run certificate recovery with crashing pods in a custom namespace

Closes-Bug: 2058751

Signed-off-by: Rei Oliveira <Reinildes.JoseMateusOliveira@windriver.com>
Change-Id: I3ea403a3e324ecbb5f2c1f56d6ce1c8bd80fabee
2024-03-15 11:40:26 -03:00 · 2024-03-15 11:40:26 -03:00 · 5a304af6e1
parent 3ac6db5973
commit 5a304af6e1
1 changed file with 2 additions and 1 deletion
--- a/playbookconfig/src/playbooks/roles/common/recover-subcloud-certificates/tasks/recover-k8s-leaf-certificates.yml
+++ b/playbookconfig/src/playbooks/roles/common/recover-subcloud-certificates/tasks/recover-k8s-leaf-certificates.yml
@@ -81,8 +81,9 @@
  - name: Wait pods to restart (become READY) on controller
    shell: >-
      kubectl get po -l '!job-name' -A --no-headers -o
-      'custom-columns=NAME:.metadata.name,
+      'custom-columns=NAME:.metadata.name, NAMESPACE:.metadata.namespace,
      READY:.status.containerStatuses[*].ready,NODE:.spec.nodeName'
+      | grep "armada\|cert-manager\|flux-helm\|kube-system"
      | grep -v calico-node
      | grep $(hostname)
      | grep -cv true
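
The effect of the added namespace filter can be tried without a cluster. The sketch below is only an illustration: it replays the same grep chain against a hypothetical, hand-written sample of the custom-columns output (NAME, NAMESPACE, READY, NODE). The pod names, the namespace my-custom-ns, the node names, and the helper functions sample_pods and count_not_ready are all assumptions made for this example and are not part of the playbook.

```shell
#!/bin/sh
# Hypothetical sample of the kubectl custom-columns output
# (NAME NAMESPACE READY NODE). All rows are invented for illustration.
sample_pods() {
  cat <<'EOF'
coredns-abc            kube-system   true        controller-0
cm-cert-manager-xyz    cert-manager  false       controller-0
helm-controller-123    flux-helm     true,true   controller-0
calico-node-q1w2       kube-system   false       controller-0
crashing-app-1         my-custom-ns  false       controller-0
kube-proxy-zz          kube-system   false       controller-1
EOF
}

# Count the lines that mention an essential namespace, are not calico-node,
# mention the given node, and do not contain "true" in any column.
# Takes the node name as an argument in place of $(hostname).
count_not_ready() {
  node="$1"
  grep "armada\|cert-manager\|flux-helm\|kube-system" \
    | grep -v calico-node \
    | grep "$node" \
    | grep -cv true
}

sample_pods | count_not_ready controller-0
```

With this sample, the crashing pod in my-custom-ns is dropped by the namespace filter, so only the cert-manager pod is counted for controller-0. Two properties of the chain are worth keeping in mind: each grep matches anywhere in the line rather than only in the NAMESPACE column, and `grep -c` exits with a non-zero status when the count is 0.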