Adds a new CLUO operator and agent workloads for platform nodes (#226)

* Renames update-operator to update-operator-master since we want to have CLUO for platform nodes too.
* Renames update-operator.jsonnet to update-operator-master.jsonnet since we need another operator for the platform too.
* Adds a CLUO agent daemonset for platform nodes.
* Renames update-agent.jsonnet to include a 'master' modifier to distinguish it from the agent file for platform nodes.
* Updates system.jsonnet to find renamed update-operator workloads and the new update-operator workloads for platform nodes.
* Adds a CLUO deployment for platform nodes.
* Schedules the master cluster CLUO operator on the Prometheus server cloud node. Previously it was on a master node, like a snake eating its tail.
* Since the master cluster CLUO operator is on the Prometheus server, schedules the platform cluster CLUO operator there too, for consistency.
* Adds -before-reboot-annotations to the CLUO operator deployments for both master and platform nodes.
* Updates the -before-reboot-annotations flag values.
* Adds an annotation to the master node so that CLUO for the master cluster can identify master nodes.
* Adds the 'patch' verb to the things that the update operator role can do to nodes.
* Renames the role file for CLUO to be simpler and shorter.
* Adds a new template function CluoAnnotation(), and runs that function from each update-agent DaemonSet.
* Fixes a number of syntax and format errors.
* Fixes typo 'alpine:lastest' to use 'latest'.
* Installs curl and adds line continuation chars at the end of each line of the CluoAnnotation initContainer.
* Moves apk-update and apk-add to the first line.
* Uses the reboot-coordinator service account explicitly, rather than defining it and then using the namespace's default SA instead.
* Attempts to use an initContainer to write a node annotation script to a volume which the update-agent container will mount and run before running the agent.
* Adds a missing comma to update-agent-master.jsonnet.
* Fixes the syntax and spelling of the volumeMounts sections of the initContainers.
* Moves update-operator node annotation to a configmap.
* Removes old reference to CluoAnnotation in templates.jsonnet.
* Adds new configs for the update-operator.
* Adds the update-operator ConfigMap to the reboot-coordinator namespace.
* Changes the format of 'command' for update-agents to something that will hopefully work.
* Runs annotate-node.sh with sh, since it isn't executable.
* Adds single quotes around patch JSON.
* Uses double quotes instead of single quotes for JSON patch to avoid conflict with enclosing single quotes.
* Wraps curl header fields in single quotes.
* Removes single quotes from curl arguments.
* Adds single quotes around JSON patch.
* Removes single quote wrapping from JSON patch.
* Changes update operator annotation from mlab-type-<type> to mlab-type/<type> for better readability.
* Removes the annotation prefix because it didn't really comply with the prefix specification, which is supposed to be a domain name-style identifier.
* Goes back to using all dashes for the update operator node annotations, because I just didn't like the underscore.
* Removes the 'master' update operator and agent, since we will now just use update-operator for rolling reboots of platform nodes only.
* Renames the 'platform' update operator and agent to just update-operator and update-agent, since we will now only run a single operator and agent for platform nodes only.
* Removes the -master and -platform suffixes from update operator and agent DaemonSet and Deployment, and changes the -before-reboot-annotations value to 'mlab-reboot-ok'.
* Sets the update-operator to run on a master node again, now that we aren't using it to reboot master nodes.
* Removes the special update-operator annotation from the master nodes, now that master nodes won't be rebooted by update-operator.
* Adds a new reboot-node.service, along with a Timer to execute it once a day, and writes the file to be executed.
* Uses the short weekday name instead of the long one.
* Configures a 'reboot day' for each of the master nodes. Each one will reboot on a different day of the week.
* Updates system.jsonnet to account for the fact that only a single update-operator and agent now exist.
* Does not enable or start the reboot-node.service, as this would cause a reboot loop. The only thing that should run this service is its associated Timer.
* Renames allocate_new_cloud_node.sh to something more intuitive for the platform cluster.
* Calls newly renamed add_platform_cluster_cloud_node.sh.
* Removes the facility for annotating a node for the update-operator, since that annotation will have to be applied by an operator or some operator script.
* Expands a comment about tolerations.
* Adds an additional safety check in the reboot-node.service script such that a reboot will not occur if the etcd cluster does not have exactly 3 members.
* Restructures the reboot-node script with slightly easier to follow logic.
m-lab · Aug 5, 2019 · 0702299
1 parent 0672156
commit 0702299

Showing 10 changed files with 69 additions and 13 deletions.
diff --git a/k8s/daemonsets/core/update-agent.jsonnet b/k8s/daemonsets/core/update-agent.jsonnet
@@ -65,13 +65,14 @@
           },
         ],
         nodeSelector: {
-          'node-role.kubernetes.io/master': '',
+          'mlab/type': 'platform',
         },
+        serviceAccountName: 'reboot-coordinator',
+        // This is a pod that should be scheduled under every possible
+        // circumstance, so tolerate everything.
         tolerations: [
           {
-            effect: 'NoSchedule',
-            key: 'node-role.kubernetes.io/master',
-            operator: 'Exists',
+            operator: 'Exists'
           },
         ],
         volumes: [
@@ -104,7 +105,7 @@
     },
     updateStrategy: {
       rollingUpdate: {
-        maxUnavailable: 1,
+        maxUnavailable: 2,
       },
       type: 'RollingUpdate',
     },

diff --git a/k8s/deployments/update-operator.jsonnet b/k8s/deployments/update-operator.jsonnet
@@ -26,8 +26,7 @@
         containers: [
           {
             args: [
-              '-reboot-window-start=Tue 15:00',
-              '-reboot-window-length=2h',
+              '-before-reboot-annotations=mlab-reboot-ok',
             ],
             command: [
               '/bin/update-operator',
@@ -49,10 +48,9 @@
         nodeSelector: {
           'node-role.kubernetes.io/master': '',
         },
+        serviceAccountName: 'reboot-coordinator',
         tolerations: [
           {
-            effect: 'NoSchedule',
-            key: 'node-role.kubernetes.io/master',
             operator: 'Exists',
           },
         ],

diff --git a/...ontainer-linux-update-coordinator.jsonnet → k8s/roles/update-operator.jsonnet b/...ontainer-linux-update-coordinator.jsonnet → k8s/roles/update-operator.jsonnet
@@ -25,6 +25,7 @@
         verbs: [
           'get',
           'list',
+          'patch',
           'watch',
           'update',
         ],
@@ -96,7 +97,7 @@
     subjects: [
       {
         kind: 'ServiceAccount',
-        name: 'default',
+        name: 'reboot-coordinator',
         namespace: 'reboot-coordinator',
       },
     ],

diff --git a/manage-cluster/allocate_new_cloud_node.sh → ...luster/add_platform_cluster_cloud_node.sh b/manage-cluster/allocate_new_cloud_node.sh → ...luster/add_platform_cluster_cloud_node.sh
diff --git a/manage-cluster/bootstrap_platform_cluster.sh b/manage-cluster/bootstrap_platform_cluster.sh
@@ -522,6 +522,8 @@ gcloud compute firewall-rules create ${GCE_BASE_NAME}-internal \
 #
 ETCD_CLUSTER_STATE="new"
 
+idx=0
 for zone in $GCE_ZONES; do
-  create_master $zone
+  create_master $zone ${REBOOT_DAYS[$idx]}
+  idx=$(( idx + 1 ))
 done
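
The loop above pairs each zone with a reboot day purely by index, so a REBOOT_DAYS list shorter than GCE_ZONES would pass an empty reboot day to create_master, and the reboot-node script would then never match it against today. A minimal sketch of a pre-flight check follows, assuming both lists are passed as space-separated strings; `pair_zones_with_days` is a hypothetical helper, not something in this repository.

```sh
#!/bin/sh
# pair_zones_with_days ZONES DAYS
# Prints one "zone day" pair per line, matching zones to days by position.
# Returns 1 (with a message on stderr) if there are more zones than days.
# Hypothetical helper for illustration only.
pair_zones_with_days() {
  _zones=$1
  _days=$2
  # Word splitting is intentional: load the days into the positional
  # parameters so they can be consumed one at a time with shift.
  set -- $_days
  for _zone in $_zones; do
    if [ "$#" -eq 0 ]; then
      echo "No reboot day left for zone ${_zone}." >&2
      return 1
    fi
    echo "${_zone} $1"
    shift
  done
}

# Example with the mlab-oti zones and the REBOOT_DAYS from k8s_deploy.conf.
pair_zones_with_days "b c d" "Tue Wed Thu"
```

From bash, the existing variables could be passed as `pair_zones_with_days "$GCE_ZONES" "${REBOOT_DAYS[*]}"` before the create_master loop runs.
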
diff --git a/manage-cluster/bootstrap_prometheus.sh b/manage-cluster/bootstrap_prometheus.sh
@@ -129,7 +129,7 @@ fi
 #######################################################
 
 # Create the new node
-./allocate_new_cloud_node.sh -p "${PROJECT}" \
+./add_platform_cluster_cloud_node.sh -p "${PROJECT}" \
     -m "${MACHINE_TYPE}" \
     -n "${PROM_BASE_NAME}" \
     -a "${PROM_BASE_NAME}" \

diff --git a/manage-cluster/bootstraplib.sh b/manage-cluster/bootstraplib.sh
@@ -3,6 +3,7 @@
 
 function create_master {
   local zone=$1
+  local reboot_day=$2
 
   gce_zone="${GCE_REGION}-${zone}"
   gce_name="master-${GCE_BASE_NAME}-${gce_zone}"
@@ -156,6 +157,11 @@ function create_master {
     # Binaries will get installed in /opt/bin, put it in root's PATH
     echo "export PATH=$PATH:/opt/bin" >> /root/.bashrc
 
+    # Write out the reboot day to a file in /etc. The reboot-node.service
+    # systemd unit will read the contents of this file to determine when to
+    # reboot the node.
+    echo -n "${reboot_day}" > /etc/reboot-node-day
+
     # Install CNI plugins.
     mkdir -p /opt/cni/bin
     curl -L "https://github.com/containernetworking/plugins/releases/download/${K8S_CNI_VERSION}/cni-plugins-amd64-${K8S_CNI_VERSION}.tgz" | tar -C /opt/cni/bin -xz

diff --git a/manage-cluster/cloud-config_master.yml b/manage-cluster/cloud-config_master.yml
@@ -75,6 +75,28 @@ coreos:
         [Install]
         WantedBy=multi-user.target
 
+    - name: reboot-node.service
+      content: |
+        [Unit]
+        Description=reboot-node.service
+
+        [Service]
+        Type=oneshot
+        ExecStart=/opt/bin/reboot-node
+
+    - name: reboot-node.timer
+      enable: "true"
+      command: "start"
+      content: |
+        [Unit]
+        Description=Run reboot-node.service daily
+
+        [Timer]
+        OnCalendar=Mon..Fri 15:00:00
+
+        [Install]
+        WantedBy=multi-user.target
+
 write_files:
   - path: /etc/ssh/sshd_config
     permissions: 0600
@@ -103,3 +125,23 @@ write_files:
     content: |
       fs.inotify.max_user_watches=131072
 
+  # The smallest of scripts to reboot the machine.
+  - path: /opt/bin/reboot-node
+    permissions: 0744
+    owner: root:root
+    content: |
+      #!/bin/bash
+      REBOOT_DAY=$(cat /etc/reboot-node-day)
+      TODAY=$(date +%a)
+      ETCD_MEMBERS=$(/usr/bin/etcdctl member list | wc -l)
+      if [[ "${REBOOT_DAY}" != "${TODAY}" ]]; then
+        echo "Reboot day ${REBOOT_DAY} doesn't equal today: ${TODAY}. Not rebooting."
+        exit 0
+      fi
+      if [[ "${ETCD_MEMBERS}" -lt "3" ]]; then
+        echo "There are less than 3 etcd cluster members. Not rebooting."
+        exit 1
+      fi
+      echo "Reboot day ${REBOOT_DAY} equals today: ${TODAY}. Rebooting node."
+      /usr/sbin/reboot
+
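
For illustration, the reboot decision in the script above can be factored into a function that is testable without rebooting anything. This is only a sketch, not the script installed by this commit: `should_reboot` is a hypothetical name, and it assumes the stricter check described in the commit message (exactly 3 etcd members), whereas the script in the diff only refuses to reboot when there are fewer than 3.

```sh
#!/bin/sh
# should_reboot REBOOT_DAY TODAY ETCD_MEMBERS
# Returns 0 when a reboot is allowed. Otherwise prints the reason and
# returns 1 (wrong day) or 2 (unexpected etcd member count).
# Hypothetical helper for illustration only.
should_reboot() {
  _reboot_day=$1
  _today=$2
  _etcd_members=$3

  if [ "$_reboot_day" != "$_today" ]; then
    echo "Reboot day ${_reboot_day} doesn't equal today: ${_today}. Not rebooting."
    return 1
  fi
  # Assumption: a healthy cluster has exactly 3 members. The count is
  # compared as a string so that an empty or malformed value fails closed.
  if [ "$_etcd_members" != "3" ]; then
    echo "Expected exactly 3 etcd members, found '${_etcd_members}'. Not rebooting."
    return 2
  fi
  return 0
}

# In the real script this might be driven by the commands used in the diff:
#   should_reboot "$(cat /etc/reboot-node-day)" "$(date +%a)" \
#     "$(/usr/bin/etcdctl member list | wc -l)" && /usr/sbin/reboot
```

Note that `date +%a` is locale-dependent, so the values in /etc/reboot-node-day only match if the node uses an English or C locale.
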
diff --git a/manage-cluster/k8s_deploy.conf b/manage-cluster/k8s_deploy.conf
@@ -61,6 +61,12 @@ GCE_ZONES_mlab_oti="b c d"
 GCS_BUCKET_EPOXY_mlab_oti="epoxy-mlab-oti"
 GCS_BUCKET_K8S_mlab_oti="k8s-support-mlab-oti"
 
+# The days on which the master nodes will be rebooted automatically. The days
+# map to three GCE_ZONES defined for each project. That is, the first day in
+# the below array will apply to the first GCE_ZONE defined for the project, and
+# so on.
+REBOOT_DAYS=(Tue Wed Thu)
+
 # Whether the script should exit after deleting all existing GCP resources
 # associated with creating this k8s cluster. This could be useful, for example,
 # if you want to change various object names, but don't want to have to

diff --git a/system.jsonnet b/system.jsonnet
@@ -30,7 +30,7 @@
     // Networks (which are in array form already).
     import 'k8s/networks/networks.jsonnet',
     // Roles (which are in array form already).
-    import 'k8s/roles/container-linux-update-coordinator.jsonnet',
+    import 'k8s/roles/update-operator.jsonnet',
     import 'k8s/roles/flannel.jsonnet',
     import 'k8s/roles/kube-rbac-proxy.jsonnet',
     import 'k8s/roles/kube-state-metrics.jsonnet',