2023-11-30

2023-11-30 - proxmox rolling upgrade script

This is a relatively simple/hacky script I wrote to perform an upgrade and a rolling reboot across a Proxmox cluster.

It works on my system, but probably needs a little more attention before anyone else should put it into any sort of production cluster.

Frankly, I should also rewrite it in a higher-level programming language with saner error handling. Or at least some error handling! The lack of it is probably the worst thing about Bash.

Anyway, here’s the script:

#!/usr/bin/env bash
# update-nodes/roll.sh
set -euxo pipefail

# run a command on a node over ssh, under sudo
DO_SSH(){
  local NODE=$1
  local SSH_USER=$2
  shift 2
  local SSH_COMMAND=('ssh' "${NODE}" '-l' "${SSH_USER}" 'sudo' "${@}")
  "${SSH_COMMAND[@]}"
}

# get list of nodes
GET_NODES(){
  #NODES=(192.168.12.5{1.})
  API_METHOD='get'
  API_PATH='/nodes'
  API_CALL=("${API_METHOD}" "${API_PATH}")
  PVESH_QUERY=('pvesh' "${API_CALL[@]}" '--output-format' 'json')
  JQ_QUERY='.[].node'
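  QUERY_NODE=n3 # the node we ask for the member list; hard-coded for my cluster
  # NODES=($(ssh 192.168.12.51 sudo pvecm nodes | tail -n+5 | tr -s '  ' | cut -d\  -f4))
  # NODES ends up as a global array that the loops below read directly.
  # Bash can't export arrays to child processes, so there's no point in `export NODES`.
  mapfile -t NODES < <(DO_SSH "${QUERY_NODE}" "${USER}" "${PVESH_QUERY[@]}" | jq -r "${JQ_QUERY}")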
}

ROLL(){
  local NODE=$1
  # ssh can exit non-zero when the reboot drops the connection; don't let set -e abort the run
  DO_SSH "${NODE}" "${USER}" reboot || true

  # wait on dmesg logs until reboot begins (ssh exits non-zero when the connection dies)
  DO_SSH "${NODE}" "${USER}" dmesg -w || true
  until DO_SSH "${NODE}" "${USER}" whoami # then loop until we can authenticate again
    do
      sleep 1s
    done
}

UPDATE_NODES(){
  for NODE in "${NODES[@]}"
  do
    # DO_SSH already prefixes the remote command with sudo
    DO_SSH "${NODE}" "${USER}" apt-get update
    DO_SSH "${NODE}" "${USER}" apt-get dist-upgrade -y --autoremove
  done
}

ROLL_NODES(){
  for NODE in "${NODES[@]}"
  do
    echo "REBOOTING NODE ${NODE} IN 5 SECONDS!"
    sleep 5s
    ROLL "${NODE}"
  done

}

main(){
  GET_NODES
  UPDATE_NODES
  ROLL_NODES
}

main
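
One gap in ROLL above is that the until ... whoami loop spins forever if a node never comes back. Here is a minimal sketch of a bounded wait. The helper name wait_until and its timeout argument are my own inventions for illustration, not part of the script above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the original script): retry a command about once
# per second until it succeeds or the timeout (in seconds) has elapsed.
wait_until() {
  local timeout=$1
  shift
  # SECONDS is a Bash built-in counter of seconds since the shell started
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      echo "timed out after ${timeout}s: $*" >&2
      return 1
    fi
    sleep 1
  done
}
```

In ROLL, the until loop could then become something like wait_until 600 DO_SSH "${NODE}" "${USER}" whoami, where 600 seconds is an arbitrary guess rather than a tested value. With set -e active, a timeout would stop the script before it moves on to the next node, which is probably what you want.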

previous version

This is the reboot function from a basic script I use to perform a rolling reboot of a cluster. To use it, loop through a list of nodes and call the script once per node, like ./roll.sh ${NODE}.

#!/bin/bash
# roll.sh
set -euxo pipefail
NODE=$1
# ssh can exit non-zero if the reboot drops the connection first; don't trip set -e
ssh "${NODE}" reboot || true
# the following blocks until the reboot begins, and then loops until we can authenticate
ssh "${NODE}" dmesg -w || until ssh "${NODE}" whoami
do
        sleep 1s
done
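
The calling loop described above might look roughly like this. It's only a sketch: the roll_all function and the ROLL_CMD variable are names I made up here so the command can be swapped out for a dry run, and the node names in the usage note are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around roll.sh: reboot nodes strictly one at a time.

# ROLL_CMD is an assumption of this sketch; it defaults to the script above.
ROLL_CMD="${ROLL_CMD:-./roll.sh}"

roll_all() {
  local node
  for node in "$@"; do
    echo "rolling ${node}"
    # Stop at the first failure so we never move on from a node that didn't come back
    "${ROLL_CMD}" "${node}" || return 1
  done
}
```

After sourcing this you'd call something like roll_all n1 n2 n3. Setting ROLL_CMD=echo first prints what would run without rebooting anything.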

If you can safely reboot more than one node at a time in your cluster, you can run the script for several nodes concurrently, for example as background jobs (Bash doesn't really have threads). That's an exercise left to the reader, but I'll probably enhance this myself eventually with a loop that keeps up to MAX_OFFLINE_NODES reboots running at once.
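
For what it's worth, here is one rough way that loop could go, using background jobs and wait. Treat it as an untested-in-production sketch: roll_parallel and ROLL_CMD are hypothetical names, the default of 2 for MAX_OFFLINE_NODES is arbitrary, and it assumes MAX_OFFLINE_NODES is at least 1.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run roll.sh for many nodes, with at most
# MAX_OFFLINE_NODES reboots in flight at any moment.

MAX_OFFLINE_NODES="${MAX_OFFLINE_NODES:-2}" # arbitrary default, must be >= 1
ROLL_CMD="${ROLL_CMD:-./roll.sh}"           # assumed to be the script above

roll_parallel() {
  local node pid failed=0
  local -a pids=()
  for node in "$@"; do
    # At the limit: wait for the oldest job before starting another one
    if (( ${#pids[@]} >= MAX_OFFLINE_NODES )); then
      wait "${pids[0]}" || failed=1
      pids=("${pids[@]:1}")
    fi
    "${ROLL_CMD}" "${node}" &
    pids+=("$!")
  done
  # Collect whatever is still running
  for pid in "${pids[@]}"; do
    wait "${pid}" || failed=1
  done
  return "${failed}"
}
```

Waiting on the oldest PID rather than on whichever job finishes first is a deliberate simplification: a slow node can hold up the queue, but the number of nodes offline never exceeds the limit, and each job's exit status is collected by PID. Note that this version keeps going after a failure and only reports it at the end, which may or may not be what you want for a cluster.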

ibeep.com © 2023 by bri recchia is licensed under CC BY-SA 4.0
My code is licensed under the MIT license unless otherwise stated.
Alternative terms may be available upon request.