proxmox rolling upgrade script
This is a relatively simple/hacky script I wrote to perform an upgrade and a rolling reboot across a Proxmox cluster.
It works on my system, but probably needs a little more attention before anyone else should put it into any sort of production cluster.
Frankly, I should also rewrite it in a higher-level programming language with saner error handling. Or at least some error handling! The lack of it is probably the worst thing about Bash.
Anyway, here’s the script:
#!/usr/bin/env bash
# update-nodes/roll.sh
set -euxo pipefail
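# run a command on a remote node over SSH, as root via sudo
# usage: DO_SSH <node> <ssh user> <command...>
DO_SSH(){
    local NODE=$1
    local SSH_USER=$2
    shift 2
    local SSH_COMMAND=('ssh' "${NODE}" '-l' "${SSH_USER}" 'sudo' "${@}")
    "${SSH_COMMAND[@]}"
}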
# get list of nodes by querying the Proxmox API through one known node
GET_NODES(){
    #NODES=(192.168.12.5{1.})
    API_METHOD='get'
    API_PATH='/nodes'
    API_CALL=("${API_METHOD}" "${API_PATH}")
    PVESH_QUERY=('pvesh' "${API_CALL[@]}" '--output-format' 'json')
    JQ_QUERY='.[].node'
    QUERY_NODE=n3
    # NODES=($(ssh 192.168.12.51 sudo pvecm nodes | tail -n+5 | tr -s ' ' | cut -d\  -f4))
    mapfile -t NODES < <(DO_SSH "${QUERY_NODE}" "${USER}" "${PVESH_QUERY[@]}" | jq -r "${JQ_QUERY}")
    export NODES
}
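# reboot a single node and block until it accepts SSH logins again
ROLL(){
    local NODE=$1
    DO_SSH "${NODE}" "${USER}" reboot
    # wait on dmesg logs until reboot begins; ssh fails once the connection
    # drops, so `|| true` keeps `set -e` from aborting the whole run here
    DO_SSH "${NODE}" "${USER}" dmesg -w || true
    until DO_SSH "${NODE}" "${USER}" whoami # then loop until we can authenticate again
    do
        sleep 1s
    done
}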
# upgrade packages on every node (DO_SSH already prefixes the command with sudo)
UPDATE_NODES(){
    for NODE in "${NODES[@]}"
    do
        DO_SSH "${NODE}" "${USER}" apt-get update
        DO_SSH "${NODE}" "${USER}" apt-get dist-upgrade -y --autoremove
    done
}
ROLL_NODES(){
    for NODE in "${NODES[@]}"
    do
        echo "REBOOTING NODE ${NODE} IN 5 SECONDS!"
        sleep 5s
        ROLL "${NODE}"
    done
}
main(){
    GET_NODES
    UPDATE_NODES
    ROLL_NODES
}
main
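As a small step toward the error handling mentioned above, the wait loop in ROLL could give up after a deadline instead of spinning forever when a node never comes back. Here is a minimal sketch of that idea. The WAIT_FOR helper and its timeout argument are hypothetical and not part of the script above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# WAIT_FOR <timeout in seconds> <command...>
# Hypothetical helper: retry a command about once per second until it
# succeeds (return 0) or the timeout has elapsed (return 1).
WAIT_FOR(){
    local TIMEOUT=$1
    shift
    # SECONDS is a bash builtin counting seconds since the shell started
    local DEADLINE=$(( SECONDS + TIMEOUT ))
    until "${@}"
    do
        if (( SECONDS >= DEADLINE ))
        then
            echo "timed out after ${TIMEOUT}s waiting for: ${*}" >&2
            return 1
        fi
        sleep 1
    done
}
```

Inside ROLL, the until loop could then become something like WAIT_FOR 600 DO_SSH "${NODE}" "${USER}" whoami (the 600 is an arbitrary example value). With set -e in effect, a node that stays down would stop the run before the next node gets rebooted.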
previous version
This is the reboot function from a basic script I use to perform a rolling reboot of a cluster. To use it, loop through a list of nodes and call the script for each one, like ./roll.sh ${NODE}.
#!/bin/bash
# roll.sh
set -euxo pipefail
NODE=$1
ssh "${NODE}" reboot
# the following blocks until the reboot begins, and then loops until we can authenticate
ssh "${NODE}" dmesg -w || until ssh "${NODE}" whoami
do
    sleep 1s
done
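The calling loop described above might look something like this. It is only a sketch: the ROLL_ALL name is hypothetical, and the node names in the example call are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# ROLL_ALL <roll script> <node...>
# Hypothetical wrapper: reboot the given nodes strictly one at a time.
# With `set -e`, a failing roll stops the loop before the next node.
ROLL_ALL(){
    local ROLL_SCRIPT=$1
    shift
    local NODE
    for NODE in "${@}"
    do
        echo "rolling ${NODE}"
        "${ROLL_SCRIPT}" "${NODE}"
    done
}

# Example call; n1, n2 and n3 are placeholder node names:
# ROLL_ALL ./roll.sh n1 n2 n3
```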
If your cluster can tolerate more than one node being offline at a time, you can run the script for several nodes concurrently (as background jobs, since Bash has no real threads). That's an exercise left to the reader, but I will probably enhance this myself eventually with a loop that keeps up to MAX_OFFLINE_NODES reboots running at once.
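For the curious, here is one rough way that could go, using background jobs and wait. This is a sketch under assumptions, not a tested cluster tool. ROLL_PARALLEL is a hypothetical name, the default limit of 2 is arbitrary, and for simplicity it waits on the oldest running job rather than on whichever job finishes first.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed knob: how many nodes may be offline at the same time.
MAX_OFFLINE_NODES="${MAX_OFFLINE_NODES:-2}"

# ROLL_PARALLEL <roll command> <node...>
# Hypothetical helper: run the roll command for each node in the
# background, never keeping more than MAX_OFFLINE_NODES jobs in flight.
ROLL_PARALLEL(){
    local ROLL_COMMAND=$1
    shift
    local NODE
    local PIDS=()
    for NODE in "${@}"
    do
        # at the limit: wait for the oldest job before starting another;
        # a failed job makes `wait` return non-zero, which aborts under `set -e`
        if (( ${#PIDS[@]} >= MAX_OFFLINE_NODES ))
        then
            wait "${PIDS[0]}"
            PIDS=("${PIDS[@]:1}")
        fi
        "${ROLL_COMMAND}" "${NODE}" &
        PIDS+=("$!")
    done
    # drain whatever is still running, oldest first
    while (( ${#PIDS[@]} > 0 ))
    do
        wait "${PIDS[0]}"
        PIDS=("${PIDS[@]:1}")
    done
}

# Example call; n1, n2 and n3 are placeholder node names:
# ROLL_PARALLEL ./roll.sh n1 n2 n3
```

Waiting on the oldest job keeps the bookkeeping trivial, at the cost of sometimes leaving a slot idle while a slow node finishes. Whether any of this is safe still depends on your quorum and on where your guests are running.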