I found this thread very useful and used these scripts as the basis for my own versions: one that shuts down the whole cluster, and one that does a 'rolling reboot' of all nodes in the cluster.
You do not need to configure any node names. The scripts fetch them with API calls, and they send commands to the other nodes before acting on the node they run on.
I tested them on my Proxmox test cluster (virtualized inside my main Proxmox cluster: Proxmox Inception).
As written, each script requires a command-line argument to confirm that you 'really mean it'.
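The node discovery boils down to one API call plus a filter. A minimal sketch, assuming `jq` is installed and that `hostname` returns the Proxmox node name; `list_other_nodes` is a hypothetical helper, not part of the scripts below. It compares names exactly inside `jq`, so a node called `pve1` does not also drop `pve10`, which a substring `grep -v` would.

```bash
#!/bin/bash
# Hypothetical helper: print every cluster node name except the one given.
# Reads the JSON returned by GET /cluster/resources?type=node on stdin.
list_other_nodes() {
    local me=$1
    # Exact string comparison, so "pve1" does not also filter out "pve10"
    jq -r --arg me "$me" '.data[] | select(.node != $me) | .node'
}
```

Usage: `curl -s -k -H "$AUTH" "$BASEURL/cluster/resources?type=node" | list_other_nodes "$(hostname)"`.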
```bash
#!/bin/bash
REALLY=$1
REALLYOK=shutitdownnow
MODIFYCEPH=1

if [ "$REALLY" = "${REALLYOK}" ]; then
    # Disable history substitution.
    set +H

    # API key
    # Must have permissions of: VM.Audit, VM.PowerMgmt, Sys.Audit, Sys.PowerMgmt, and Sys.Modify (if altering Ceph)
    APIUSER='root@pam!maintenance'
    APITOKEN='api-key-here'    # replace with the token secret

    # Use this host's API service
    ME=$(hostname)
    BASEURL="https://${ME}:8006/api2/json"
    AUTH="Authorization: PVEAPIToken=${APIUSER}=${APITOKEN}"

    # Get the list of all nodes in the cluster, plus the ones that aren't me
    # (grep -x matches whole lines, so "pve1" does not also drop "pve10")
    ALLNODES=$(curl -s -k -H "$AUTH" "$BASEURL/cluster/resources?type=node" | jq -r '.data[].node')
    OTHERNODES=$(echo "$ALLNODES" | grep -vxF "$ME")

    # Stop all VMs and containers.
    echo 'Stopping all guests...'
    for NODE in $ALLNODES; do
        echo "Closing VMs running on $NODE"
        response=$(curl -s -k -H "$AUTH" -X POST "$BASEURL/nodes/$NODE/stopall")
    done

    if [ "$MODIFYCEPH" = "1" ]; then
        echo "Halting Ceph autorecovery"
        for FLAG in noout norebalance norecover; do
            response=$(curl -s -k -X PUT --data "value=1" -H "$AUTH" "$BASEURL/cluster/ceph/flags/$FLAG")
        done
    fi

    # Wait until all guests are stopped.
    running=true
    while [ "$running" != false ]; do
        echo "Checking if guest VMs are shut down ..."
        running=$(curl -s -k -H "$AUTH" "$BASEURL/cluster/resources?type=vm" | jq '.data | map(.status) | contains(["running"])')
        sleep 5
    done
    echo 'Guests stopped.'

    # Shut down the cluster: the other nodes first, then this one.
    echo "Shutting down other nodes.."
    for NODE in $OTHERNODES; do
        echo "Shutting down $NODE ..."
        response=$(curl -s -k -X POST --data "command=shutdown" -H "$AUTH" "$BASEURL/nodes/$NODE/status")
        sleep 1
    done
    echo "Shutdown commands sent to other nodes"

    echo "Shutting myself down"
    response=$(curl -s -k -X POST --data "command=shutdown" -H "$AUTH" "$BASEURL/nodes/$ME/status")
    echo "Bye!"
else
    echo -e "Do you really want to shut it all down?\n$0 ${REALLYOK}\nto confirm"
fi
```
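The "wait until all guests are stopped" loop in the shutdown script has no upper bound, so a guest that ignores the stop request would block the script indefinitely. A minimal sketch of a bounded poll, assuming bash's built-in `SECONDS` counter; `wait_until` is a hypothetical helper, not part of the original script or of any Proxmox tool.

```bash
#!/bin/bash
# Hypothetical helper: run a command every INTERVAL seconds until it succeeds
# or TIMEOUT seconds have passed. Returns 0 on success, 1 on timeout.
# Usage: wait_until TIMEOUT INTERVAL command [args...]
wait_until() {
    local timeout=$1 interval=$2
    shift 2
    local deadline=$((SECONDS + timeout))
    until "$@"; do
        # Give up once the deadline has passed
        if [ "$SECONDS" -ge "$deadline" ]; then
            return 1
        fi
        sleep "$interval"
    done
    return 0
}
```

For example, `wait_until 600 5 all_guests_stopped` would give up after roughly ten minutes, where `all_guests_stopped` is a hypothetical function wrapping the existing `curl ... | jq` check so that it succeeds once no guest reports `running`.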
Rolling reboot of all nodes:
```bash
#!/bin/bash
REALLY=$1
REALLYOK=yesdoit
MODIFYCEPH=1
SLEEPWAIT=180

if [ "$REALLY" = "${REALLYOK}" ]; then
    # Disable history substitution.
    set +H

    # API key
    # Must have permissions of: VM.Audit, VM.PowerMgmt, Sys.Audit, Sys.PowerMgmt, and Sys.Modify (if altering Ceph)
    APIUSER='root@pam!maintenance'
    APITOKEN='api-key-here'    # replace with the token secret

    # Use this host's API service
    ME=$(hostname)
    BASEURL="https://${ME}:8006/api2/json"
    AUTH="Authorization: PVEAPIToken=${APIUSER}=${APITOKEN}"

    # Every node except this one (exact match, so "pve1" does not drop "pve10")
    OTHERNODES=$(curl -s -k -H "$AUTH" "$BASEURL/cluster/resources?type=node" | jq -r '.data[].node' | grep -vxF "$ME")

    # jq filter: guests on $node that are still running ($node is passed with --arg)
    RUNNING_ON_NODE='[.data[] | {node, vmid, type, status} | select(.node == $node and .status == "running")]'

    if [ "$MODIFYCEPH" = "1" ]; then
        echo "Halting Ceph autorecovery"
        for FLAG in noout norebalance norecover; do
            response=$(curl -s -k -X PUT --data "value=1" -H "$AUTH" "$BASEURL/cluster/ceph/flags/$FLAG")
        done
    fi

    # Rolling reboot of the other nodes, one at a time
    echo "Rebooting other nodes.."
    for NODE in $OTHERNODES; do
        echo "Stopping all guests on $NODE ..."
        VMS=$(curl -s -k -H "$AUTH" "$BASEURL/cluster/resources?type=vm" | jq --arg node "$NODE" "$RUNNING_ON_NODE")
        if [ "$VMS" != "[]" ]; then
            echo -e "VMs running on $NODE:\n$VMS"
            response=$(curl -s -k -H "$AUTH" -X POST "$BASEURL/nodes/$NODE/stopall")
        else
            echo "Nothing running on $NODE"
        fi

        # Wait until no guest on this node reports "running"
        while [ "$VMS" != "[]" ]; do
            sleep 5
            VMS=$(curl -s -k -H "$AUTH" "$BASEURL/cluster/resources?type=vm" | jq --arg node "$NODE" "$RUNNING_ON_NODE")
        done
        echo "Guests on $NODE are stopped."

        echo "Rebooting $NODE ..."
        response=$(curl -s -k -X POST --data "command=reboot" -H "$AUTH" "$BASEURL/nodes/$NODE/status")
        echo "Waiting $SLEEPWAIT seconds for reboot to complete on $NODE ..."
        sleep "$SLEEPWAIT"

        echo "Checking if $NODE is back up yet"
        NODEONLINE=""
        while [ -z "$NODEONLINE" ]; do
            NODEONLINE=$(curl -s -k -H "$AUTH" "$BASEURL/cluster/resources?type=node" | jq -r --arg node "$NODE" '.data[] | select(.node == $node and .status == "online") | .status')
            [ -z "$NODEONLINE" ] && sleep 5
        done
        echo "Node $NODE is back online"
    done

    if [ "$MODIFYCEPH" = "1" ]; then
        echo "Re-enabling Ceph"
        for FLAG in noout norebalance norecover; do
            response=$(curl -s -k -X PUT --data "value=0" -H "$AUTH" "$BASEURL/cluster/ceph/flags/$FLAG")
        done
    fi

    echo "Rebooting myself now"
    response=$(curl -s -k -X POST --data "command=reboot" -H "$AUTH" "$BASEURL/nodes/$ME/status")
    echo "Bye!"
else
    echo -e "Do you really want to reboot all?\n$0 ${REALLYOK}\nto confirm"
fi
```
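The "is the node back yet?" check in the rolling reboot can also be written as a predicate whose exit status is the answer, so no string comparison is needed. A sketch assuming `jq` 1.5 or later (for `-e`, `--arg` and `any/2`); `node_is_online` is a hypothetical name. `jq -e` exits with status 1 when the filter's last output is `false` or `null`.

```bash
#!/bin/bash
# Hypothetical predicate: succeed if NODE is listed with status "online".
# Reads the JSON returned by GET /cluster/resources?type=node on stdin.
node_is_online() {
    local node=$1
    # -e turns a false result into exit status 1; the true/false text is discarded
    jq -e --arg node "$node" 'any(.data[]; .node == $node and .status == "online")' > /dev/null
}
```

Plugged into the reboot loop, the wait would read `until curl -s -k -H "$AUTH" "$BASEURL/cluster/resources?type=node" | node_is_online "$NODE"; do sleep 5; done`.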
To revert the Ceph flags, run this at node start-up:
```bash
#!/bin/bash
# Clear the Ceph flags set by the shutdown / rolling-reboot scripts
MODIFYCEPH=1

# API key
# Must have permissions of: VM.Audit, VM.PowerMgmt, Sys.Audit, Sys.PowerMgmt, and Sys.Modify (if altering Ceph)
APIUSER='root@pam!maintenance'
APITOKEN='api-key-here'    # replace with the token secret

# Use this host's API service
ME=$(hostname)
BASEURL="https://${ME}:8006/api2/json"
AUTH="Authorization: PVEAPIToken=${APIUSER}=${APITOKEN}"

if [ "$MODIFYCEPH" = "1" ]; then
    echo "Re-enabling Ceph"
    for FLAG in noout norebalance norecover; do
        response=$(curl -s -k -X PUT --data "value=0" -H "$AUTH" "$BASEURL/cluster/ceph/flags/$FLAG")
    done
fi
```
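When this runs from a start-up hook, the local API (`pveproxy`) may not be answering yet; the `curl` calls then fail quietly (they run with `-s`) and the flags stay set. A hedged sketch of a guard that polls the API's `/version` endpoint a bounded number of times first; `wait_for_api` and its parameters are hypothetical, and `AUTH`/`BASEURL` are assumed to be set as in the start-up script.

```bash
#!/bin/bash
# Hypothetical guard: poll GET /version on the local API until it answers.
# Expects AUTH and BASEURL to be set as in the start-up script.
# Usage: wait_for_api ATTEMPTS INTERVAL  (0 once the API responds, 1 if it never does)
wait_for_api() {
    local attempts=$1 interval=$2 i
    for ((i = 1; i <= attempts; i++)); do
        # -f makes curl fail on HTTP errors (e.g. 401, 5xx), not only on connection errors
        if curl -s -k -f -H "$AUTH" "$BASEURL/version" > /dev/null; then
            return 0
        fi
        sleep "$interval"
    done
    return 1
}
```

Placed just before the `if [ "$MODIFYCEPH" = "1" ]` block, a line such as `wait_for_api 60 5 || { echo "API not reachable, Ceph flags left unchanged" >&2; exit 1; }` would wait up to about five minutes.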
Thanks! Looks great; I just stumbled upon this thread again. Just in case you did not know: it's now recommended to ONLY set noout for a complete cluster shutdown: https://pve.proxmox.com/pve-docs/chapter-pveceph.html#pveceph_shutdown There were cases where setting norebalance and norecover resulted in a very long recovery time after booting the nodes again. Thanks for the script!
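Following that recommendation, the flag handling in the scripts would shrink to the single `noout` flag: set before the shutdown, cleared at start-up. A sketch using the same `/cluster/ceph/flags/{flag}` API call the scripts already make; `set_noout` is a hypothetical name and `AUTH`/`BASEURL` are assumed to be set as in the scripts.

```bash
#!/bin/bash
# Hypothetical helper: set (1) or clear (0) only the Ceph "noout" flag via the
# Proxmox API. Expects AUTH and BASEURL as in the scripts above.
set_noout() {
    local value=$1
    case "$value" in
        0|1) ;;
        *) echo "usage: set_noout 0|1" >&2; return 2 ;;
    esac
    curl -s -k -X PUT --data "value=$value" -H "$AUTH" "$BASEURL/cluster/ceph/flags/noout"
}
```

On a node with the Ceph CLI available, `ceph osd set noout` and `ceph osd unset noout` do the same job.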