[SOLVED] How to wake and shut down a spare server for replication jobs?

moshpete

New Member
Mar 24, 2020
8
2
3
42
I have a PVE cluster with a backup server which I would like to keep turned off most of the time as a "warm spare". Would it be possible to automatically issue a wake-on-lan command before replication jobs are scheduled to run, and shut the node down after all the replication jobs have been completed (provided no VM is running on that node)?

I know I could create a cron job on the main server that starts the backup node with pvenode wakeonlan a while before the scheduled replication jobs, but how can I issue a shutdown command after the jobs have run? Are there any replication job hooks for running scripts?
 

t.lamprecht

Proxmox Staff Member
Staff member
Jul 28, 2015
2,721
418
103
South Tyrol/Italy
shop.maurer-it.com
Hi,

No, there are currently no hooks for replication. Out of interest, what's the reason for wanting to turn off the other server?
How big is your interval between replicas?
 

moshpete

Thanks for the answer. I keep the other server turned off mainly for saving energy. I'm running a homelab and two servers running 24/7 would imply twice the impact on the energy bill. I am happy with daily replication jobs and a single server can run all my VMs with no need for true HA or automatic failover; some of them have external storage that I disconnect from one server and connect to the other to offline-migrate when needed.

Any suggestions on how I could monitor the replication jobs with a script to check whether they are finished? Also, I could not find the "shutdown" equivalent to pvenode wakeonlan in the documentation...
 

t.lamprecht

Hmm, ok, for a home lab this is understandable.

For this you could monitor pvesr status or pvesr status --guest VMID output from the source node.

For example, here is the output of a fast-syncing CT shortly before, during and after the sync:

Bash:
root@prod1:~# pvesr status
JobID      Enabled    Target                           LastSync             NextSync   Duration  FailCount State
106-0      Yes        local/prod3           2020-05-22_17:31:01  2020-05-22_17:32:00   6.371804          0 OK
root@prod1:~# pvesr status
JobID      Enabled    Target                           LastSync             NextSync   Duration  FailCount State
106-0      Yes        local/prod3           2020-05-22_17:31:01              pending   6.371804          0 OK
root@prod1:~# pvesr status
JobID      Enabled    Target                           LastSync             NextSync   Duration  FailCount State
106-0      Yes        local/prod3           2020-05-22_17:31:01  2020-05-22_17:33:00   6.371804          0 SYNCING
root@prod1:~# pvesr status
JobID      Enabled    Target                           LastSync             NextSync   Duration  FailCount State
106-0      Yes        local/prod3           2020-05-22_17:32:01  2020-05-22_17:33:00   6.371804          0 OK
As you do daily syncs, you could just pull out the NextSync column and check whether it is pending or dated today; if not → shut down the other node.

A bit hacky but ShouldWork™ Idea :)
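
The NextSync check described above can be sketched roughly as follows. This is only an illustration, not something shipped with PVE: spare_needed is a hypothetical helper name, and it assumes the column layout of the listing above (field 2 = Enabled, field 5 = NextSync, field 8 = State). It reads the status text on stdin and takes today's date as an argument, so it can be tried without a cluster.

```bash
#!/bin/bash
# Sketch only: spare_needed is a hypothetical helper, not part of PVE.
# Reads `pvesr status` text on stdin; $1 is today's date as YYYY-MM-DD.
# Returns 0 if an enabled job is pending, syncing, or has its NextSync today.
spare_needed() {
  awk -v today="$1" '
    $2 != "Yes" { next }   # skips the header line and disabled jobs
    $5 == "pending" || $8 == "SYNCING" || index($5, today) == 1 { busy = 1 }
    END { exit (busy ? 0 : 1) }
  '
}

# Example with a line taken from the listing above:
sample='JobID      Enabled    Target                           LastSync             NextSync   Duration  FailCount State
106-0      Yes        local/prod3           2020-05-22_17:31:01  2020-05-22_17:32:00   6.371804          0 OK'

if printf '%s\n' "$sample" | spare_needed 2020-05-22; then
  echo "keep spare running"
else
  echo "spare can be shut down"
fi
```

On the source node this would presumably be fed as pvesr status | spare_needed "$(date +%F)". A job scheduled shortly after midnight would not count as "today" when checked late in the evening, so the wake-up side still has to be handled separately.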
 

moshpete

In case anyone has a similar need, here are the (crude) scripts I made to wake up a spare server before replication jobs and shut it down after they are finished:

The first one should be scheduled to run every 5-10 minutes via cron:
Bash:
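#!/bin/bash
# Wake up the other node if a sync job is pending or scheduled within the next N minutes.
# Note: srtest and srpower must be on cron's PATH, or be called by absolute path.

N=10
# srpower already running: we are already waiting for replication jobs to finish
if pgrep -x srpower >/dev/null; then
  exit 0
fi

# srtest returns non-zero when a job is pending, running or due within N minutes
if ! srtest "$N" >/dev/null; then
  srpower &
fi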
The next should be named srtest and tests for replication jobs pending, running or scheduled in the next N minutes:
Bash:
#!/bin/bash
# Check PVE enabled replication jobs
# returns 0 when there are no more jobs pending, running or scheduled in the next N minutes

N=15
if [ "$1" -gt 0 ] 2>/dev/null
then
  N="$1"
fi

status=$(pvesr status | grep Yes)
grep -q -e "pending" -e "SYNCING" <<< "$status"
if [ $? -eq 0 ]; then
  echo "Jobs pending or syncing."
  exit 1
fi
firstjob=$(awk '{gsub ("_"," ",$5); print $5}' <<< "$status" | sort | head -1)  # first "NextSync"
if [ "$firstjob" == "" ]; then
  echo "No enabled scheduled jobs."
  exit 0
fi
if [ "$firstjob" == "-" ]; then  # failed job that should have been run
  firstjob="now"
fi
echo -n "Next job: $firstjob ("
firststamp=$(date -d "$firstjob" +%s)
threshold=$(date -d "now + $N minutes" +%s)
if [ "$firststamp" -le "$threshold" ]; then  # job scheduled in the next N minutes
  echo "$N minutes or less)"
  exit 1
fi
echo "more than $N minutes away)"
exit 0
The next is also called by the cron script and should be named srpower:
Bash:
#!/bin/bash
# Wake the $OTHER node for replication, then turn it off once replication has
# finished and no more jobs are scheduled in the next N minutes.

N=30
LOG=/var/log/srpower.log
DATESTR="+%Y-%m-%d %H:%M:%S"

# Pick the peer node; adjust the host names to your cluster
if [ "$(hostname)" == "server1" ]; then
  OTHER=server2
else
  OTHER=server1
fi

echo -n "[$(date "$DATESTR")] " >> "$LOG"
pvenode wakeonlan "$OTHER" &>> "$LOG"
sleep 60  # give the node time to boot
until srtest "$N" &>/dev/null
do
  sleep 60
done
echo -n "[$(date "$DATESTR")] " >> "$LOG"
echo "No more replication jobs for the next $N minutes. Shutting $OTHER down." >> "$LOG"
ssh "$OTHER" "/usr/local/bin/poweroff-if-idle" &>> "$LOG"
And the last one should be named poweroff-if-idle and runs on the backup node:
Bash:
#!/bin/bash
ct=$(lxc-ls --running | grep '\S')
if [ $? -eq 0 ]; then
  echo "CT $ct running, not powering off."
  exit 1
fi
qm list | grep -q running
if [ $? -eq 0 ]; then
  echo "VM(s) running, not powering off."
  exit 1
fi
if [ -f /var/lock/keepalive ]; then
  echo "/var/lock/keepalive present, not powering off $(hostname)."
  exit 1
fi
wall "System will auto-shutdown in 2 minutes unless \"keepalive\" is issued." > /dev/null 2>&1
sleep 120
if [ ! -f /var/lock/keepalive ]; then
  echo "No keepalive, CT or VM running, $(hostname) shutting itself down."
  poweroff
else
  echo "keepalive issued, not powering off $(hostname)."
fi
I haven't done extensive testing, but they seem to work :)
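
The wall message in poweroff-if-idle mentions a "keepalive" command that is not shown in this thread. A minimal sketch of what such a helper might look like, assuming it does nothing more than create or remove the /var/lock/keepalive file that the script checks (the function name, its on/off arguments and the KEEPALIVE_FILE variable are assumptions for illustration):

```bash
#!/bin/bash
# Hypothetical "keepalive" helper; the original post does not include one.
# KEEPALIVE_FILE defaults to the path that poweroff-if-idle tests for.
KEEPALIVE_FILE="${KEEPALIVE_FILE:-/var/lock/keepalive}"

keepalive() {
  case "${1:-on}" in
    on)   # create the flag file so the pending auto-shutdown is skipped
          touch "$KEEPALIVE_FILE" && echo "keepalive set, auto-shutdown blocked" ;;
    off)  # remove the flag file so the node may power off after the next run
          rm -f "$KEEPALIVE_FILE" && echo "keepalive cleared, auto-shutdown allowed" ;;
    *)    echo "usage: keepalive [on|off]" >&2
          return 2 ;;
  esac
}
```

Running keepalive (or keepalive on) on the backup node within the two-minute warning window would then stop the shutdown; keepalive off re-enables it for the next replication run.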
 
