NUT orchestrated shutdown experiences?

Dunuin · Oct 29, 2022

Hi,

Previously I was running my TrueNAS server + my PVE server + managed switch 24/7 off a UPS. I used NUT as a server on the TrueNAS server and NUT as a client on the PVE server. That way it was ensured that the NAS is the master and the PVE server the slave, so the PVE server got shut down first and the TrueNAS server last. This is important, as many VMs need access to the SMB/NFS shares. If the TrueNAS server were unavailable while the PVE node shuts down, the VMs wouldn't be able to flush async writes to the SMB/NFS shares when PVE orders the VMs/LXCs to shut down. So I need to ensure that the TrueNAS server always shuts down last.

But because of the insane electricity price increases in Germany I can't run my whole homelab 24/7 anymore. It's already 100€/month, and prices might increase even more in the future, so I might end up paying 200€/month or something similar to run that homelab. I had a spare mini PC lying around (old J3710 Atom machine from 2016), so I added two enterprise SSDs, increased the RAM to 16GB and installed PVE on it. Then I moved the most important services that I really need 24/7 access to onto it (OPNSense/Pihole/DokuWiki/Nextcloud/Zabbix). The idea now is that this small PVE mini PC runs 24/7 and I only boot up my TrueNAS server and big PVE server when I really need them.
This is of course problematic, because my small PVE server should also shut down when the UPS goes on battery, but the TrueNAS server acting as the NUT server is now powered off most of the time. So what I would need to do is set up the small PVE server that runs 24/7 as the NUT server, and the TrueNAS server and big PVE server as NUT clients. Setting up the small PVE server as the NUT server shouldn't be a big problem. The problem is that I'm not sure how to orchestrate the shutdown sequence.

According to my research, it looks like NUT itself can't orchestrate the shutdown of multiple NUT clients in a specific order. It just sends a shutdown signal to all clients and then they shut down immediately. Or am I wrong?

Has anyone faced the same problem and already solved it?
I would like to hear about your experiences.

Best idea that comes to my mind would be some kind of custom script that the NUT server will run when the UPS goes on battery. This script then might use SSH to shut down servers in a fixed sequence. So something like:
- after X seconds on battery NUT server will call script in case power doesn't come back
- script tries to connect to big PVE server using SSH. If that fails, big PVE server is offline and it skips to the next step. If it can connect using SSH, it will do a "shutdown now", then ping it until it stops responding. Now the big PVE server should be offline and the TrueNAS server can be shut down
- script tries to connect to TrueNAS server using SSH. If that fails, TrueNAS server is offline and it skips to the next step. If it can connect using SSH, it will do a "shutdown now", then ping it until it stops responding. Now the TrueNAS server should be offline and the small PVE server can shut itself down
- script runs a "shutdown now" to shut itself down
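
The steps above could be sketched roughly like this in Python. This is only a sketch under assumptions, not a tested script: the host names "big-pve" and "truenas", the SSH user "shutdown", the use of sudo, and the timeouts are all placeholders, and the remote command may need to differ per OS. The command runner is passed in as a parameter so the logic can be dry-run without touching real machines.

```python
import subprocess
import time

SSH_USER = "shutdown"                           # assumed dedicated user
REMOTE_SHUTDOWN = ["sudo", "shutdown", "now"]   # assumed; may differ per OS
LOCAL_SHUTDOWN = ["shutdown", "now"]


def run(cmd, timeout=15):
    """Run a command and return True if it exited with status 0."""
    try:
        proc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL, timeout=timeout)
        return proc.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


def ssh_cmd(host, remote):
    """Build a non-interactive SSH command line for the given host."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
            f"{SSH_USER}@{host}", *remote]


def shutdown_and_wait(host, run=run, sleep=time.sleep, max_wait=300, poll=5):
    """Shut one host down via SSH, then ping until it stops answering.

    Returns "skipped" (SSH not reachable), "down", or "timeout".
    """
    if not run(ssh_cmd(host, ["true"])):
        return "skipped"
    # The exit status is ignored: the connection may drop mid-shutdown.
    run(ssh_cmd(host, REMOTE_SHUTDOWN))
    waited = 0
    while waited < max_wait:
        if not run(["ping", "-c", "1", "-W", "2", host]):
            return "down"
        sleep(poll)
        waited += poll
    return "timeout"


def orchestrate(hosts, run=run, sleep=time.sleep):
    """Shut the hosts down in the given order, then the local machine."""
    results = {host: shutdown_and_wait(host, run, sleep) for host in hosts}
    run(LOCAL_SHUTDOWN)
    return results

# Intended call on the 24/7 machine (placeholder host names):
# orchestrate(["big-pve", "truenas"])
```

The "timeout" result matters: if a host never stops answering pings, the script still moves on instead of hanging until the battery is empty.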

Or is there maybe a better way to do this?
Has someone already written such a script and would be willing to share it?
Any hints on what would be important to keep in mind?

And I guess I should create a dedicated "shutdown" user with public key authentication and only the privileges to run "shutdown now" on all servers, to not completely screw up security.

Neobin · Oct 29, 2022

Would it not be easier and especially safer (aka: less error prone) to get another (small) UPS for all the 24/7-stuff?

Dunuin · Oct 29, 2022

That would also be an option. But to keep costs down I really would prefer to run that small 24/7 PVE server off the same UPS, as it just needs 16W. Not just because of the initial price (50€ wouldn't be that bad), but also because a second UPS would increase my homelab's power consumption even further and I would have two UPSs where I need to replace the batteries every few years.

For the safer aspect:
Maybe I could start the individual shutdown sequence after 30 seconds on battery and let it do an upsmon -c fsd to initiate an unordered shutdown of all remaining clients when the battery reaches 30%. In case of a problem (bad coding of the script, connection problems, ...) that would at least ensure that the clients get shut down somehow before the UPS runs out of battery.
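
A rough sketch of that failsafe in Python, assuming the NUT tools upsc and upsmon are installed on the NUT server and the script runs with enough privileges to signal upsmon. The UPS name "ups@localhost" and the poll interval are placeholders; only the 30% threshold comes from the idea above.

```python
import subprocess
import time

UPS_NAME = "ups@localhost"   # placeholder UPS identifier
FSD_THRESHOLD = 30           # percent, from the plan above


def read_charge(ups=UPS_NAME):
    """Return battery.charge as a float via upsc, or None on failure."""
    try:
        out = subprocess.run(["upsc", ups, "battery.charge"],
                             capture_output=True, text=True, timeout=10)
        return float(out.stdout.strip())
    except (OSError, ValueError, subprocess.TimeoutExpired):
        return None


def trigger_fsd():
    """Ask the running upsmon to start a forced shutdown (FSD)."""
    subprocess.run(["upsmon", "-c", "fsd"], check=False)


def watch_battery(sequence_done, read_charge=read_charge,
                  trigger_fsd=trigger_fsd, threshold=FSD_THRESHOLD,
                  poll=10, sleep=time.sleep):
    """Poll the battery while the ordered sequence is still running.

    Returns True if the forced shutdown was triggered, False if the
    ordered sequence finished first.
    """
    while not sequence_done():
        charge = read_charge()
        # An unreadable charge (None) is skipped here; treating it as
        # an emergency instead would be the more cautious choice.
        if charge is not None and charge <= threshold:
            trigger_fsd()
            return True
        sleep(poll)
    return False
```

One way to use it would be to run the ordered shutdown in a thread and pass `lambda: not thread.is_alive()` as `sequence_done`, so the watchdog stops as soon as the ordered sequence has finished.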

apoc · Oct 29, 2022

I'd orchestrate the shutdown from the 24/7 server. Anything else will not give you stable results.
I have been using apcupsd for many years, which basically fires scripts. NUT should be able to do this as well.

Process-wise I can see two options:
Option 1:
- check the connectivity to your Test-PVE server (ping, ssh)
- if successful shut it down - you should not simply check ping or ssh, because on shutdown PVE will not allow access to SSH anymore. Does your PVE server have an IPMI-board? Can you get the status of the system via the IPMI to be sure?
- Then do the same with your TrueNAS
- finally shutdown the 24/7 machine.

Option 2:
- install NUT on all machines.
- on the event the Test-PVE will shut down on its own.
- on the True-NAS system check for PVE-connectivity from the Test-System, perhaps also if IO comes to the SMB/NFS
- on the 24/7 PVE server decisions can also be made without any dependency.

HTH

Dunuin · Oct 29, 2022

apoc said:
- if successful shut it down - you should not simply check ping or ssh, because on shutdown PVE will not allow access to SSH anymore. Does your PVE server have an IPMI-board? Can you get the status of the system via the IPMI to be sure?

That's a good idea. Yes, both the big PVE server and the TrueNAS server have a BMC. I can check that using ipmitool. I did that in another script too, which I use to boot and shut down my backup server (another TrueNAS server just for backups that only runs a couple of hours per week).
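
A minimal sketch of such a check in Python, assuming the BMCs are reachable over the network with the lanplus interface and that the BMC password is stored in a file. The function names and parameters are made up for illustration; the parsing relies on ipmitool printing "Chassis Power is on" or "Chassis Power is off".

```python
import subprocess


def parse_power_status(text):
    """Map ipmitool's 'Chassis Power is on/off' line to 'on', 'off' or None."""
    line = text.strip().lower()
    if line.endswith("is off"):
        return "off"
    if line.endswith("is on"):
        return "on"
    return None


def chassis_power(bmc_host, user, password_file):
    """Query the BMC for the chassis power state; None if the query fails."""
    cmd = ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user,
           "-f", password_file, "chassis", "power", "status"]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=15)
    except (OSError, subprocess.TimeoutExpired):
        return None
    if out.returncode != 0:
        return None
    return parse_power_status(out.stdout)
```

Waiting until `chassis_power(...)` returns "off" confirms that the machine has really powered down, which a failing ping or a refused SSH connection alone cannot.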

apoc said:
Option 2:
- install NUT on all machines.
- on the event the Test-PVE will shut down on its own.
- on the True-NAS system check for PVE-connectivity from the Test-System, perhaps also if IO comes to the SMB/NFS
- on the 24/7 PVE server decisions can also be made without any dependency.

That option sounds more complex and probably more error prone. Not sure how to reliably check SMB/NFS IO.
