Shutdown cluster when power fails (using NUT)...

spotcatbug

Member
Mar 28, 2022
I've been struggling with this for a week or more. I have a 2-node cluster, plus a qdevice. It works great; no quorum issues or anything. However, I can't get the cluster to shut down correctly when the power goes out.

One cluster node is the NUT "master" and has the UPS attached via USB. The other node is a NUT slave. The qdevice is a Raspberry Pi on a different UPS (for the purposes of this question, you can assume the power on the Raspberry Pi never goes out).
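For context, the NUT wiring looks roughly like this (a sketch with placeholder names; "myups", the address, and the password are made up). In /etc/nut/ups.conf on the master node:

[myups]
    driver = usbhid-ups
    port = auto

In /etc/nut/upsmon.conf on the master node:

MONITOR myups@localhost 1 upsmon mypass master

And in /etc/nut/upsmon.conf on the slave node, pointing at the master:

MONITOR myups@192.168.1.10 1 upsmon mypass slave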

Side note: under "HA Settings" I have "shutdown_policy" set to "freeze". This is the setting that suits my situation best: when I shut down a node, no guests should automatically migrate, and when the node is started up again, the same HA guests that were running before should start again on that node.
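(For reference, that setting corresponds to this line in /etc/pve/datacenter.cfg:)

ha: shutdown_policy=freeze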

What happens when the power goes out (or I pull the plug from the wall) is: the UPS tells the master node and the slave node that it's on battery. A bit later, if the UPS hasn't gone back to mains power, the slave and master decide to shut down. "shutdown -h now" on the slave shuts down all the "regular" (non-HA) guests, then freezes the HA guests, and finally shuts down the node. This NUT slave node goes down perfectly.
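The shutdown itself is driven by upsmon's SHUTDOWNCMD. The values below are the common Debian defaults rather than necessarily my exact config; POWERDOWNFLAG only matters on the master, which is supposed to cut UPS power as its last step:

SHUTDOWNCMD "/sbin/shutdown -h +0"   # what upsmon runs when shutdown is needed
HOSTSYNC 15                          # master waits this long for slaves to log off
POWERDOWNFLAG /etc/killpower         # master only: halt scripts then power off the UPS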

Now, at this point, the remaining node still has quorum (the qdevice is there and working). However, when it goes to shut down ("shutdown -h now"), it loses quorum before it's able to freeze the HA guests: corosync-qdevice.service is no longer running. Eventually (I'd say about a minute later) the node restarts (it doesn't shut down) without having shut down the HA guests; I assume the HA watchdog is fencing it. This is terrible: first, it isn't cleanly shutting down the HA guests; second, it never turns off the UPS (that's the final step of the NUT master shutdown sequence). The node ends up in a weird boot loop while the UPS is still on battery.

It feels like this could work if corosync-qdevice.service were kept alive during node shutdown, somehow, but I'm just speculating at this point and I really need expert input on how this is supposed to work. Surely, Proxmox cluster shutdown on power failure is a solved problem. Is there a way to make the corosync-qdevice service stick around longer?

Just for more information: if I manually stop all the guests on both nodes (as in: use the GUI "bulk stop" feature) and then pull the plug, everything works as expected, except that the HA guests aren't frozen; they're stopped (bulk stop), so they need to be manually started after power is restored. Quorum is never required by the master node because no HA guests need to be shut down.

Thanks for any and all help/input!
 
So the corosync-qdevice service is terminating too early? Maybe you could tinker with the systemd dependencies of that unit so it shuts down later?

I've never messed with that stuff before. I'll definitely look into it - could be a fix for my particular case. Thank you!

However, the question still remains. My cluster has a separate qdevice that remains powered and (theoretically) can maintain quorum while the last node shuts down its HA guests (assuming I can get the qdevice service to stick around, that is). If my cluster didn't have a separate qdevice that stayed powered on longer than the rest of the cluster, how would shutdown proceed when the final node(s) need to shut down their HA guests without quorum? This situation seems like it would be extremely common. I mean, if you have a cluster and all the nodes need to shut down, at some point there's a node that needs to shut down while there's no quorum, right? How is this working for everybody else? Is nobody automatically shutting down their Proxmox clusters when power goes out?
 
I got this working for my particular setup. Just to recap, I have a cluster of two Proxmox nodes and a qdevice. This is probably not a usual Proxmox cluster, but it could be more popular than I think because, when you decide you want a cluster, a qdevice is easier to procure than a third node.

The two Proxmox nodes are on one UPS and the qdevice is on another UPS. When power goes out, I want the cluster to gracefully shut down and then come back online when power returns. Luckily, in this setup, I can count on the qdevice sticking around until both nodes have shut down, so (theoretically) quorum can be maintained when only the NUT master node remains (one node plus the qdevice is two votes out of three), and the HA guests on that final node can be shut down gracefully. Without quorum, the HA guests can't be shut down, and the node ends up rebooting instead of continuing the normal shutdown process that ends with turning off the UPS.

In order to maintain quorum using the qdevice, corosync-qdevice.service needs to remain running on the NUT master node during the node's shutdown process. I found that adding:

Before=pvedaemon.service

in the [Unit] section of /usr/lib/systemd/system/corosync-qdevice.service keeps the service alive long enough, and quorum is maintained during shutdown. I chose pvedaemon.service because, looking at systemd-analyze plot, I found that corosync-qdevice.service was starting just before pvedaemon.service, so forcing that ordering with the "Before" directive makes it stick around: systemd stops units in the reverse of their start order, so a unit that starts before pvedaemon.service is stopped after it.

I'm interested in any input on this. I wanted to make it work in the most upgrade-proof way possible, but I don't know if that's achieved here. Also, I'm still really interested in knowing how this is supposed to work. I mean, if I had a third node instead of a qdevice, how does the last node shut down? No quorum is possible. The only solution I can think of is to make sure there are no HA guests on the final node so that quorum isn't necessary.
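One idea for making it more upgrade-proof (same directive, just placed differently): instead of editing the unit file the package ships, use a drop-in override:

systemctl edit corosync-qdevice.service

which creates /etc/systemd/system/corosync-qdevice.service.d/override.conf; put this in it:

[Unit]
Before=pvedaemon.service

A package upgrade can replace /usr/lib/systemd/system/corosync-qdevice.service, but it won't touch the override.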
 
I also would be interested in that. I will also switch to a 2-node cluster + Raspi qdevice soon, with a single UPS powering all 3 machines. Right now it is easy because I don't need to care about quorum.
Here the order in which the nodes shut down would also matter: one node will run a TrueNAS VM that both nodes rely on. If the node with the TrueNAS VM shut down first, the SMB/NFS shares would no longer be available, and guests couldn't shut down gracefully because data couldn't be flushed to the shares anymore. So the master needs to be my node running the TrueNAS VM, so it shuts down last.
 

I would suggest that in this case you make the Raspberry Pi qdevice the NUT master, so it goes down last and shuts off the UPS (the qdevice doesn't need quorum for anything).

I don't really have info on how to ensure the shutdown ordering of the two Proxmox nodes, though. I'm sure it's possible, and hopefully it doesn't come down to timing; as in, make the TrueNAS node wait X seconds longer than the other node to shut down (where X needs to be determined through testing). There must be a way to make one NUT slave wait for another.
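If it does come down to timing, one possible way to stagger them (a sketch, untested; the timer name and delays are made up) is to drive each node's shutdown from upssched with a different ONBATT delay. In /etc/nut/upsmon.conf on each node:

NOTIFYCMD /usr/sbin/upssched
NOTIFYFLAG ONBATT SYSLOG+EXEC
NOTIFYFLAG ONLINE SYSLOG+EXEC

In /etc/nut/upssched.conf, with a shorter delay on the node that should go down first (say 120 seconds there and 300 on the TrueNAS node):

CMDSCRIPT /etc/nut/upssched-cmd
PIPEFN /run/nut/upssched.pipe
LOCKFN /run/nut/upssched.lock
AT ONBATT * START-TIMER onbatt-shutdown 120
AT ONLINE * CANCEL-TIMER onbatt-shutdown

And /etc/nut/upssched-cmd:

#!/bin/sh
# Called by upssched when a timer fires; halt this node.
case "$1" in
    onbatt-shutdown)
        /sbin/shutdown -h now
        ;;
esac

The master also waits up to HOSTSYNC seconds for slaves to log off before continuing its own shutdown, which helps with ordering but doesn't guarantee a long enough gap on its own.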
 