Cluster disaster recovery

RubenRat

New Member
Jun 24, 2024
Hi everyone,

I've been running PVE in a simple home setup for a number of years and am fairly familiar with it, but I've never clustered it before; it's always just been a single node.

I'm about to migrate to a 2 node cluster and will use a qdevice to ensure a split-brain situation can't occur.

Given that I have only two physical locations to put nodes, it's possible in a total disaster that one node and the qdevice could be destroyed simultaneously. Obviously that's a pretty bad scenario, but I still want to understand what would happen in that situation and plan for it.

My understanding is that with 1 node left, the cluster would become read-only and I couldn't even start any guests (say the remaining node had lost power during the disaster; now I can't even get VMs back online to start recovering stuff).

What's the recovery action in that sort of situation? Can the single remaining node (which I understand to be in a read-only state) be forced from the command line to remove the dead nodes from its configuration and become a standalone host? Can it be forced to remove the broken/non-existent qdevice and adopt a new one into the cluster so it becomes functional again?

Thanks
 
with 1 node left, the cluster would become read only and I couldn't even start any guests
It's worse than that. 1 minute after that last node loses quorum, the watchdog software on that node will reboot it in an attempt to see if that fixes things.

If you happen to be logged in to that last node during that 1 minute window, you can type this to (temporarily) change the quorum rules and allow it to run as a 1 node cluster:

Bash:
# pvecm expected 1

That'll let you start/stop/etc VMs on that node as you would any other single node setup.

Alternatively, you can configure the watchdog to not automatically reboot isolated nodes. But that'll stop that node from self-healing any issues that might actually be fixable by a reboot. It's one of those things where you have to really think through the options and make your choice. ;)
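
If you do want to go down that path: as far as I know the self-fencing only kicks in once the HA services have armed the watchdog, so on a cluster with no HA resources configured the usual suggestion is to stop and disable those services. Treat this as a sketch to verify on a test box first, not gospel:

Bash:
# systemctl disable --now pve-ha-lrm pve-ha-crm

With those off, the node shouldn't self-reboot on quorum loss, but you also give up using HA later without re-enabling them.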
 
  • Like
Reactions: Kingneutron
Can the single remaining node (which I understand to be in a read-only state) be forced from the command line to remove the dead nodes from its configuration and become a standalone host?
Yep. Once you run the pvecm expected 1 command to allow the last node to do stuff, you can then run the other pvecm commands for adding and removing Proxmox nodes. eg:

Bash:
# pvecm delnode node2

I'm not (yet) sure how to remove the qdevice though, as now that I think about it I've not had to do that before. I'll probably experiment with that later on to find out (not a today thing).
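
That said, pvecm does have qdevice subcommands that look like they'd cover both removing the dead one and setting up a replacement. Untested by me in a disaster scenario, and <new-qdevice-ip> below is just a placeholder:

Bash:
# pvecm qdevice remove
# pvecm qdevice setup <new-qdevice-ip>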
 
Yep. Once you run the pvecm expected 1 command to allow the last node to do stuff, you can then run the other pvecm commands for adding and removing Proxmox nodes. eg:

Bash:
# pvecm delnode node2

I'm not (yet) sure how to remove the qdevice though, as now that I think about it I've not had to do that before. I'll probably experiment with that later on to find out (not a today thing).

Thanks for the replies.

Just so I am clear, the "pvecm expected 1" command can still be run outside of that 1 minute window? Your first reply mentioned typing it in during the one minute window, but I am guessing you mean that would have to happen to prevent the automatic reboot, not that it would subsequently be impossible to make that change after the reboot had occurred, right?

I would almost certainly disable the automatic reboots anyway, since I'll encrypt the dataset where the VMs are stored (it'll be local-zfs storage on both nodes), so a node automatically rebooting without admin intervention wouldn't fix anything anyway.

I don't need HA or automatic recovery from unusual fault conditions; I'm OK with having to intervene manually to get things running again in failure situations, as long as that intervention can be made fairly painless. Most of the cluster benefit for me will be easier maintenance on each host without having to shut down the VMs and risk a borked host update, plus the option to recover onto the other node if the active one fails unexpectedly. If that happened, an interactive recovery of, and migration onto, the other node rather than an automatic one is fine.

Thanks for mentioning the qdevice removal thing. I won't be configuring it for a little while yet anyway, so I'll watch this thread, and if you do get around to testing it then it would be great to know your results so I can pre-document what to do in a failure situation.
 
  • Like
Reactions: justinclift
Yeah, the pvecm expected 1 command just tells a Proxmox cluster (that's not in quorum) that 1 node is actually enough to make a quorum, thereby allowing the present node to make changes. You can use it anywhere that's appropriate.

If you run it on a cluster that's already in quorum, it'll just give you a weird look (example) and ignore the command.

And yeah, the "within the 1 minute window" thing is because the node will have rebooted if you don't. Unless you configure the node not to do so, in which case you can type that command whenever you want. :)

I'll encrypt the dataset where the VMs are stored
Hmmm, you can configure your nodes to automatically grab encryption keys from an external source (eg some other host, over ssh).

It's not some in-built feature of Proxmox, but more a case of simple shell scripting and a systemd .service file. The details should be fairly easy to find with some searching in these forums. :)
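
As a very rough sketch of the idea for the ZFS native encryption case (the hostname, key path and dataset name below are placeholders, and it assumes the dataset's keylocation is set to prompt so the key can be piped in on stdin), something like this called from a systemd oneshot unit at boot is the general shape:

Bash:
#!/usr/bin/env bash
# Sketch only: fetch the ZFS encryption key from another host over ssh
# and load it, so encrypted datasets/guests are usable after a reboot.
set -euo pipefail

DATASET="rpool/data/encrypted"     # placeholder: your encrypted dataset
KEYHOST="root@keyhost.example"     # placeholder: host holding the key

# keylocation=prompt means zfs load-key reads the key from stdin
ssh "$KEYHOST" 'cat /root/keys/pve-zfs.key' | zfs load-key "$DATASET"

# Mount anything that was waiting on the key
zfs mount -a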
 
Hmmm, you can configure your nodes to automatically grab encryption keys from an external source (eg some other host, over ssh).

It's not some in-built feature of Proxmox, but more a case of simple shell scripting and a systemd .service file. The details should be fairly easy to find with some searching in these forums. :)

Oh cool, that's pretty handy. Yeah, I'll look into that as well, thanks.
 
I would almost certainly disable the automatic reboots anyway, since I'll encrypt the dataset where the VMs are stored (it'll be local-zfs storage on both nodes), so a node automatically rebooting without admin intervention wouldn't fix anything anyway.
Then make sure to use LUKS/SED with unencrypted ZFS on top and not ZFS native encryption, as with ZFS native encryption you won't be able to use replication or migrate VMs.
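
If it helps, the rough shape of the LUKS variant (device path and pool name here are just placeholders; adapt to your own disks, then add the pool as ZFS storage in Proxmox) would be something like:

Bash:
# cryptsetup luksFormat /dev/disk/by-id/ata-EXAMPLE-DISK
# cryptsetup open /dev/disk/by-id/ata-EXAMPLE-DISK crypt_vmstore
# zpool create vmstore /dev/mapper/crypt_vmstore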
 
  • Like
Reactions: justinclift
Then make sure to use LUKS/SED with unencrypted ZFS on top and not ZFS native encryption as with ZFS native encryption you won't be able to use replication or migrate VMs.

Oh wow thanks that's really helpful to know in advance.
 
