Cluster disaster recovery

RubenRat

New Member
Jun 24, 2024
Hi everyone,

I've been running PVE in a simple home setup for a number of years and am fairly familiar with it, but I've never clustered it before; it's always just been a single node.

I'm about to migrate to a 2 node cluster and will use a qdevice to ensure a split-brain situation can't occur.

Given that I have only two physical locations to put nodes, it's possible in a total disaster that one node and the qdevice could be destroyed simultaneously. Obviously that's a pretty bad scenario, but I still want to understand what would happen in that situation and plan for it.

My understanding is that with 1 node left, the cluster would become read-only and I couldn't even start any guests (say the remaining node had lost power during the disaster; now I can't even get VMs back online to start recovering stuff).

What's the recovery action in that sort of situation? Can the single remaining node (which I understand to be in a read-only state) be forced from the command line to remove the dead nodes from its configuration and become a standalone host? Can it be forced to remove the broken/non-existent qdevice and adopt a new one into the cluster so it becomes functional again?

Thanks
 
with 1 node left, the cluster would become read only and I couldn't even start any guests
It's worse than that. 1 minute after that last node loses quorum, the watchdog software on that node will reboot it in an attempt to see if that fixes things.

If you happen to be logged in to that last node during that 1 minute window, you can type this to (temporarily) change the quorum rules and allow it to run as a 1 node cluster:

Bash:
# pvecm expected 1

That'll let you start/stop/etc VMs on that node as you would any other single node setup.

Alternatively, you can configure the watchdog to not automatically reboot isolated nodes. But that'll stop that node from self-healing any issues that might actually be fixable by a reboot. It's one of those things where you have to really think through the options and make your choice. ;)
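
If you do want to go down that path: as far as I know the self-fencing only kicks in once the HA services have armed the watchdog, so on a cluster with no HA resources configured the usual suggestion is to stop and disable those services. Treat this as a sketch to verify on a test box first, not gospel:

Bash:
# systemctl disable --now pve-ha-lrm pve-ha-crm

With those off, the node shouldn't self-reboot on quorum loss, but you also give up using HA later without re-enabling them.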
 
  • Like
Reactions: Kingneutron
Can the single remaining node (which I understand to be in a read-only state) be forced from the command line to remove the dead nodes from its configuration and become a standalone host?
Yep. Once you run the pvecm expected 1 command to allow the last node to do stuff, you can then run the other pvecm commands for adding and removing Proxmox nodes. eg:

Bash:
# pvecm delnode node2

I'm not (yet) sure how to remove the qdevice though, as now that I think about it I've not had to do that before. I'll probably experiment with that later on to find out (not a today thing).
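
That said, pvecm does have qdevice subcommands that look like they'd cover both removing the dead one and setting up a replacement. Untested by me in a disaster scenario, and <new-qdevice-ip> below is just a placeholder:

Bash:
# pvecm qdevice remove
# pvecm qdevice setup <new-qdevice-ip>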
 
Yep. Once you run the pvecm expected 1 command to allow the last node to do stuff, you can then run the other pvecm commands for adding and removing Proxmox nodes. eg:

Bash:
# pvecm delnode node2

I'm not (yet) sure how to remove the qdevice though, as now that I think about it I've not had to do that before. I'll probably experiment with that later on to find out (not a today thing).

Thanks for the replies.

Just so I am clear, the "pvecm expected 1" command can still be run outside of that 1 minute window? Your first reply mentioned typing it in during the one minute window, but I am guessing you mean that would have to happen to prevent the automatic reboot, not that it would subsequently be impossible to make that change after the reboot had occurred, right?

I would almost certainly disable the automatic reboots anyway, since I'll encrypt the dataset where the VMs are stored (it'll be local-zfs storage on both nodes), so a node automatically rebooting without admin intervention wouldn't fix anything anyway.

I don't need HA or automatic recovery from unusual fault conditions; I'm OK with having to intervene manually to get things running again in failure situations, as long as that intervention can be made fairly painless. Most of the cluster benefit for me will be easier maintenance on each host without having to shut down the VMs and risk a borked host update, plus the option to recover onto the other node if the active one fails unexpectedly. If that happened, an interactive recovery of, and migration onto, the other node rather than an automatic one is fine.

Thanks for mentioning the qdevice removal thing. I won't be configuring it for a little while yet anyway, so I'll watch this thread, and if you do get around to testing it then it would be great to know your results so I can pre-document what to do in a failure situation.
 
  • Like
Reactions: justinclift
Yeah, the pvecm expected 1 command just tells a Proxmox cluster (that's not in quorum) that 1 node is actually enough to make a quorum, thereby allowing the present node to make changes. You can use it anywhere that's appropriate.

If you run it on a cluster that's already in quorum, it'll just give you a weird look (example) and ignore the command.

And yeah, the "within the 1 minute window" thing is because the node will have rebooted if you don't. Unless you configure the node not to do so, in which case you can type that command whenever you want. :)

I'll encrypt the dataset where the VMs are stored
Hmmm, you can configure your nodes to automatically grab encryption keys from an external source (eg some other host, over ssh).

It's not some in-built feature of Proxmox, but more a case of simple shell scripting and a systemd .service file. The details should be fairly easy to find with some searching in these forums. :)
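
As a very rough sketch of the idea for the ZFS native encryption case (the hostname, key path and dataset name below are placeholders, and it assumes the dataset's keylocation is set to prompt so the key can be piped in on stdin), something like this called from a systemd oneshot unit at boot is the general shape:

Bash:
#!/usr/bin/env bash
# Sketch only: fetch the ZFS encryption key from another host over ssh
# and load it, so encrypted datasets/guests are usable after a reboot.
set -euo pipefail

DATASET="rpool/data/encrypted"     # placeholder: your encrypted dataset
KEYHOST="root@keyhost.example"     # placeholder: host holding the key

# keylocation=prompt means zfs load-key reads the key from stdin
ssh "$KEYHOST" 'cat /root/keys/pve-zfs.key' | zfs load-key "$DATASET"

# Mount anything that was waiting on the key
zfs mount -a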
 
Hmmm, you can configure your nodes to automatically grab encryption keys from an external source (eg some other host, over ssh).

It's not some in-built feature of Proxmox, but more a case of simple shell scripting and a systemd .service file. The details should be fairly easy to find with some searching in these forums. :)

Oh cool, that's pretty handy. Yeah, I'll look into that as well, thanks.
 
I would almost certainly disable the automatic reboots anyway, since I'll encrypt the dataset where the VMs are stored (it'll be local-zfs storage on both nodes), so a node automatically rebooting without admin intervention wouldn't fix anything anyway.
Then make sure to use LUKS/SED with unencrypted ZFS on top and not ZFS native encryption, as with ZFS native encryption you won't be able to use replication or migrate VMs.
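
If it helps, the rough shape of the LUKS variant (device path and pool name here are just placeholders; adapt to your own disks, then add the pool as ZFS storage in Proxmox) would be something like:

Bash:
# cryptsetup luksFormat /dev/disk/by-id/ata-EXAMPLE-DISK
# cryptsetup open /dev/disk/by-id/ata-EXAMPLE-DISK crypt_vmstore
# zpool create vmstore /dev/mapper/crypt_vmstore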
 
  • Like
Reactions: justinclift
Then make sure to use LUKS/SED with unencrypted ZFS on top and not ZFS native encryption as with ZFS native encryption you won't be able to use replication or migrate VMs.

Oh wow thanks that's really helpful to know in advance.
 
