Hi folks!
I run a 2-node HA cluster (with a QDevice). Today I had an outage on one of the nodes. The expected behaviour would be for the VMs and CTs on the failed node to migrate to the good node with minimal overall downtime.
However, what happened was this:
1. Node 1 failed/shutdown
2. VMs and containers on Node 1 failed over to Node 2 (correct)
3. Once on Node 2, PVE attempted to start the failed over VMs/CTs (correct)
4. The error received on Node 2 was: "TASK ERROR: zfs error: cannot open 'storage/subvol-104-disk-0': dataset does not exist". The failed-over VMs received a similar error on the attempted start, indicating that the relevant disks were not present on Node 2.
So it seems to me that the problem is that I run local ZFS storage on both nodes for containers and VMs, and I do not have storage replication set up between the two nodes. The VMs/CTs appear to be moved to the other node correctly in an HA scenario, but because their storage is not present on the target node, they (obviously!) cannot start once failed over.
My simple question: is my assessment of the problem correct, and would setting up storage replication between the two nodes solve it?
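From what I can tell, the feature I'd need is storage replication (pvesr), which periodically sends ZFS snapshots of each guest's disks to the other node so the datasets already exist there before a failover. A rough sketch of what I have in mind, using CT 104 as the example (the target node name "node2" and the 15-minute schedule are just placeholders for my setup):

# replicate CT 104's disks to the other node every 15 minutes
# (run on the node that currently hosts CT 104)
pvesr create-local-job 104-0 node2 --schedule "*/15"
# check the replication jobs and when they last synced
pvesr status

My understanding is that, with this in place, a failed-over guest would start from the last replicated snapshot, so anything written since the last sync would be lost, but at least it would start on the surviving node. Please correct me if I've misunderstood how this works.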
Thanks!