Hi folks!
I run a 2-node HA cluster (with a QDevice). Today I had an outage on one of the nodes. The expected behaviour would be for the VMs and CTs on the failed node to migrate to the good node with minimal overall downtime.
However, what happened was this:
1. Node 1 failed/shutdown
2. VMs and containers on Node 1 failed over to Node 2 (correct)
3. Once on Node 2, PVE attempted to start the failed over VMs/CTs (correct)
4. The error received on Node 2 was: "TASK ERROR: zfs error: cannot open 'storage/subvol-104-disk-0': dataset does not exist". The failed-over VMs received a similar error on the attempted start, indicating that the relevant disks were not present on Node 2.
So it seems to me that the problem is that I run local ZFS storage on both nodes for containers and VMs, and I do not have storage replication set up between the two nodes. The VMs/CTs appear to be moved to the other node correctly in an HA scenario, but because their storage is not present on the target node, they (obviously!) cannot start once failed over.
My simple question: is my assessment of the problem correct, and would setting up storage replication between the two nodes solve it?
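From what I can tell, the feature I'd need is storage replication (pvesr), which periodically sends ZFS snapshots of each guest's disks to the other node so the datasets already exist there before a failover. A rough sketch of what I have in mind, using CT 104 as the example (the target node name "node2" and the 15-minute schedule are just placeholders for my setup):

# replicate CT 104's disks to the other node every 15 minutes
# (run on the node that currently hosts CT 104)
pvesr create-local-job 104-0 node2 --schedule "*/15"
# check the replication jobs and when they last synced
pvesr status

My understanding is that, with this in place, a failed-over guest would start from the last replicated snapshot, so anything written since the last sync would be lost, but at least it would start on the surviving node. Please correct me if I've misunderstood how this works.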
Thanks!