Hi all. Any chance anyone can help a noob here?
I've got a 3-node cluster that I was running failover tests on. No Ceph, so I'm doing HA via ZFS storage replication every 5 minutes.
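For context, each CT has a storage replication job, roughly equivalent to something like this on the CLI (job ID and target are just from my setup for CT 180):
# replicate CT 180 to node3 every 5 minutes
pvesr create-local-job 180-0 node3 --schedule '*/5'
# confirm the job is running
pvesr status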
Yanked power on node1, then realized that in all my tweaking I'd forgotten to add the CTs to an HA group, so I figured I'd set that up from node2 while node1 was offline. I should mention at this point that the problem CT is only replicated to node3.
The CT tried to start on node3, but got:
TASK ERROR: zfs error: cannot open 'local-nvme/subvol-180-disk-0': dataset does not exist
I checked, and it does exist there:
zfs list | grep subvol-180
shows it. But then the CT somehow migrated, successfully, to node2, where the dataset doesn't exist.
Now I can't start it or migrate it, and I'd really love some help as I'm a bit stuck.
The ZFS volume is present on nodes 1 and 3, but I can't get the CT to migrate to either of those. I'd previously done a live migration from the GUI with no issues; this was the first time simulating a node failure.
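To be concrete about what I checked: the CT's config is now owned by node2 in the cluster filesystem, while the dataset only exists on nodes 1 and 3 (commands assume my CT ID of 180 and the default pmxcfs layout):
# which node currently owns the CT config
ls /etc/pve/nodes/*/lxc/180.conf
# on each node: does the replicated dataset exist?
zfs list | grep subvol-180-disk-0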
I'm aware I'll need to create a restricted HA group for this CT so it only uses nodes 1 and 3 in future; I'm just not sure how to get it back up at present. I have a backup, but I'd like to know how to recover from this scenario for future reference.
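For reference, the restricted group I'm planning should look something like this on the CLI (group name is just a placeholder):
# restrict the CT to the nodes that actually receive the replica
ha-manager groupadd ct180-zfs --nodes node1,node3 --restricted 1
# manage the CT under HA within that group
ha-manager add ct:180 --group ct180-zfs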
Edit:
For anyone coming across this in the future, I was able to manually move the config file to node1 and start it again:
On node2:
cd /etc/pve/lxc
nano 180.conf (copy the contents)
mv 180.conf notactanymore.conf
On node1:
cd /etc/pve/lxc
nano 180.conf (paste the contents)
Looks like the permissions got applied automatically, the CT appeared in the UI, and I could start it perfectly fine.
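In hindsight, a one-step variant of the same fix (I haven't re-tested this, and the node names are just my hostnames) would be to move the config directly between the per-node directories, since /etc/pve/lxc is only a symlink to the local node's folder:
# run from any quorate node; this changes which node owns CT 180
mv /etc/pve/nodes/node2/lxc/180.conf /etc/pve/nodes/node1/lxc/180.conf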