[SOLVED] HA failed migration. VMs went to the wrong server and can't go back.

Prothane

Hello,

I have a 3-node setup: two identical nodes, and a third node that is just a Raspberry Pi for quorum. I was replacing a failed hard drive in my ZFS pool "HDD" on Server 1 (4 VMs). All the VMs were powered off during the resilvering. During the rebuild, Server 1 sent two VMs to Server 2 but sent the other two to the quorum computer, "Server 5", which has no storage. I had replication set up so that the 4 VMs would only go to "Server 2" on failover. Must be a bug. Now I'm stuck: I can see the two VMs on Server 5, but when I try to migrate them back I get the error: storage 'HDD' is not available on node 'VMServer5' (500). The VMs' disks (.raw) are still replicated on Server 1 and Server 2.
 

Attachments

  • Screen Shot 2022-07-21 at 11.38.10 AM.png
  • Screen Shot 2022-07-21 at 11.16.46 AM.png
Hi,
did you create an HA group and restrict it to the nodes where the VM can actually run? HA doesn't know by itself that certain VMs shouldn't go to certain nodes.

Please see my answer in a similar thread for suggestions on how to fix it.
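
For anyone landing here later, this is a rough sketch of what setting up such a group could look like from the CLI. It only builds and prints the `ha-manager` command lines; it does not run them. The group name `storage-nodes`, the node names `VMServer1` and `VMServer2`, and the VM IDs 100 to 103 are placeholders I made up, since the thread only shows `VMServer5`. It also assumes the VMs are already HA resources (hence `set` rather than `add`) and a PVE release where HA groups exist.

```python
import shlex


def ha_group_commands(group, nodes, vmids, restricted=True):
    """Build (but do not run) ha-manager commands for a restricted HA group.

    group, nodes and vmids are supplied by the caller; the values used
    below are hypothetical and must be replaced with the real ones.
    """
    commands = [[
        "ha-manager", "groupadd", group,
        "--nodes", ",".join(nodes),
        # With restricted set, resources in the group may only run on
        # the listed nodes, so the quorum-only node is never a target.
        "--restricted", "1" if restricted else "0",
    ]]
    for vmid in vmids:
        # 'set' updates a VM that is already managed by HA.
        commands.append(["ha-manager", "set", f"vm:{vmid}", "--group", group])
    return [shlex.join(cmd) for cmd in commands]


# Hypothetical names and IDs, for illustration only.
for line in ha_group_commands(
    "storage-nodes", ["VMServer1", "VMServer2"], [100, 101, 102, 103]
):
    print(line)
```

Review the printed lines before pasting them into a shell on one of the cluster nodes. The same settings can also be made in the GUI under Datacenter, HA, Groups.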
 
Thanks Fiona. I didn't. I followed an online guide for setting up two nodes with a quorum node, and it didn't mention putting the two servers in an HA group. I will do that now. I was able to move the VMs back onto the proper servers. It was pretty easy: I unplugged the quorum node's Ethernet cable to simulate a failure, and HA moved the VMs onto the proper servers. I see that a manual migration does checks, but when the move is automated by HA there are no checks for compatibility.
 
Glad you were able to fix it! Yeah, the main use case for HA is with homogeneous clusters, so for restrictions, there's currently only HA groups.

When a node fails, HA doesn't do a normal migration, but recovers the configuration file to another node. And that currently assumes that HA groups are configured appropriately and doesn't do checks for available storages etc. Feel free to open a feature request for auto-detecting such HA resources with an invalid configuration on our bugtracker.
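
To make the gap concrete, here is a toy sketch of the kind of check such a feature request might describe. It is not how Proxmox VE implements HA recovery; it is a simplified model in which a resource without a restricted group can be recovered to any node, and it flags candidate nodes that lack a storage the VM needs. All data structures, node names other than `VMServer5`, and VM IDs are made up for illustration.

```python
def find_invalid_ha_resources(resources, groups, node_storages, all_nodes):
    """Return {vmid: [nodes]} listing candidate nodes that lack a needed storage.

    Simplified model, not the Proxmox implementation: only a restricted
    group narrows the set of nodes a resource may be recovered to.
    """
    problems = {}
    for vmid, res in resources.items():
        group = groups.get(res.get("group"))
        if group and group.get("restricted"):
            candidates = group["nodes"]
        else:
            candidates = all_nodes
        needed = set(res["storages"])
        bad = [n for n in candidates if not needed <= node_storages.get(n, set())]
        if bad:
            problems[vmid] = bad
    return problems


# Hypothetical cluster resembling the one in this thread.
all_nodes = ["VMServer1", "VMServer2", "VMServer5"]
node_storages = {
    "VMServer1": {"local", "HDD"},
    "VMServer2": {"local", "HDD"},
    "VMServer5": {"local"},  # quorum-only node, no 'HDD' pool
}
groups = {"storage-nodes": {"nodes": ["VMServer1", "VMServer2"], "restricted": True}}

ungrouped = {100: {"storages": ["HDD"], "group": None}}
grouped = {100: {"storages": ["HDD"], "group": "storage-nodes"}}

print(find_invalid_ha_resources(ungrouped, groups, node_storages, all_nodes))
print(find_invalid_ha_resources(grouped, groups, node_storages, all_nodes))
```

In this model the ungrouped VM is flagged because the quorum node is a possible recovery target without the `HDD` storage, while the same VM in the restricted group is not flagged.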