HA Migration issues when VM has local storage

NomadCF

Active Member
Dec 20, 2017
We continue to run into an issue where we unexpectedly lose a host for some reason (network outage, power, hardware, etc.). When this happens to a VM that is configured for HA and uses local storage, the VM becomes unusable because HA tries, and fails, to migrate it to another host. Proxmox incorrectly moves the VM's configuration to another host per HA without first verifying that the storage has been moved successfully and exists there. When the host comes back online, the VM can't be moved back, either by HA (after clearing the error) or by migration via the GUI or CLI. We have to go in manually and move the config to the correct host, then clear the HA error. After that, everything starts up correctly.
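The manual recovery described above (moving the config back to the node that actually holds the disks) can be sketched in Python. This assumes the standard pmxcfs layout `/etc/pve/nodes/<node>/qemu-server/<vmid>.conf`; the helpers `vm_config_path` and `move_vm_config` are hypothetical, and `root` is a parameter so the sketch can be tried against a scratch directory instead of `/etc/pve`. On a real cluster, `/etc/pve` is only writable while the node has quorum. Clearing the HA error is a separate step (for example, setting the resource to `disabled` and later back to `started` with `ha-manager set`).

```python
import os
from pathlib import Path


def vm_config_path(root: Path, node: str, vmid: int) -> Path:
    """Where pmxcfs keeps a VM's config: <root>/nodes/<node>/qemu-server/<vmid>.conf."""
    return root / "nodes" / node / "qemu-server" / f"{vmid}.conf"


def move_vm_config(root: Path, vmid: int, from_node: str, to_node: str) -> Path:
    """Hand a VM's config back to the node that actually holds its local disks."""
    src = vm_config_path(root, from_node, vmid)
    dst = vm_config_path(root, to_node, vmid)
    if not src.exists():
        raise FileNotFoundError(f"no config for VM {vmid} on {from_node}")
    if dst.exists():
        # Refuse to overwrite an existing config on the target node.
        raise FileExistsError(f"VM {vmid} already has a config on {to_node}")
    dst.parent.mkdir(parents=True, exist_ok=True)
    # Equivalent to a plain `mv` between the two node directories.
    os.replace(src, dst)
    return dst
```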

We understand that HA is trying to migrate the VM in question and bring it back online, but it assumes the storage being used is shared without any kind of verification first.
 
That doesn't always apply. For example, you may be using ZFS as local storage plus storage replication, and configure HA to use a group containing those replicated servers.

You should simply not configure a VM for HA if you know that the VM will not be able to move to another server. Some kind of shared or replicated storage is listed as a requirement in the manual [1].

Maybe PVE should not let you configure HA for a VM with local storage, or, as you say, should not try to migrate the VM even if it is under HA, if the VM is using local storage at the time HA kicks in (maybe the VM had shared storage and got its drive(s) moved to local storage for some reason).

[1] https://pve.proxmox.com/wiki/High_Availability#_requirements
 


I never used or mentioned storage replication; I specifically outlined an HA setup using local storage only, and this error is repeatable. HA plus local storage is a valid setup and is extremely useful as is. But the fact remains that HA doesn't do any kind of validation check on the VM's storage before moving and starting a VM.

Even with storage replication, this can happen if the replication has become corrupt (whether because its first sync did not finish before HA was activated, the dataset doesn't exist for whatever reason, etc.).

The fact is, Proxmox needs to do more checks and validation instead of just assuming and hoping for the best.
 
How would this work?

Either you establish "networked storage" or utilize ZFS replication. In any case the HA-partner needs access to the virtual disks of the just-died VM.

What needs to happen is that HA checks whether the remote host can access the required storage. If it can't, HA should not move the config there, and if no other host can access that storage either, it should put the HA resource into an error state. Problem solved.
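The check proposed above might look roughly like this Python sketch. It is not how PVE's HA manager is actually structured: the `Node` record, `pick_failover_node`, `NoEligibleNode`, and the idea that each node can report which volume IDs it can currently open are all assumptions for illustration. Candidates are assumed to arrive already sorted by HA group priority. Because the check looks for the actual volumes rather than just the storage ID, a replica that never finished its first sync would simply be absent and fail the check.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    online: bool
    # Volume IDs ("<storage>:<volume>") this node can actually open right now.
    volumes: set[str] = field(default_factory=set)


class NoEligibleNode(Exception):
    """Raised instead of moving the config when no node can run the VM."""


def pick_failover_node(vm_volumes: list[str], candidates: list[Node]) -> str:
    """Return the first online candidate that can reach every disk of the VM.

    The config should only be relocated after this returns successfully.
    """
    missing_report = {}
    for node in candidates:
        if not node.online:
            continue
        missing = [v for v in vm_volumes if v not in node.volumes]
        if not missing:
            return node.name
        missing_report[node.name] = missing
    # No node qualifies: leave the config where it is and surface an error.
    raise NoEligibleNode(f"no online node can reach all disks: {missing_report}")
```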

HA will live-migrate without replication when you vary the priority of the hosts in that HA group.
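Node priority in an HA group decides where a resource prefers to run, which is what the reply above relies on. The Python sketch below only orders the node list of a group spec in the `node:priority` form accepted by `ha-manager groupadd --nodes`, with higher numbers preferred. The function name is hypothetical, a node without an explicit priority is treated as 0 here (an assumption), and ties simply keep their listed order; this does not model how PVE actually breaks ties between equally ranked nodes.

```python
def parse_group_nodes(spec: str) -> list[str]:
    """Order nodes from an HA group spec such as "pve1:1,pve2:2,pve3".

    Higher priority first; missing priority treated as 0 (assumption);
    equal priorities keep the order in which they were listed.
    """
    entries = []
    for index, item in enumerate(spec.split(",")):
        name, _, prio = item.strip().partition(":")
        # Negate the priority so an ascending sort puts the highest first;
        # the index keeps ties stable.
        entries.append((-int(prio or 0), index, name))
    return [name for _, _, name in sorted(entries)]
```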
 
I would like to know your use case for HA with local storage, as I can't really find a situation where it would work. Again, shared storage is a requirement for HA.

Simple: all clustered systems (firewall, DHCP, DNS, and AD) have a failover VM. In each of these clusters, one VM is always set up to use no shared storage as a last-resort fallback. This way, only a single node and disk set is required to keep everything limping along.
 
So don't configure HA for those VMs: problem solved.
 

No, the code is buggy. HA itself will have this issue with any VM and any storage setup whenever that storage is unavailable to that VM and host. The code base needs to verify that the host has access to the required storage and the VM's virtual disks before moving it anywhere, at any time. It's that simple.

And HA for this setup is, again, useful from a management perspective.
 
