Proxmox V4 bugs

newacecorp

Hello,

We've been using Proxmox for years now and really like the latest version.

A few bugs that we find annoying:

(1) When restarting a node using the web interface in a multi-node cluster, any running non-HA VMs are not automatically live-migrated from that node to any other node. I would expect that any VMs with shared storage should be migrated to any other compatible node in the cluster. In the case of a HA VM, the VM is migrated, but it is first shut down on the node that is being shut down and then later restarted on another node. Why not live-migrate the HA VM to another node before the restart instead?

(2) An HA VM on non-shared storage (e.g. local-lvm-thin) is still automatically migrated to another node when the node on which it was originally started is being shut down. The HA VM can then no longer be restarted, even if the original node comes back up. I have to manually move the XXX.conf file from node A to node B, e.g. /etc/pve/nodes/A/qemu-server/XXX.conf to /etc/pve/nodes/B/qemu-server/XXX.conf, in order to be able to restart it from the web interface again.
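
For reference, the manual workaround is a single move inside the cluster file system (just a sketch; XXX stands for the VMID, and A/B for the node names):

Code:
# /etc/pve is the cluster-wide pmxcfs, so this can be run from any quorate node
mv /etc/pve/nodes/A/qemu-server/XXX.conf /etc/pve/nodes/B/qemu-server/XXX.conf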

Thank you.

Regards,

Stephan.
 
Hello,

We've been using Proxmox for years now and really like the latest version.

A few bugs that we find annoying:

(1) When restarting a node using the web interface in a multi-node cluster, any running non-HA VMs are not automatically live-migrated from that node to any other node. I would expect that any VMs with shared storage should be migrated to any other compatible node in the cluster. In the case of a HA VM, the VM is migrated, but it is first shut down on the node that is being shut down and then later restarted on another node. Why not live-migrate the HA VM to another node before the restart instead?

This is not a bug, this is the expected behaviour. What you are describing is a new feature. You can already live-migrate all VMs (manually).

(2) An HA VM on non-shared storage (e.g. local-lvm-thin) is still automatically migrated to another node when the node on which it was originally started is being shut down. The HA VM can then no longer be restarted, even if the original node comes back up. I have to manually move the XXX.conf file from node A to node B, e.g. /etc/pve/nodes/A/qemu-server/XXX.conf to /etc/pve/nodes/B/qemu-server/XXX.conf, in order to be able to restart it from the web interface again.

Thank you.

Regards,

Stephan.

HA without shared/distributed storage makes no sense to me, as this cannot work by design.
 
Thanks for your response, Tom. Whether this is a new feature or not is irrelevant. If I have running VMs on a node that I want to restart (because of repo updates), I expect those VMs to be transferred automatically to other nodes when I click the "restart" button in the web interface. It is common sense. Now I have to manually live-migrate all the VMs (30-40 in my case) myself. Even if these VMs are HA VMs, I have to do this, as restarting a node actually shuts down these VMs instead of live-migrating them. I don't think this is too much to ask as a feature.
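
To give an idea of what this means in practice, here is roughly what I have to do from the shell before every reboot (a sketch only; "m6" is just an example target node, and it assumes the VMs sit on shared storage):

Code:
# on the node being rebooted: live-migrate every running VM to m6
for vmid in $(qm list | awk '$3 == "running" {print $1}'); do
    qm migrate "$vmid" m6 --online
done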

You misunderstood my second point. Proxmox migrates the VM even if it is on non-shared storage. It shouldn't do this, but it does. Go try it for yourself. Just make sure that you mark the VM as HA.

Regards,

Stephan.
 
Thanks for your response, Tom. Whether this is a new feature or not is irrelevant. If I have running VMs on a node that I want to restart (because of repo updates), I expect those VMs to be transferred automatically to other nodes when I click the "restart" button in the web interface. It is common sense. Now I have to manually live-migrate all the VMs (30-40 in my case) myself. Even if these VMs are HA VMs, I have to do this, as restarting a node actually shuts down these VMs instead of live-migrating them. I don't think this is too much to ask as a feature.

I just mention that you are requesting a new feature, but in the thread topic you write that it's a bug. This is misleading.

Please file an enhancement request via https://bugzilla.proxmox.com (pve-manager).

You misunderstood my second point. Proxmox migrates the VM even if it is on non-shared storage. It shouldn't do this, but it does. Go try it for yourself. Just make sure that you mark the VM as HA.

Regards,

Stephan.

If a VM has no shared/distributed storage, you - the admin - should NOT mark it for HA.

Detecting this automatically would be nice, but hard to implement for all storage types.
 
Thanks for providing the link. I will submit it as a feature request.

Concerning your second point, I still consider this a bug because I want the VM to be HA. What if the QEMU process for that VM crashes? What if the VM's OS crashes and the QEMU guest agent is no longer sending heartbeats to the host? In these cases, I want the VM to be restarted, and I can only do that when it is flagged as HA. HA means (to me) that the VM is being monitored, and as long as the host that it runs on is up and running, I want the VM to stay running as well.

Regards,

Stephan.
 
Thanks for providing the link. I will submit it as a feature request.

Concerning your second point, I still consider this a bug because I want the VM to be HA. What if the QEMU process for that VM crashes? What if the VM's OS crashes and the QEMU guest agent is no longer sending heartbeats to the host? In these cases, I want the VM to be restarted, and I can only do that when it is flagged as HA. HA means (to me) that the VM is being monitored, and as long as the host that it runs on is up and running, I want the VM to stay running as well.

Regards,

Stephan.

You need to restrict your HA-enabled VM to hosts where all the resources needed to start the VM are available.

By doing this, it's impossible for the HA manager to try to start it on an unsuitable host.
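
On the command line this boils down to roughly the following (group name, node name and VMID are just examples):

Code:
# create an HA group limited to the hosts that have the required storage,
# and mark it restricted so the resource is never placed anywhere else
ha-manager groupadd grp-db --nodes m5 --restricted
ha-manager add vm:105 --group grp-db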
 
Tom, that is why it is called a bug. The HA manager shouldn't migrate it, but it does! Please try this for yourself.
 
(1) When restarting a node using the web interface in a multi-node cluster, any running non-HA VMs are not automatically live-migrated from that node to any other node. I would expect that any VMs with shared storage should be migrated to any other compatible node in the cluster.

I am strongly against such behavior - this looks totally unexpected to me.


(2) An HA VM on non-shared storage (e.g. local-lvm-thin) is still automatically migrated to another node when the node on which it was originally started is being shut down. The HA VM can then no longer be restarted, even if the original node comes back up. I have to manually move the XXX.conf file from node A to node B, e.g. /etc/pve/nodes/A/qemu-server/XXX.conf to /etc/pve/nodes/B/qemu-server/XXX.conf, in order to be able to restart it from the web interface again.

Please file a bug at bugzilla.proxmox.com (including your VM config). But why do you mark a VM with non-shared storage as HA at all?
 
Thanks for your response, Dietmar. Please check this behaviour in VMware ESXi with vCenter Server. I'm confused why you think this is unexpected behaviour. If the VMs are all running on a host and I shut down that host, why would I not want to keep these VMs running? I think this is more common sense than assuming that if I want to shut down the host, all VMs should be shut down as well. If I want to have the VMs shut down, I will go ahead and do that manually, one by one.

With respect to the bug report: I mark the VM as HA because I want to keep it running no matter what. As I mentioned previously, if the QEMU process dies or the OS itself crashes, I want to have it restarted. This is a function of HA. The fact that I have it on non-shared storage means that it can't be migrated to another host, but I don't want it to be. I only want HA to monitor the VM to make sure it stays running on that host.

The reason I am using non-shared storage is that it is part of a database cluster. The database has built-in clustering support, so there is no sense in putting it on shared storage and suffering a performance hit when I don't gain anything by having HA move it around to other nodes. The database clustering software can deal with the fact that a node in the cluster is down.

That is why I mark the VM on non-shared storage as HA. The curious fact is that Proxmox tries to migrate it even though it is on non-shared storage.
 
I just wanted to clarify that you need to look at this from a production-environment standpoint. I have 30-40 VMs running on a host and I need to restart that host because I am doing repo updates. Why on earth would I want to also shut down all the VMs and disrupt service? My job is to make sure that all VMs and associated services are available 24/7. So how do I do maintenance now? Well, I manually live-migrate 30-40 VMs and then reboot the host. This seems nonsensical if the whole idea of a cluster and high availability is to ensure uninterrupted service. It therefore makes more sense to me for Proxmox to migrate any active VMs that are on shared storage to other nodes before doing the restart. Is this such an insane thought?
 
I am strongly against such behavior - this looks totally unexpected to me.
Both vSphere and Hyper-V 2012 (heck even Citrix XenServer 6.x if I recall correctly) offer that functionality out of the box for clustered hosts. Clearly this is something their customers want and appreciate.

So I'm curious, why are you and others in the Proxmox staff so strongly opposed to such functionality that seems basic to us admins? I'm really interested to understand where you guys are coming from.
 
...I only want HA to monitor the VM to make sure it stays running on that host.

As I wrote, you can do this. Just restrict the HA VM to this host.

The reason I am using non-shared storage is that it is part of a database cluster. The database has built-in clustering support, so there is no sense in putting it on shared storage and suffering a performance hit when I don't gain anything by having HA move it around to other nodes. The database clustering software can deal with the fact that a node in the cluster is down.

That is why I mark the VM on non-shared storage as HA. The curious fact is that Proxmox tries to migrate it even though it is on non-shared storage.

You just did not yet fully understand our HA technology. Do it the right way and the HA manager will not migrate it.

Check again:
http://pve.proxmox.com/wiki/High_Availability_Cluster_4.x#HA_Groups

A really cool way to learn and play with Proxmox VE HA - the HA_Simulator
http://pve.proxmox.com/wiki/High_Availability_Cluster_4.x#HA_Simulator
 
Take a look at this test setup. I have a HA VM called db1 running on host m5. You can see that it is part of an HA group called M5, which only has a single host in it (m5). This means that it should never migrate to any other host.

1 - DB1 on host M5.jpg


The VM called db1 uses only non-shared storage local-lvm.

2 - DB1 on host M5 - configuration.jpg


I now shut down host m5 and within a minute or so, db1 ends up on m6. First of all, the HA manager should never have migrated it, because it was part of a HA group with only a single host, which didn't include m6. Secondly, the backend storage for db1 is local-lvm (not shared). These are two bugs as far as I can tell.

3 - DB1 on host M6.jpg
 
I should point out that the VM db1 is now stuck at this point. There is no way to migrate db1 back to host m5 using the web interface. I have to manually copy the 105.conf file from host m6 back to host m5.
 
Sigh.... Tom, yes, I did restrict it to just one host. That is the problem and the reason I call it a bug. Do you need a screenshot to prove it?
 
Sigh.... Tom, yes, I did restrict it to just one host. That is the problem and the reason I call it a bug. Do you need a screenshot to prove it?

Yes, please provide the screenshot of your HA group setup.
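
Alternatively, the output of the following commands (run on any node) should show the relevant HA group and resource configuration:

Code:
cat /etc/pve/ha/groups.cfg
cat /etc/pve/ha/resources.cfg
ha-manager status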
 
Both vSphere and Hyper-V 2012 (heck even Citrix XenServer 6.x if I recall correctly) offer that functionality out of the box for clustered hosts. Clearly this is something their customers want and appreciate.

Yes, I also want to have this feature, because it is common sense in the virtualization community outside of Proxmox VE, and I miss it badly.

So I'm curious, why are you and others in the Proxmox staff so strongly opposed to such functionality that seems basic to us admins? I'm really interested to understand where you guys are coming from.

+1
 
Tom, I concede that I did not have the "restricted" option checked for that group. As I understand it now, HA groups are affinity groups identifying the preferred hosts to run a particular VM on, but the VM is not limited to those hosts unless the "restricted" option is checked. I did not know this. My bad.

Once I checked this, the VM (db1) no longer migrated to the other hosts. However, when I rebooted the host, the VM entered an error state and stayed this way even after the host came back up. There is no way to clear this error state. I would have expected the error state to clear and the VM to be automatically restarted by the HA manager. Instead, I had to remove the VM from HA to clear the error state. I could then start the VM and add it back to HA.
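
Concretely, the only way I found to recover was roughly this (VMID 105 and group M5 from my test setup):

Code:
# remove the VM from HA to clear the error state, start it, then add it back
ha-manager remove vm:105
qm start 105
ha-manager add vm:105 --group M5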

So there are still two bugs:

(1) HA VMs on non-shared storage enter an error state when the host reboots; this state is not automatically cleared, nor is the VM restarted, when the host comes back up.
(2) HA VMs on non-shared storage are still migrated to other hosts if the VM is part of an unrestricted HA group.

What this boils down to is that the HA manager needs to better manage VMs with non-shared storage.

Regards,

Stephan.
 
