Proxmox 4 HA VM Freeze State

adamb

Mar 1, 2012
When I shut down a host that is running an HA VM, it puts the VM in a "freeze" state until the node comes back online. This is not what we are used to or expect from an HA cluster. Is there a way to ensure the VM gets moved to the other available node instead of waiting for the original node to come back online? Some of these servers take 5-10+ minutes to reboot.
 

This is how it is currently implemented (but why would you shut down a node used for HA?)
 

We reboot one of our HA nodes monthly to ensure there are no random issues. We support doctors' offices and hospitals, so 100% uptime is critical for us. We have found that reboots typically bring out issues with configuration changes and things of that nature. We also reboot nodes for maintenance and updates. Given how long our hardware takes to reboot, starting the VM on the remaining node is critical to our operation: it can be the difference between 10 minutes of downtime and 2 minutes.

Obviously we can come up with new procedures, such as simply migrating the VM before rebooting the node, but that is a bit more complicated. It would be nice to have the option to choose.
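The manual route (migrating the VMs off before the reboot) can be scripted. The sketch below only builds the commands to move every HA-managed VM off a node ahead of a planned reboot; it does not execute them. It assumes you already know which HA services run on which node (the `services_on_node` mapping, the node names and the function name are hypothetical) and uses the `ha-manager migrate <sid> <node>` form; check `man ha-manager` on your release before relying on it.

```python
from typing import Dict, List


def build_drain_commands(services_on_node: Dict[str, str], node: str, target: str) -> List[List[str]]:
    """Return one `ha-manager migrate` command per HA service currently on `node`.

    `services_on_node` maps an HA service id (e.g. "vm:100") to the node it runs on.
    The commands are only built here (dry run); nothing is executed.
    """
    if node == target:
        raise ValueError("target node must differ from the node being rebooted")
    commands = []
    for sid, current in sorted(services_on_node.items()):
        if current == node:
            # Ask the HA stack to migrate this service to the target node.
            commands.append(["ha-manager", "migrate", sid, target])
    return commands


if __name__ == "__main__":
    # Hypothetical two-node cluster: node1 is about to be rebooted.
    services = {"vm:100": "node1", "vm:101": "node2", "vm:102": "node1"}
    for cmd in build_drain_commands(services, "node1", "node2"):
        print(" ".join(cmd))
```

After review, each command could be run with `subprocess.run(cmd, check=True)`; waiting for every migration to finish before starting the reboot is deliberately left out of this sketch.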
 
Then we can call it HA* or Best-Effort Availability :)

Question the idea of rebooting all you want, but we have been an IBM/HP shop for well over 25 years and have a very good understanding of how to support thousands of servers in the field. Rebooting has always been one of the things that prevent issues.
 

Actually, I was making fun of Proxmox's HA, not your issue. The concept of *High* Availability is not very compatible with 10-15 minutes of downtime because of a (planned) host reboot. Btw, try rebooting a BIG server (we have a 1.5TB RAM machine) and you'll find that 15 minutes of reboot time is very fast :)
 

My mistake, I shouldn't have jumped to conclusions; we always get flak for our reboot schedule. My bad!

We actually sell our HA servers with either 768GB or 1.5TB :). We know all about painful reboots! We disable all the PXE options and optimize the boot process as much as possible, but it's still painful. IBM's M3 line was the worst, 15-20 minutes; it was insane.
 
The concept of *High* Availability is not very compatible with 10-15 minutes of downtime because of a (planned) host reboot.

But if you plan to shut down a server (you state that you have a plan), you can also simply move the VMs to another server. I can't see why this needs to be done automatically.
Also, most of the time when I reboot a server, it is back online within 30 seconds...
 

You obviously aren't running servers with large amounts of RAM. They take significantly longer to boot than servers with little RAM installed. Just because it's a planned event doesn't mean it's not automated. Moving the VM would be another step in the process, and honestly I'm not a huge fan of automating live migration with no human intervention; it just sounds like a bad idea. I don't see the logic in freezing a VM while a host reboots. What benefit does this even provide?
 

I think it is a good idea, because it is much safer to move an HA-enabled VM manually:

1.) you can carefully select the target node - using human intelligence ;-)
2.) you can verify that everything went well

Besides, I also accept patches to implement other behavior...
 

I disagree, but I'm not much of a programmer. Our only option will be to use "fence_node" instead of shutting down gracefully.

1. I have no need to carefully select a target node when there is only one it can run on.
2. We have other checks and scripts in place to ensure it went well and the VM is running
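A check like point 2 could, for example, parse the output of `ha-manager status` after the reboot and confirm the VM is started on the surviving node. This sketch assumes service lines of the form `service vm:100 (node2, started)`; the exact output format may differ between releases, so treat the regular expression, the sample text and the function names as hypothetical and adjust them to what your version prints.

```python
import re
from typing import Dict, Tuple

# Assumed line format: "service vm:100 (node2, started)"; adjust for your PVE release.
SERVICE_LINE = re.compile(r"^service\s+(\S+)\s+\(([^,\s]+),\s*([^)\s]+)\)\s*$")


def parse_services(status_text: str) -> Dict[str, Tuple[str, str]]:
    """Map each HA service id to (node, state) from `ha-manager status`-style text."""
    services = {}
    for line in status_text.splitlines():
        match = SERVICE_LINE.match(line.strip())
        if match:
            sid, node, state = match.groups()
            services[sid] = (node, state)
    return services


def service_ok(status_text: str, sid: str, avoid_node: str) -> bool:
    """True if `sid` is reported as started on a node other than `avoid_node`."""
    node_state = parse_services(status_text).get(sid)
    if node_state is None:
        return False
    node, state = node_state
    return state == "started" and node != avoid_node


if __name__ == "__main__":
    # Hypothetical sample output; real output contains more lines.
    sample = "quorum OK\nservice vm:100 (node2, started)\n"
    print(service_ok(sample, "vm:100", "node1"))
```

Keeping the parser separate from the check makes it easy to feed in captured output (e.g. from `subprocess.run(["ha-manager", "status"], capture_output=True, text=True).stdout`) or a saved sample when testing.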
 
Moving the VM would be another step in the process, and honestly I'm not a huge fan of automating live migration with no human intervention; it just sounds like a bad idea. I don't see the logic in freezing a VM while a host reboots. What benefit does this even provide?

Automatically moving the VM would have the same problem: no human intervention. Calling a script (for example) that does the migration would trigger the same behaviour as implementing it in the HA manager, so I don't see why one should be safer than the other. But I can see that some admins would find it more comfortable.

The logic is that a reboot is a planned action and we do not want to trigger automatic actions on it (a reason you also mentioned); a possible out-of-control feedback loop should also be avoided.

Manually unfreezing a service (e.g., moving it to another machine) is worth thinking about, but it's not that simple.

Workaround for you (not the nicest, and "no warranty"): kill the pve-ha-lrm process and then reboot; the services will then be relocated.
 

Good info, I appreciate it!
 
No, freeze is only a state in our ha-manager logic, it has no effect on the machine itself. It will only prevent actions from the Cluster Resource Manager until the previously gracefully powered down machine and its Local Resource Manager is online again.

Edit: and it's already possible to freeze all KVM/QEMU machines
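To make the behaviour discussed in this thread concrete, here is a toy model, not the actual ha-manager code. A graceful shutdown puts the node's services into `freeze` and nothing is moved until that node's LRM is back; a node that simply disappears (a crash, or the kill-pve-ha-lrm workaround above) is treated as fenced and its services are recovered on another online node. The class and method names and the "first online node, sorted by name" placement rule are simplifications invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ToyHACluster:
    """Toy model of freeze vs. recovery; NOT the real Proxmox ha-manager logic."""
    nodes: Set[str]
    placement: Dict[str, str]  # service id -> node it runs on
    state: Dict[str, str] = field(default_factory=dict)
    online: Set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        self.online = set(self.nodes)
        for sid in self.placement:
            self.state[sid] = "started"

    def graceful_shutdown(self, node: str) -> None:
        # Planned shutdown: services are frozen in place, nothing is moved.
        self.online.discard(node)
        for sid, where in self.placement.items():
            if where == node:
                self.state[sid] = "freeze"

    def node_failure(self, node: str) -> None:
        # Unplanned loss: the node is considered fenced and its services
        # are recovered on another online node (toy rule: first by name).
        self.online.discard(node)
        for sid, where in list(self.placement.items()):
            if where != node:
                continue
            self.state[sid] = "fence"
            candidates = sorted(self.online)
            if candidates:
                self.placement[sid] = candidates[0]
                self.state[sid] = "started"
            # With no online node left, the toy model leaves it in "fence".

    def node_back(self, node: str) -> None:
        # The node's LRM is online again: frozen services resume where they are.
        self.online.add(node)
        for sid, where in self.placement.items():
            if where == node and self.state[sid] == "freeze":
                self.state[sid] = "started"


if __name__ == "__main__":
    cluster = ToyHACluster({"node1", "node2"}, {"vm:100": "node1"})
    cluster.graceful_shutdown("node1")
    print(cluster.placement["vm:100"], cluster.state["vm:100"])  # node1 freeze
    cluster.node_back("node1")
    cluster.node_failure("node1")
    print(cluster.placement["vm:100"], cluster.state["vm:100"])  # node2 started
```

The point of the model is the asymmetry the developers describe: a planned, graceful shutdown is treated as "do not touch", while only an unplanned loss triggers recovery.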
 
Hmm, that's new HA behavior to me. Suppose we had a node go down because of a random power problem: HA used to fence the node and move all the VMs to other nodes according to the failover domain setup in cluster.conf.
When the original node came back online, the "nofailback" parameter decided whether to move the CTs back or keep them on the node they were running on.

This is what I know of HA in version 3.x. Was this changed in 4.x?
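For comparison, the 3.x behaviour described here (an ordered failover domain in cluster.conf with the `nofailback` option) can be sketched as a placement rule: run on the most preferred online member, and when a more preferred node returns, move back only if `nofailback` is off. This is a simplified model with a hypothetical function; rgmanager itself handles more cases (restricted vs. unrestricted domains, relocation policies).

```python
from typing import Dict, Optional, Set


def choose_node(priorities: Dict[str, int], online: Set[str],
                current: Optional[str], nofailback: bool) -> Optional[str]:
    """Pick a node for a service in an ordered failover domain (simplified).

    `priorities` maps node -> priority; a lower number is more preferred.
    Returns None if no domain member is online.
    """
    members = [n for n in priorities if n in online]
    if not members:
        return None
    best = min(members, key=lambda n: (priorities[n], n))
    if current in online and current in priorities:
        # Still running on a healthy member: only fail back to a more
        # preferred node when failback is allowed.
        if nofailback or priorities[current] <= priorities[best]:
            return current
    return best


if __name__ == "__main__":
    prio = {"node1": 1, "node2": 2}
    print(choose_node(prio, {"node2"}, "node1", False))           # failover to node2
    print(choose_node(prio, {"node1", "node2"}, "node2", False))  # fails back to node1
    print(choose_node(prio, {"node1", "node2"}, "node2", True))   # stays on node2
```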
 