Delay in starting VMs.

j_s

Member
Sep 2, 2022
Hello. Relatively new to Proxmox here. I ran several independent Proxmox hosts for a few days, then later joined them into a cluster and set up HA failover and such. Really enjoyed learning Proxmox. However, there's one thing that has bothered me, and I can't seem to come up with the proper keywords to get an answer, so I figured I'd ask here.

Before I clustered the machines together, when I started and stopped a VM from the WebGUI, the VM would nearly instantly "power on" as well as "power off". Less than 1 second would be my best guess. However, ever since I set up the cluster, VMs take 15 seconds (and sometimes more) before they power on (or off).

I can see this time in the tasks at the bottom of the WebGUI. I just shut down and started 2 VMs for maintenance, and one showed:

HA 100 - Start 22:23:03
VM 100 - Start 22:23:18

That's 15 seconds. (this is about the shortest I've seen this year)

The other:

HA 117 - Start 20:32:48
VM 117 - Start 20:33:07

That's 19 seconds.

Still I've had others take 30 seconds or so.

Likewise, if I shut down a VM from within the VM itself, the console will go offline, and almost always it's 15+ seconds before the WebGUI of Proxmox actually shows the VM is powered off.

This is particularly annoying when I shut down the entire cluster for maintenance and then have to start the VMs up afterwards. I end up waiting 20+ seconds for each VM to "power on" before I try the next one, otherwise it seems like the VMs take even longer than 20 seconds to power on.

The *only* thing that I find even remotely out of place is that the Datacenter Summary page shows "Ceph" as "HEALTH_WARN", and if I click on it, it says "OSD count 0 < osd_pool_default_size 2". Unless I'm really lost on what Ceph is (and I will admit it's my weakest part of Proxmox), I don't use Ceph at all, so I've ignored this since I noticed it.

All of the hosts are overpowered for their workload, all have 10Gb redundant networking, and all seem to be generally healthy. top shows 1m, 5m, and 15m load averages under 0.80, and about 180GB of RAM free out of 256GB.

I do have a qdevice, but these problems were from before I had a qdevice.

Can someone explain what is going on "behind the scenes" that takes all that time? I'd like to find and fix the problem (assuming there is a problem). This has been going on for quite some time: I think I first made the HA cluster on 7.1, and it still persists today on 7.4.

Thanks!
 
However, ever since I set up the cluster, VMs take 15 seconds (and sometimes more) before they power on (or off).
That's not due to clustering itself, but to using the HA stack. For HA-managed guests, command execution is handled by the HA environment, which takes the request state as its source of truth, so a start command goes through the following steps:

  1. The API call sees that the service is HA managed and sets its request state to "started"
  2. The currently active Cluster Resource Manager (CRM) processes this and triggers a request-start state
  3. The request-start state checks whether the service should be moved to another node for startup, depending on HA group priorities and the Cluster Resource Scheduler (CRS) configuration
  4. Once it has found a node, the service is relocated to it if not already located there; this can take additional time
  5. Once the target node is settled, the CRM encodes the start request as a job in the manager state
  6. The Local Resource Managers (LRMs) read the manager state; the LRM on the node the service is located on sees the new job and spawns the worker that actually starts the VM
  7. The CRM checks on the result and processes any other pending migration, fence, or similar requests

See also the docs: https://pve.proxmox.com/pve-docs/chapter-ha-manager.html
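
If you want to measure that hand-off yourself, here is a minimal, unofficial sketch in Python. It only assumes the standard API endpoints for starting a VM and reading its current status; HOST, TOKEN, NODE and VMID are placeholders you would have to adapt, and certificate verification is disabled for a typical self-signed setup:

```python
# Rough sketch, not an official tool: time the gap between issuing a start
# for an HA-managed VM and the guest actually reporting "running".
import time
import requests

HOST = "https://pve1.example.com:8006"        # placeholder: your cluster's API endpoint
TOKEN = "root@pam!monitor=<token-secret>"     # placeholder: an API token with suitable privileges
NODE, VMID = "pve1", 100                      # placeholder node name and VM ID

headers = {"Authorization": f"PVEAPIToken={TOKEN}"}
base = f"{HOST}/api2/json/nodes/{NODE}/qemu/{VMID}"

t0 = time.monotonic()
# For an HA-managed guest this mainly sets the request state to "started";
# the CRM/LRM loops described above do the actual work afterwards.
requests.post(f"{base}/status/start", headers=headers, verify=False).raise_for_status()

# Poll the guest status until QEMU reports it as running.
while True:
    current = requests.get(f"{base}/status/current", headers=headers, verify=False)
    if current.json()["data"]["status"] == "running":
        break
    time.sleep(1)

print(f"VM {VMID} reported running after {time.monotonic() - t0:.1f}s")
```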

Now, that basic processing needs to stay the way it is to 1) ensure features like HA groups and CRS can work and 2) avoid edge cases around fencing, which is what the HA stack is all about.

But there's still some optimization potential in how new request states are processed once we have file-event poll support added to our real-time clustered configuration file system (pmxcfs): then we could run some loops earlier when the CRM is idle but the request state has changed, instead of only waking on the next regular loop iteration; the same goes for the LRMs reacting to changes in the manager state.

One of our devs has made a proof of concept for poll support in pmxcfs, as we also want that for firewall rule generation and other things. But it's a delicate change, and other work tends to push such optimizations of features that already work (albeit possibly a bit slower or less efficiently than they could) further down the priority list, so I cannot give you an exact time frame for when this will be improved.
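
To illustrate the kind of optimization meant here, a small Python analogy (not the actual pmxcfs or HA manager code): a work loop that normally only wakes on a fixed interval, but can also be woken early by a change notification, which is roughly what file-event poll support would allow:

```python
# Analogy only, not the real CRM/pmxcfs code: a work loop that normally wakes
# on a fixed interval, but can be woken early when a "request state changed"
# notification arrives instead of idling until the next scheduled iteration.
import threading
import time

LOOP_INTERVAL = 10.0                 # assumed fixed scheduling interval, in seconds
state_changed = threading.Event()

def notify_request_state_change():
    """In the real system this would be driven by a file event from pmxcfs."""
    state_changed.set()

def crm_like_loop(iterations: int = 2) -> None:
    for _ in range(iterations):
        # Wait for either a change notification or the regular interval,
        # whichever comes first; without poll support only the timeout exists.
        woke_early = state_changed.wait(timeout=LOOP_INTERVAL)
        state_changed.clear()
        print("processing request states", "(woken early)" if woke_early else "(regular tick)")

if __name__ == "__main__":
    worker = threading.Thread(target=crm_like_loop)
    worker.start()
    time.sleep(1.0)
    notify_request_state_change()    # e.g. an API call just set a VM to "started"
    worker.join()
```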

Likewise, if I shut down a VM from within the VM itself, the console will go offline, and almost always it's 15+ seconds before the WebGUI of Proxmox actually shows the VM is powered off.
That sounds like we should handle console disconnects better for HA-managed VMs; I guess you only notice that on virtual guests that are HA managed?

I end up waiting 20+ seconds for each VM to "power on" before I try the next one, otherwise it seems like the VMs take even longer than 20 seconds to power on.
That isn't the case. The HA CRM processes all changes in one loop iteration, and the HA LRM then also sees them all at once and starts up to $max_workers parallel workers for the startups; that limit defaults to 4 and can be overridden in the Datacenter -> Options -> Max Worker setting.

So using the node's Bulk Start or Bulk Shutdown feature should work well for parallel starts and stops, and reduce the delays from CRM and LRM config and state processing, as everything is batched at once.
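
As a rough Python analogy for that batching (not the real LRM implementation; the VM IDs and the 2-second start time below are made up): all start jobs queued in one loop iteration are drained by a bounded pool of workers rather than strictly one after the other:

```python
# Illustration of draining queued start jobs with a bounded number of
# parallel workers (default limit 4), instead of strictly sequentially.
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_WORKERS = 4                                   # mirrors the default worker limit
queued_starts = [100, 101, 102, 117, 118, 119]    # hypothetical VMIDs queued in one loop

def start_vm(vmid: int) -> str:
    time.sleep(2)                                 # stand-in for the actual start work
    return f"VM {vmid} started"

with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    futures = [pool.submit(start_vm, vmid) for vmid in queued_starts]
    for fut in as_completed(futures):
        print(fut.result())
```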

FYI: you can also try maintenance mode: https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#_node_maintenance
Note that it migrates the guests away to other nodes, but for VMs under HA that should already be possible anyway.
 