The actual behavior for a given cluster may be hard to test because timing is relevant - it takes a minute (or two) for HA activity to actually trigger.
Just out of curiosity I tested some things to disturb a cluster - or at least an HA resource. This is a long post,
read it for entertainment or skip it. At the end I can reproduce a problematic situation
in my test-setup...
I have:
- a purely virtual test cluster of six nodes (pna ... pnf), backed by a really slow proof-of-concept hyper-converged Ceph
- two nodes (pnd/pne) form an HA group "HATEST" with one VM (antix) up (on pnd) and running "ping" in a terminal - for visual monitoring (CLI sketch below)
(First I had three nodes in that HA group. But the ultimate test is to shut down all relevant nodes, and with three of them down I would lose PVE quorum; additionally Ceph (configured with size=4/min_size=2) would be really unhappy. So I stepped back to only two nodes for HA.)
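For reference: I clicked this together in the GUI, but on the CLI the group and the HA resource would be created roughly like below - just a sketch; VMID 29101 is my test VM "antix".
Code:
# create the HA group limited to the two nodes
ha-manager groupadd HATEST --nodes pnd,pne
# put the test VM under HA control and bind it to that group
ha-manager add vm:29101 --group HATEST --state started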
Now I can test:
1) manual migration between those HA-nodes
- works as advertised, without any hiccup
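(I used the GUI button; through the HA stack on the CLI this should be roughly:)
Code:
# request an HA-managed migration of the test VM to the other group member
ha-manager migrate vm:29101 pne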
2) shutdown node (pnd) with that test-VM running
- the guest VM got shut down = NOT migrated (because Policy=Conditional) and then restarted on another node (pne)
That's of course not what I wanted. Only then did I change the policy from Conditional to Migrate.
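The policy here is the datacenter-wide HA shutdown policy. I switched it in the GUI (Datacenter -> Options), but on the CLI it should be roughly this (sketch):
Code:
# set the cluster-wide HA shutdown policy; ends up in /etc/pve/datacenter.cfg as "ha: shutdown_policy=migrate"
pvesh set /cluster/options --ha shutdown_policy=migrate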
3) out of curiosity: manually migrate from pnd to pnf = a node outside the defined HA group
- migration works!
- but then the VM migrates back from that non-HA node to pnd after a few seconds
4) shutdown node (pnd) with the test-VM running
- this triggers Migration (pnd-->pne)
- the guest stays "up"; no hiccup
Powering that test node up again migrates the VM back to its original node (pnd).
All of the above tests stayed within normal operating parameters and worked as documented. I did them to confirm this for my test setup.
Now for the more interesting tests; the baseline is:
- all six nodes are up and running
- the test-VM is on pnd, up and running of course, with visible "ping"
5) shut down all nodes in that HA group at once - triggered via GUI, with about three seconds of delay for clicking around; the node with the test-VM (pnd) first
- surprise: the VM migrated out of the defined group onto node pna. No hiccup, but unexpected
That's documented behavior... So now I set the "restricted" flag on my HATEST group.
The VM migrated back to the origin automatically, as expected.
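(That flag is a single checkbox in the GUI; the CLI equivalent should be roughly:)
Code:
# restricted = resources of this group may only run on the listed nodes
ha-manager groupset HATEST --restricted 1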
Back to normal, all nodes up; VM on pnd.
6) shut down all nodes in that HA group at once - again triggered via GUI with three seconds of delay; the node with the test-VM (pnd) first
- migration did NOT start
- shutdown of the secondary node (pne) finished
- the guest keeps running on the original node!
- the shutdown of pnd obviously got cancelled; I waited five minutes
This situation is NOT clean:
- the VM still is running on pnd
- the node pnd is "grayed out" (pvestatd?)
- but I can click "Shutdown" from the WebGUI of another node --> no reaction
- ssh works
- "qm list" confirms VM is here
Troubleshooting:
Code:
root@pnd:~# shutdown -h now
Failed to set wall message, ignoring: Transport endpoint is not connected
Call to PowerOff failed: Transport endpoint is not connected
root@pnd:~# systemctl start pvestatd
Failed to start pvestatd.service: Transaction for pvestatd.service/start is destructive (systemd-binfmt.service has 'stop' job queued, but 'start' is included in transaction).
See system logs and 'systemctl status pvestatd.service' for details.
root@pnd:~# systemctl start pvescheduler.service
Failed to start pvescheduler.service: Transaction for pvescheduler.service/start is destructive (dev-dm\x2d2.swap has 'stop' job queued, but 'start' is included in transaction).
See system logs and 'systemctl status pvescheduler.service' for details.
While I know how to force a shutdown of that system... what would be a good way to continue???
Code:
root@pnd:~# qm shutdown 29101
Requesting HA stop for VM 29101 # DID WORK
And now the node shut down too!
Shutting down both nodes of this two-node HA group did not behave well.
Powering both nodes back on: the HA request state is now "stopped", which feels just wrong - probably because of my manual "qm shutdown".
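(If someone stumbles over the same thing: I assume the request state can be set back roughly like this:)
Code:
# tell the HA manager the resource should be running again
ha-manager set vm:29101 --state started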
Back to normal: all six nodes up; test-VM visible with "ping" on pnd.
7) same as 5+6 but in reverse order: first shut down the unused HA node pne, then pnd with the test-VM; again with ~three seconds in between
- pne shuts down quickly
- same as in 6): "qm shutdown" works and the node then shuts down as well.
Now I possibly found the culprit: no guest agent in the test-VM! ACPI alone seems not to be sufficient for my goal? So NOW I install the qemu-guest-agent!
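For the record, roughly what that looks like (antiX is Debian-based; the VMID is from my setup):
Code:
# inside the guest
apt install qemu-guest-agent
# on the PVE host: enable the agent option (takes effect after a full stop/start of the guest)
qm set 29101 --agent enabled=1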
Back to normal: all six nodes up; test-VM visible with "ping" on pnd. Now WITH guest-agent...
8) same as 7 = shut down pne + pnd --> it seems the guest agent does not change anything
- the secondary HA-node pne shut down
- the node with the VM stays up and the VM keeps running
- in the WebGUI both nodes are shown as down after some minutes; both not manageable, same as before; the VM is still running!
- again: ssh --> "qm shutdown" is required to shut down the VM, and then the node follows automatically
Back to normal: all six nodes up; test-VM visible with "ping" on pnd.
9) slow down: first just shut down the currently unused pne. Then shut down the only remaining node in the HATEST group, which is running the test-VM
- shutting down pne works - of course!
- shutting down pnd does NOT work; it is plainly ignored!?
- after (exactly) five minutes it again reached that strange "grayed out" (not: red cross), "not manageable, but the VM keeps running" state
- without ssh I could do nothing
- the VM is still running... for 30(!) minutes. Then it shuts down!
- after this long timeout the missing services are restarted; the WebGUI is responsive again; the VM got restarted
I have no good explanation for the 30-minute knock-out. It feels surprising and the behaviour is not what I want. For an emergency shutdown (UPS) this is probably bad.
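Next time I will capture the state of the HA services during that window; roughly what I plan to look at:
Code:
# overall HA view: quorum, master, LRM states, resources
ha-manager status
# state of the HA services on the stuck node
systemctl status pve-ha-lrm pve-ha-crm
# follow what the local resource manager is doing / waiting for
journalctl -u pve-ha-lrm -f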
Reproducible result of this test sequence as of now:
When I trigger "shutdown" on the last standing node of an HA group with one VM running, I end up in a problematic state: after five minutes that node is no longer manageable in the WebGUI, but the VM keeps running on that very node. After 30 minutes the system is manageable again.
Is this the expected behaviour? I am not sure...
PS: all nodes have two virtual OSDs for Ceph. I ignored that on purpose as "size=4/min_size=2" should keep critical trouble away.