Newly booted Hosts sometimes fails to start VMs

tschmidt

Member
Oct 11, 2023
17
5
8
I run a 4 Node HA cluster with ceph. VMs are set to migrate on host reboots.

Yesterday I updated (regular subscription repo) and rebooted the Hosts one by one. On one of them the VMs failed to migrate back with the following error (sys_images being cephfs):

Code:
TASK ERROR: hookscript error for 509 on pre-start: script 'sys_images:snippets/tag-dirty' does not exist

This failed gracefully as the VMs just kept running on the other Hosts. I checked sys_images and it was mounted normally. Manually migrating the VMs worked too. This was the first time I experienced this and rebooting the same Host did not trigger it again.

Later that day I had a power failure that took the cluster communication down (and HA then restarted the Hosts). After I got the switches back online all VMs had gone into the HA error state because of the same error. Again sys_images was mounted and the VM started fine after taking them out of the error state.

Anyone experienced something like this? Is this something on my end or did they add a race condition in the last update?

(PS: I'm not interested in advertising my email via the bug-tracker)