If ha-manager status shows any active CRM or LRM, they are definitely going to reboot if you lose quorum again for a while. And I do not think they are all idle from what I have seen, despite your HA config being empty, which is interesting.
When was it that you think you removed the HA service originally?
today in the morning
The logs you have provided before went till Feb 19 10:33:38 mdc-023, so ... was it before or after that you removed it?
Also, what does ha-manager status show now (does not matter on which node)?
before
Feb 19 09:46:45 mdc-023 pve-ha-crm[11895]: removing stale service 'vm:2501' (no config)
Code:
root@mdc-023:~# ha-manager status
quorum OK
master mdc-023 (active, Tue Feb 20 11:27:31 2024)
lrm mdc-022 (idle, Tue Feb 20 11:27:36 2024)
lrm mdc-023 (idle, Tue Feb 20 11:27:36 2024)
lrm mdc-025 (idle, Tue Feb 20 11:27:36 2024)
lrm mdc024 (idle, Tue Feb 20 11:27:36 2024)
Alright!
So ...
Code:
Feb 19 09:46:45 mdc-023 pve-ha-crm[11895]: removing stale service 'vm:2501' (no config)
If I get it right, you had this one forgotten HA machine there. Your reboots happened prior to this and they happened because of the quorum hiccups. Afterwards you removed the last HA service (yesterday morning).
This now makes more sense:
So beyond the known bug of the "dangling" CRM [1] (on a setup that previously used HA), the nodes now should not reboot anymore even if quorum is wonky.
I say "should" because it's a bit more complicated - it's a bit lengthy and addressed in another post of mine [2].
Long story short, the safest for you now is probably to reboot mdc-023, double-check with ha-manager status that they are ALL idle, and not set any service up as HA again. After that, upon lost quorum, you definitely "should" not be getting reboots.
I say "should" again because the more drastic solution is at the end of the referred post above [2] for those that want to be absolutely sure.
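If you would rather not eyeball the output on every node, the double-check could be scripted roughly like this. It is only a sketch: the function name ha_all_idle is made up for illustration, and it assumes the status lines keep the "master NODE (STATE, ...)" / "lrm NODE (STATE, ...)" shape shown in the output earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox): reads `ha-manager status`
# output on stdin and prints every master/lrm line whose state is not "idle".
# Returns 0 if everything is idle, 1 otherwise.
ha_all_idle() {
    awk '
        # Expected line shape (assumption, taken from the output in this thread):
        #   master mdc-023 (active, Tue Feb 20 11:27:31 2024)
        #   lrm mdc-022 (idle, Tue Feb 20 11:27:36 2024)
        $1 == "lrm" || $1 == "master" {
            state = $3
            gsub(/[(,]/, "", state)   # "(idle," becomes "idle"
            if (state != "idle") {
                print "not idle: " $0
                bad = 1
            }
        }
        END { exit bad ? 1 : 0 }
    '
}

# Intended use on a node (left as a comment so the sketch has no side effects):
#   ha-manager status | ha_all_idle && echo "all idle"
```

With the status output posted above, this would flag the master line on mdc-023, which is exactly the one still showing as active.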
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=5243
[2] https://forum.proxmox.com/threads/getting-rid-of-watchdog-emergency-node-reboot.136789/#post-635602