Proxmox restarted unexpectedly

The remaining question is: if for some reason I lose communication between the 2 nodes, will they reboot?
 
link for log files: https://file.io/WlfZ4AHZctrv

The file is blank: cat /etc/pve/ha/resources.cfg

Your node 23 had its CRM activated on Jan 31 because vm:2501 was set up as an HA service, which is when you were testing HA, I suppose. From the log, it appears the node has been subject to the watchdog ever since, and that the VM was an HA service all along. It was started on node 25, whose LRM was subject to the same.

If your ha-manager status shows any active CRM or LRM, those nodes are definitely going to reboot if you lose quorum again for a while. And from what I have seen, I do not think they are all idle, even though your HA config is empty, which is interesting.

When was it that you think you removed the HA service originally?
 
The logs you have provided before went till Feb 19 10:33:38 mdc-023, so ... was it before or after that you removed it?
before
Also, what does ha-manager status show now (does not matter on which node)?
root@mdc-023:~# ha-manager status
quorum OK
master mdc-023 (active, Tue Feb 20 11:27:31 2024)
lrm mdc-022 (idle, Tue Feb 20 11:27:36 2024)
lrm mdc-023 (idle, Tue Feb 20 11:27:36 2024)
lrm mdc-025 (idle, Tue Feb 20 11:27:36 2024)
lrm mdc024 (idle, Tue Feb 20 11:27:36 2024)
 

Alright!

So ...
Code:
Feb 19 09:46:45 mdc-023 pve-ha-crm[11895]: removing stale service 'vm:2501' (no config)

If I get it right, you had this one forgotten HA machine there. Your reboots happened prior to this and they happened because of the quorum hiccups. Afterwards you removed the last HA service (yesterday morning).
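
To find out when the CRM dropped a leftover HA service, the log can be filtered for that message. Below is a minimal sketch, assuming the log is in the same syslog-style format as the line quoted above; the function name stale_removals is made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: reads log lines on stdin and prints
# "<month> <day> <time> <node> <service-id>" for every
# "removing stale service" message written by pve-ha-crm.
# Assumes the field layout of the log line quoted in this thread.
stale_removals() {
    awk -v q="'" '/pve-ha-crm\[[0-9]+\]: removing stale service/ {
        sid = $9            # e.g. the quoted service ID
        gsub(q, "", sid)    # strip the surrounding single quotes
        print $1, $2, $3, $4, sid
    }'
}

# Possible usage on a node (not run here):
#   journalctl -u pve-ha-crm | stale_removals
```

For the line quoted above, this prints the timestamp and node followed by vm:2501.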

This now makes more sense:

Code:
root@mdc-023:~# ha-manager status
quorum OK
master mdc-023 (active, Tue Feb 20 11:27:31 2024)
lrm mdc-022 (idle, Tue Feb 20 11:27:36 2024)
lrm mdc-023 (idle, Tue Feb 20 11:27:36 2024)
lrm mdc-025 (idle, Tue Feb 20 11:27:36 2024)
lrm mdc024 (idle, Tue Feb 20 11:27:36 2024)

So, apart from the known bug of the "dangling" CRM [1] (on a setup that previously used HA), the nodes should no longer reboot even if quorum is wonky.

I say "should" because it's a bit more complicated; it's a bit lengthy and is addressed in another post of mine [2].

Long story short, the safest option for you now is probably to reboot mdc-023, double-check with ha-manager status that they are ALL idle, and not set any service up as HA again. After that, upon lost quorum, you definitely "should" not be getting reboots.

I say "should" again because the more drastic solution is at the end of the referred post above [2] for those that want to be absolutely sure.
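
The "ALL idle" double-check can be scripted. This is only a sketch, assuming the ha-manager status output looks like the one pasted in this thread (lines starting with master or lrm, with the state as the first item in the parentheses); the function name non_idle_ha is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: reads `ha-manager status` output on stdin and
# prints "<role> <node> <state>" for every master/lrm line whose state
# is anything other than "idle". No output means everything is idle.
non_idle_ha() {
    awk '($1 == "master" || $1 == "lrm") {
        state = $3                # e.g. "(idle," or "(active,"
        gsub(/[(,]/, "", state)   # strip the "(" and ","
        if (state != "idle") print $1, $2, state
    }'
}

# Possible usage on a node (not run here):
#   ha-manager status | non_idle_ha
```

Run against the status output pasted above, it would report only the master on mdc-023, which is the one still active.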

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=5243
[2] https://forum.proxmox.com/threads/getting-rid-of-watchdog-emergency-node-reboot.136789/#post-635602
 


I understand, thank you for your attention to my problem
 
