Not anymore. coredump is not installed by default on Proxmox, and although I installed it on several nodes today, HA is disabled as mentioned, so it won't actually catch anything here.
Who knows, we have tried every angle we could imagine over...
More updates from over the weekend: 2 reboots/crashes on 2 clusters.
1. This could be a problem with the server firmware or microcode? This one also gives slightly more information:
Jun 27 11:41:22 pve30 kernel: pve-ha-lrm[113646]: segfault at 8...
That is my aim as well, and why I was asking if it's feasible to debug that daemon in some form. The abort seems to come out of nowhere but has quite heavy consequences.
echo "blacklist hpwdt" > /etc/modprobe.d/blacklist-hp.conf
and...
mlag -> 2 bonds -> several vlans.
corosync is configured to use a primary and backup, which are also split across both bonds.
On the corosync side, my last error was months ago.
Had another incident yesterday and I do think I got slightly closer...
Thought you might want that: "journalctl -k -b -1" attached.
As far as I know, yes. We scrutinize the logs daily and I'm not aware of any errors or warnings on this part. - Did a scan for "unable to acquire lock" on the logs on all...
When using a PMG cluster with OAuth and "auto create users", every time a new, not yet known user logs in to the web interface, it throws an error about the partner server not knowing that user. Clicking OK makes the message go away and not come back...
It happens often on 1 cluster, less often on another ("often" is 2-3x per month, if we do not touch anything manually).
Softdog. We're on HP machines, so the HP modules are already blacklisted per the earlier 'solution'.
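For anyone wanting to repeat that kind of scan, here is a minimal sketch, assuming the logs are available as text on stdin or as files. `count_ha_errors` is a hypothetical helper name, not a Proxmox tool, and the two patterns are simply the messages quoted in these posts.

```shell
# Hypothetical helper (not part of Proxmox): count log lines that contain
# one of the HA-related messages quoted in this thread.
# Reads from the files given as arguments, or from stdin if none are given.
count_ha_errors() {
  pattern='unable to acquire lock|watchdog update failed'
  # grep -c prints the number of matching lines; it exits with 1 when the
  # count is 0, which is not an error for our purposes.
  grep -c -E -- "$pattern" "$@" || [ "$?" -eq 1 ]
}
```

On a node this could be fed from the previous boot's journal, for example `journalctl -b -1 --no-pager | count_ha_errors`. With more than one file argument, grep prints one count per file.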
Nothing custom.
I hope not, it's...
3x this week already: last Saturday, yesterday and this morning.
pve-ha-crm decides to die without cause, or at least without a usable message.
"watchdog update failed - Broken pipe"
The server is not doing anything special at that moment, no...
I can safely say this is not related to Intel microcode. I have recently started seeing this on one of our clusters as well, and that cluster consists entirely of AMD nodes.