proxmox upgrade

no, that is not recommended, and it won't help you either if you trigger the same bug again, as (re)joining the cluster is exactly the moment where you could trigger it. we are analyzing the bug in question together with the upstream devs; so far it is very rare to trigger and hard to reproduce.
 
Hi,

If it is a bug, how can we follow the status of the fix? Is there a tracker we can watch for updates, or will someone from Proxmox reply here?

Thank you.
 
Hi,

Will you post an update here once you have found out more about this?

Thank you.
 
I'll try to remember to update the relevant forum threads as well, yes.
 
packages are now on pve-no-subscription. please see the following extra information if you have been affected by a full-cluster-crash with the mentioned symptoms (one node rebooting/upgrading/restarting corosync -> cpg_join/cpg_send_message retry log entries, followed by the watchdog expiring on all nodes):

https://bugzilla.proxmox.com/show_bug.cgi?id=3672

for people who triggered this easily because of their particular load/network situation, it might be advisable to use the following procedure to avoid triggering it again when installing the fixed versions:

stop the HA services (first LRM on all nodes, then CRM on all nodes - this disables HA, but also disarms the watchdog so you don't risk fencing)

then upgrade all nodes (this will automatically restart corosync, which will pick up the fixed libknet as well)

then start the HA services again (again first LRM on all nodes, then CRM) to re-enable HA features
 
Hi,

So may I confirm that we can avoid the issue if we follow the steps mentioned below?

stop the HA services (first LRM on all nodes, then CRM on all nodes - this disables HA, but also disarms the watchdog so you don't risk fencing)

Also, could you please share the steps to stop the LRM, the CRM, and the watchdog?


Thank you
 
Code:
systemctl stop pve-ha-lrm

on all nodes, followed by

Code:
systemctl stop pve-ha-crm

on all nodes, followed by

Code:
apt update; apt full-upgrade

on all nodes, followed by

Code:
systemctl start pve-ha-lrm

on all nodes, followed by

Code:
systemctl start pve-ha-crm
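
The ordering matters here (each command has to finish on every node before the next one starts anywhere), so it may help to print the whole sequence as a checklist first. The following is only a sketch: it prints the commands and runs nothing, and the helper names `on_all_nodes` and `plan_upgrade`, as well as the node names in the usage line, are made up for illustration and are not part of any Proxmox tooling.

```shell
#!/bin/sh
# Sketch only: prints the order of commands from the posts above.
# Nothing is executed on any node; the function names are hypothetical.

# Print one "node: command" line per node for the given command.
on_all_nodes() {
    cmd=$1
    shift
    for node in "$@"; do
        printf '%s: %s\n' "$node" "$cmd"
    done
}

# Emit the full procedure: each step covers all nodes before the next begins.
plan_upgrade() {
    on_all_nodes 'systemctl stop pve-ha-lrm' "$@"     # LRM first, on all nodes
    on_all_nodes 'systemctl stop pve-ha-crm' "$@"     # then CRM, on all nodes
    on_all_nodes 'apt update; apt full-upgrade' "$@"  # upgrade every node
    on_all_nodes 'systemctl start pve-ha-lrm' "$@"    # LRM first again
    on_all_nodes 'systemctl start pve-ha-crm' "$@"    # then CRM
}
```

Calling `plan_upgrade pve1 pve2 pve3` (with your own node names in place of these placeholders) prints fifteen lines, five steps for three nodes, which can be ticked off while running each command by hand on the corresponding node.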