proxmox upgrade

powersupport · Oct 25, 2021

Hi,

We recently ran into an issue while upgrading Proxmox VE from 6.4 to 7.

Case https://forum.proxmox.com/threads/pve-upgrade-6-to-7-issue.97949/#post-424846

We are now looking at other options for the upgrade and came up with the following approach:

1) Remove each node from the cluster.

2) Upgrade the node to PVE 7, then re-add it to the cluster.

Is this a recommended way?

Thank you.

fabian · Oct 25, 2021

no that is not recommended and won't help you either if you trigger the same bug again, as (re)joining the cluster is exactly the moment where you could trigger it. we are analyzing the bug in question together with upstream devs, it's very rare to trigger and hard to reproduce so far.

powersupport · Oct 25, 2021

Hi,

If it is a bug, how can we follow the status of the fix? Is there a tracker we can watch for updates, or will someone from Proxmox reply here?

Thank you.

fabian · Oct 25, 2021

https://bugzilla.proxmox.com/show_bug.cgi?id=3672

powersupport · Oct 25, 2021

Hi,

Will you post an update here once you find out more about this?

Thank you.

fabian · Oct 27, 2021

I'll try to remember to update the relevant forum threads as well, yes.

powersupport · Oct 27, 2021

Hi,

We will wait for the update then.

Thank you.

fabian · Nov 10, 2021

packages are now on pve-no-subscription, please see the following extra information if you have been affected by a full-cluster-crash with the mentioned symptoms (one node rebooting/upgrading/restarting corosync -> cpg_join/cpg_send_message retry log entries followed by watchdog expiring on all nodes):

https://bugzilla.proxmox.com/show_bug.cgi?id=3672

for people who triggered this easily because of their particular load/network situation, it might be advisable to use the following procedure to avoid triggering it again when installing the fixed versions:

1) stop the HA services (first LRM on all nodes, then CRM on all nodes - this disables HA, but also disarms the watchdog so you don't risk fencing)

2) then upgrade all nodes (this will automatically restart corosync, which will pick up the fixed libknet as well)

3) then start the HA services again (again first LRM on all nodes, then CRM) to re-enable HA features

powersupport · Nov 12, 2021

Hi,

So, can you confirm that we can avoid the issue by following the step quoted below?

stop the HA services (first LRM on all nodes, then CRM on all nodes - this disables HA, but also disarms the watchdog so you don't risk fencing)

Also, could you please share the commands to stop the LRM, the CRM, and the watchdog?

Thank you

fabian · Nov 12, 2021

Code:

systemctl stop pve-ha-lrm

on all nodes, followed by

Code:

systemctl stop pve-ha-crm

on all nodes, followed by

Code:

apt update; apt full-upgrade

on all nodes, followed by

Code:

systemctl start pve-ha-lrm

on all nodes, followed by

Code:

systemctl start pve-ha-crm
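on all nodes.

For a cluster with more than a couple of nodes, the sequence above could be scripted roughly as follows. This is only a sketch under assumptions that are not from this thread: the node names in NODES are hypothetical, root SSH access from the machine running the script to every node is assumed, and the helper names run_on_all and upgrade_cluster are made up for illustration. By default it only prints what it would run; set DRY_RUN=0 to actually execute.

```shell
#!/bin/sh
# Sketch only: NODES, DRY_RUN, run_on_all and upgrade_cluster are
# illustrative names, not part of Proxmox VE.
NODES="${NODES:-pve1 pve2 pve3}"   # hypothetical node names, adjust to your cluster
DRY_RUN="${DRY_RUN:-1}"            # 1 = only print the commands, 0 = run them via ssh

run_on_all() {
    # Run one command on every node in turn; stop at the first failure.
    for node in $NODES; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "[$node] $1"
        else
            # -t allocates a terminal so apt can still ask its questions.
            ssh -t "root@$node" "$1" || return 1
        fi
    done
}

upgrade_cluster() {
    # Each step finishes on ALL nodes before the next step starts:
    # stop LRM, stop CRM, upgrade, start LRM, start CRM.
    run_on_all "systemctl stop pve-ha-lrm" &&
    run_on_all "systemctl stop pve-ha-crm" &&
    run_on_all "apt update && apt full-upgrade" &&
    run_on_all "systemctl start pve-ha-lrm" &&
    run_on_all "systemctl start pve-ha-crm"
}

upgrade_cluster
```

The important part is the ordering: every step is completed on all nodes before the next one begins, so no node still has an armed watchdog while corosync is being restarted elsewhere. Review the dry-run output before switching to DRY_RUN=0.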
