Hello Forum
Below I describe severe degradation issues of our 3-node hyperconverged meshed Ceph cluster (Ceph 15.2.13), which occurred after the upgrade from 6.x to 7.0, and how we were able to resolve them.
After updating and rebooting, the slave NICs (ens2 and ens3) of our meshed bond0 showed newly assigned MAC addresses that differed from each other. This prevented bond0 from coming up and in turn left the Ceph cluster degraded.
This happened during the upgrade of all 3 nodes.
Maybe this should be added to the known issues in the related wiki documents.
Here are the data from the debugging and the applied solution:
Code:
before upd : 5.4.128-1-pve #1 SMP PVE 5.4.128-1
ens2 : 68:05:ca:02:da:e0
ens3 : 68:05:ca:02:da:e0
bond0 : 68:05:ca:02:da:e0
result : <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP>... state UP
after upd : 5.11.22-3-pve #1 SMP PVE 5.11.22-6
ens2 : 68:05:ca:02:da:e0
ens3 : 68:05:ca:02:da:44
bond0 : 52:5c:68:8b:1a:db
result : <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP>...state DOWN
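For anyone wanting to check whether their own nodes are affected, the divergence can be spotted by comparing each slave's current MAC against bond0's. A minimal sketch; the interface names and the sample `ip -br link`-style output below mirror our setup and are assumptions, not universal:
Code:
```shell
# Sample output in the style of `ip -br link` (name / state / MAC), taken
# from our post-upgrade state; on a live node you would use:  ip -br link
sample='ens2 UP 68:05:ca:02:da:e0
ens3 UP 68:05:ca:02:da:44
bond0 DOWN 52:5c:68:8b:1a:db'

# MAC currently carried by the bond itself
bond_mac=$(printf '%s\n' "$sample" | awk '$1 == "bond0" { print $3 }')

# List every slave whose MAC no longer matches the bond's MAC
diverging=$(printf '%s\n' "$sample" \
    | awk -v m="$bond_mac" '$1 != "bond0" && $3 != m { print $1 }')
echo "$diverging"
```
With the data above this lists both ens2 and ens3; on a healthy node the list is empty.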
Executing the commands listed below, in exactly the given order,
synchronized the MAC addresses of the slaves and the bond.
This reestablished a regular UP state of the bond and, in turn, a healthy Ceph cluster.
Code:
ifdown ens2          # take both slaves down first
ifdown ens3
ifdown bond0         # then the bond itself
ifup bond0           # bring the bond up before the slaves
ifup ens2            # re-enslaving syncs the NICs to the bond's MAC
ifup ens3
ifup bond0
result :
ens2 : 52:5c:68:8b:1a:db ... permaddr 68:05:ca:02:da:e0
ens3 : 52:5c:68:8b:1a:db ... permaddr 68:05:ca:02:da:44
bond0 : 52:5c:68:8b:1a:db
result : <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP>...state UP
The wiki document (https://pve.proxmox.com/wiki/Upgrade_from_6.x_to_7.0#Network) reports a similar issue for bridges, but does not mention bonds.
Indeed, in our case the MAC addresses of all 3 configured bridges also changed after the upgrade and reboot, but this did not cause any issues.
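Rather than repeating the ifdown/ifup sequence after every reboot, it might be possible to pin the bond's MAC permanently, analogous to what the wiki recommends for bridges via hwaddress. The stanza below is only a sketch of that idea, not a configuration we have tested; the hwaddress value is the permanent MAC of our first slave, and the remaining bond options would have to match your existing /etc/network/interfaces:
Code:
```
auto bond0
iface bond0 inet manual
        bond-slaves ens2 ens3
        # ... keep your existing bond options here ...
        # Pin the MAC so it stays stable across reboots (hypothetical fix,
        # mirroring the wiki's hwaddress advice for bridges)
        hwaddress 68:05:ca:02:da:e0
```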