Hello Forum
Below I describe severe degradation issues of our 3-node hyperconverged meshed Ceph cluster (Ceph 15.2.13), which occurred after the upgrade from 6.x to 7.0, and how we were able to resolve them.
After updating and rebooting, the slave NICs (ens2 and ens3) of our meshed bond0 showed newly assigned MAC addresses that differed from each other. This prevented bond0 from coming up and in turn left the Ceph cluster degraded.
This happened during the upgrade of all 3 nodes.
Maybe this should be added to the known issues in the related wiki documents.
Here are the data from the debugging and the applied solution:
Code:
before upd : 5.4.128-1-pve #1 SMP PVE 5.4.128-1
ens2 : 68:05:ca:02:da:e0
ens3 : 68:05:ca:02:da:e0
bond0 : 68:05:ca:02:da:e0
result : <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP>... state UP
after upd : 5.11.22-3-pve #1 SMP PVE 5.11.22-6
ens2 : 68:05:ca:02:da:e0
ens3 : 68:05:ca:02:da:44
bond0 : 52:5c:68:8b:1a:db
result : <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP>...state DOWN
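For anyone wanting to check whether their own nodes are affected, the divergence can be spotted by comparing each slave's current MAC against bond0's. A minimal sketch; the interface names and the sample `ip -br link`-style output below mirror our setup and are assumptions, not universal:
Code:
```shell
# Sample output in the style of `ip -br link` (name / state / MAC), taken
# from our post-upgrade state; on a live node you would use:  ip -br link
sample='ens2 UP 68:05:ca:02:da:e0
ens3 UP 68:05:ca:02:da:44
bond0 DOWN 52:5c:68:8b:1a:db'

# MAC currently carried by the bond itself
bond_mac=$(printf '%s\n' "$sample" | awk '$1 == "bond0" { print $3 }')

# List every slave whose MAC no longer matches the bond's MAC
diverging=$(printf '%s\n' "$sample" \
    | awk -v m="$bond_mac" '$1 != "bond0" && $3 != m { print $1 }')
echo "$diverging"
```
With the data above this lists both ens2 and ens3; on a healthy node the list is empty.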
Executing the commands listed below, in exactly the given order,
synchronized the MAC addresses of the slaves and the bond.
This reestablished a regular UP state of the bond and, in turn, a healthy Ceph cluster.
Code:
ifdown ens2          # take both slaves down first
ifdown ens3
ifdown bond0         # then the bond itself
ifup bond0           # bring the bond up before the slaves
ifup ens2            # re-enslaving syncs the NICs to the bond's MAC
ifup ens3
ifup bond0
result :
ens2 : 52:5c:68:8b:1a:db ... permaddr 68:05:ca:02:da:e0
ens3 : 52:5c:68:8b:1a:db ... permaddr 68:05:ca:02:da:44
bond0 : 52:5c:68:8b:1a:db
result : <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP>...state UP
The wiki document (https://pve.proxmox.com/wiki/Upgrade_from_6.x_to_7.0#Network) reports a similar issue for bridges, but does not mention bonds.
Indeed, in our case the MAC addresses of all 3 configured bridges also changed after the upgrade and reboot, but this did not cause any issues.
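Rather than repeating the ifdown/ifup sequence after every reboot, it might be possible to pin the bond's MAC permanently, analogous to what the wiki recommends for bridges via hwaddress. The stanza below is only a sketch of that idea, not a configuration we have tested; the hwaddress value is the permanent MAC of our first slave, and the remaining bond options would have to match your existing /etc/network/interfaces:
Code:
```
auto bond0
iface bond0 inet manual
        bond-slaves ens2 ens3
        # ... keep your existing bond options here ...
        # Pin the MAC so it stays stable across reboots (hypothetical fix,
        # mirroring the wiki's hwaddress advice for bridges)
        hwaddress 68:05:ca:02:da:e0
```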