8-node cluster, all went offline at the same time

DLinkOZ
New Member
Sep 2, 2024
I installed this cluster about a year and a half ago. It hasn't been patched (don't yell at me, not my call), but it has been rock solid. Today, that all changed. Everything went offline at once, with no links on the NICs (bonded pair per server). All other hardware is fine: the storage server, firewalls, etc. are on the same switch and not experiencing any issues. We've tried knocking the network config down to a single interface (removing the bond) and breaking the LAGs on the switch. Bringing up the NICs manually does cause them to show UP, but they're not really up, and vmbr0 never comes online.
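Roughly what we've been trying by hand is sketched below (eno1 stands in for the actual NIC names, and ifreload assumes ifupdown2 is installed):

```
# Bring a single NIC up by hand
ip link set eno1 up

# "UP" here is only the admin state, not proof of carrier
ip -br link show eno1

# Check the physical carrier state
ethtool eno1 | grep -i "link detected"

# See whether the bridge ever gets an address
ip -br addr show vmbr0

# Reapply /etc/network/interfaces after editing (ifupdown2)
ifreload -a
```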

A sample from one of them (excuse the screenshots; this is from a really old iDRAC):

[screenshot]

On boot:

[screenshots]
Rebooting always shows this type of error:

[screenshot]

Any USB devices hooked up? :oops: Check your hardware; maybe udev has issues removing a device.

Well, this looks like a classic case of NOT HAVING SEPARATED YOUR COROSYNC LINKS. Link saturation (among other things) can cause loss of quorum, and with HA enabled the nodes will self-fence, which would explain all eight going down at once.
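As a rough sketch (node names and addresses below are made up), a dedicated second corosync link in /etc/pve/corosync.conf looks something like this; remember to bump config_version whenever you edit it:

```
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1    # dedicated corosync network
    ring1_addr: 192.168.1.1   # fallback over the regular LAN
  }
  # ... one node {} block per cluster node
}

totem {
  cluster_name: mycluster
  config_version: 2
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
  ip_version: ipv4-6
  secauth: on
  version: 2
}
```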

Is balance-rr a result of the recovery effort, or has it always been like that? This mode is fine for sending over both links, but not good for receiving traffic. You'd want to use balance-alb, or better, just active-backup (simpler; see the sketch below) [0]. That balance-rr mode may have contributed to the failure.

[0] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysadmin_network_bond
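For illustration, a minimal active-backup bond plus bridge in /etc/network/interfaces might look like the sketch below (interface names and addresses are placeholders, not your actual config):

```
auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2
    bond-mode active-backup
    bond-primary eno1
    bond-miimon 100

auto vmbr0
iface vmbr0 inet static
    address 10.10.10.2/24
    gateway 10.10.10.254
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
```

With active-backup only one slave carries traffic at a time, so there's nothing for the switch to negotiate and no LAG to misbehave, which is why it's the simplest mode to rule out during recovery.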