Proxmox cluster dies after changing a switch port

proxorh

New Member
Feb 21, 2025
Hi,

I have a two-node Proxmox cluster with a QDevice for quorum, all connected to the same switch and working well.
Code:
Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 192.168.10.11 (local)
0x00000002          1    A,V,NMW 192.168.10.12
0x00000000          1            Qdevice

I shut down one node (192.168.10.12) and the cluster still works well:
Code:
Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 192.168.10.11 (local)
0x00000002          1         NR 192.168.10.12
0x00000000          1            Qdevice
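
For reference, the cluster stays up here because of vote arithmetic: with three expected votes (two nodes plus the QDevice), a majority is two, and the surviving node plus the QDevice provide exactly that. A minimal sketch of the calculation, assuming the usual majority rule (more than half of the expected votes). The function names are made up for illustration and are not part of pvecm or corosync:

```shell
#!/bin/sh
# Hypothetical helpers, not part of any Proxmox or corosync tool.

# Votes needed for quorum under a simple majority rule:
# floor(expected / 2) + 1.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds (exit status 0) when the votes currently present
# reach the majority threshold for the expected vote count.
is_quorate() {
    present=$1
    expected=$2
    [ "$present" -ge "$(quorum_needed "$expected")" ]
}
```

With one node down, `is_quorate 2 3` succeeds. If the QDevice were also unreachable, `is_quorate 1 3` would fail.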


I then move 192.168.10.12 to another switch and power it on. 192.168.10.11 can ping 192.168.10.12 and vice versa. In the Proxmox UI the 192.168.10.12 node shows a question mark, but the cluster status looks fine:
Code:
Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 192.168.10.11 (local)
0x00000002          1    A,V,NMW 192.168.10.12
0x00000000          1            Qdevice


After about a minute the cluster dies. The active master node, 192.168.10.11, drops the SSH connection, becomes unreachable, and its GUI stops responding. pvecm status still reports the cluster as up, as shown above, but the node is dead.

After a few minutes I can SSH to the node again, but the GUI is still down and I have to power off both nodes using systemctl --force --force poweroff. When I power 192.168.10.11 back on, the GUI comes back to life and shows 192.168.10.12 as down (red X), but the cluster status doesn't list it at all:
Code:
Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 192.168.10.11 (local)
0x00000000          1            Qdevice

When I connect 192.168.10.12 back to the same switch as the other node, everything works again. Both nodes are recognized and all is good.

Attached is the syslog of 192.168.10.11. I didn't find anything meaningful, but I don't know what to look for.

I would appreciate any hints or suggestions for further debugging, because I am quite stuck.
Thank you!
 


Do you have a network loop? Try enabling STP and see whether one of your links gets flagged.
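
If the switches don't make the STP state easy to see, a rough way to look for loop symptoms from the Proxmox side is to scan the kernel and corosync logs from around the time of the hang. Below is a sketch. The function name and the pattern list are my own assumptions, not an official or exhaustive set, and a match is only a hint, not proof of a loop:

```shell
#!/bin/sh
# Hypothetical helper: reads log text on stdin and prints only the lines
# that often accompany a layer-2 loop or flapping cluster links.
# The patterns are assumptions chosen for illustration.
loop_hints() {
    grep -Ei 'own address as source|link: [0-9]+ is down|retransmit list|token has not been received'
}

# Example usage on the affected node (run manually, as root).
# Use "-b -1" instead of "-b" to read the boot before the forced poweroff:
#   journalctl -b -k | loop_hints            # kernel messages, e.g. from the bridge
#   journalctl -b -u corosync | loop_hints   # corosync link and token messages
```

A kernel line saying a bridge received a packet with its own address as the source address is the strongest hint here, since it usually means the node's own frames are coming back to it.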