pvecm & ceph seem happy, but still no quorum in GUI - how do I resync?

spirit · Jul 24, 2015

Hi,
So it's only multicast that's failing?

To me, it sounds like something strange with IGMP snooping on your physical switch (OVS doesn't have IGMP snooping yet).
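One way to test the IGMP-snooping theory independently of corosync is to join a multicast group on every node and check whether probes sent from the other nodes actually arrive. The sketch below is a minimal, hypothetical test tool, not part of Proxmox: `TEST_GROUP` and `TEST_PORT` are made-up placeholder values and should be a spare group/port, not the live cluster's, so the test does not interfere with the running cluster. Joining the group with `IP_ADD_MEMBERSHIP` makes the kernel send an IGMP membership report, which is what a snooping switch relies on to forward the group to that port.

```python
import socket
import struct

# Hypothetical test values: use a spare group/port, NOT the live cluster's.
TEST_GROUP = "239.255.42.99"
TEST_PORT = 50000


def make_mreq(group: str, iface: str = "0.0.0.0") -> bytes:
    """Build a struct ip_mreq: group address followed by local interface address.
    0.0.0.0 lets the kernel pick the interface; on multi-NIC nodes, pass the
    address of the cluster-network interface instead."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))


def listen(group: str, port: int, timeout: float = 10.0) -> list[str]:
    """Join the group and collect the source address of every probe received.
    Returns once no probe has arrived for `timeout` seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining triggers an IGMP membership report from the kernel.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, make_mreq(group))
    sock.settimeout(timeout)
    senders = []
    try:
        while True:
            _data, (addr, _port) = sock.recvfrom(65535)
            senders.append(addr)
    except socket.timeout:
        pass  # silence for `timeout` seconds: stop listening
    finally:
        sock.close()
    return senders


def send_probes(group: str, port: int, count: int = 100) -> None:
    """Send `count` small UDP probes to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # TTL 1 keeps the probes on the local network segment.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    for i in range(count):
        sock.sendto(f"probe {i}".encode(), (group, port))
    sock.close()
```

Run `listen(TEST_GROUP, TEST_PORT)` on each node, call `send_probes(TEST_GROUP, TEST_PORT)` from each peer, and count the results per sender (e.g. with `collections.Counter`). If some senders never show up, multicast is being dropped somewhere between the nodes. Snooping problems can take several minutes to appear, once membership entries age out on the switch, so a longer or repeated run is more telling than a single short one.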

What physical switch hardware are you using?

Also, which kernel are you running: 2.6.32 or 3.10?

I'm using tagged OVSIntPorts here in production (kernel 3.10) and have never had any problems.

athompso · Jul 24, 2015

Nope. It's a bug in OVS, based on both my own testing and Dietmar's testing.
Switching from OVS to Linux Bridging makes the problem vanish completely.
Dietmar previously reported (in various places including this forum) that it only affects certain combinations of kernel versions, NICs and driver versions... I don't have the luxury of verifying that assertion.

Repeat: removing openvswitch and using standard Linux Bridging (even including standard Linux Bonding) makes the problems disappear completely. The problems ONLY occur when using OVS, and even then only under very specific circumstances (an active TAP interface attached to the OVS bridge).

On top of that, I've previously reproduced the problem with a non-multicast-aware switch, IGMP is completely disabled on the current switch, and I've seen this behaviour on both Netgear and Dell switches.

The very precise set of circumstances needed to trigger the problem, as far as I can tell:
1. some particular kernel version (3.10.0-10 causes it for me, but I've seen it in the past too)
2. some particular NIC (I've seen it on tg3s, bnxs, igbs, e1000s, so I'm not 100% convinced it's NIC-specific)
3. some particular driver version (no clue which, see Dietmar's email to ovs-discuss)
4. Multiple PVE nodes in a PVE cluster
5. OpenVSwitch (with *or* without OVSIntPorts, this is a new-ish discovery)
6. At least one running VM with a TAP-mode vNIC bound to the OVS bridge

The node(s) with (a) running VM(s) bound to the OVS switch will report corrupt UDP checksums on the PVECM-related multicast packets coming from a quasi-random subset of peer nodes, but usually (in my case, never) not from *all* of the peer nodes.
Then, at some point after the UDP checksum errors start being logged, the cluster will lose quorum, perfectly happy PVE nodes will turn red in the GUI, and VM operations will fail, but none of the command-line tools can/will show any problem whatsoever!

At this point, I generally see the cluster partition into two discrete sub-clusters in the GUI: the nodes in subset A show A's nodes as green and all others as red, and the nodes in subset B show B's nodes as green and all others as red. Occasionally, *all* the nodes turn red, and I have (twice) seen even the local host I'm talking to turn red in the GUI!
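Why a split like this costs quorum can be illustrated with simple majority voting: a partition is quorate only if it holds more than half of the expected votes, so at most one side of a split can keep quorum, and an even split leaves neither side quorate. The sketch below assumes one vote per node and a threshold of `expected_votes // 2 + 1`; both are simplifying assumptions (real clusters can assign different vote counts), and the function names are hypothetical.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Simple-majority quorum: strictly more than half of the expected votes."""
    return expected_votes // 2 + 1


def quorate_partitions(partitions: list[list[str]], expected_votes: int) -> list[int]:
    """Return the indices of the partitions that hold quorum,
    assuming every node contributes exactly one vote."""
    need = quorum_threshold(expected_votes)
    return [i for i, nodes in enumerate(partitions) if len(nodes) >= need]
```

For example, a five-node cluster split 3/2 leaves only the three-node side quorate, while a four-node cluster split 2/2 leaves no quorate side at all.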

Tcpdump(8) confirms that all the nodes are receiving the multicast packets; I have not compared the received packets at the bit level, although that's an avenue for further investigation.
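For that bit-level check, one option is to recompute the UDP checksum of captured packets yourself instead of relying on what the receiving stack reports. The sketch below implements the standard IPv4 UDP checksum (RFC 768: a 16-bit one's-complement sum over a pseudo-header plus the UDP header and payload). The function names are hypothetical, and extracting the raw source/destination addresses and UDP bytes from a capture file is left out. One caveat: captures taken on the *sending* host often show bad checksums simply because checksum computation is offloaded to the NIC, so captures on the receiving side are the ones that matter here.

```python
import struct


def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's-complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # odd length: pad with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def pseudo_header(src_ip: bytes, dst_ip: bytes, udp_len: int) -> bytes:
    """IPv4 pseudo-header: src, dst, zero byte, protocol 17 (UDP), UDP length."""
    return src_ip + dst_ip + struct.pack("!BBH", 0, 17, udp_len)


def build_udp_segment(src_ip: bytes, dst_ip: bytes, sport: int, dport: int,
                      payload: bytes) -> bytes:
    """Build a UDP header plus payload with a correct checksum."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", sport, dport, length, 0)  # checksum field zeroed
    total = ones_complement_sum16(pseudo_header(src_ip, dst_ip, length) + header + payload)
    csum = ~total & 0xFFFF
    if csum == 0:
        csum = 0xFFFF  # 0 on the wire means "no checksum", so send 0xFFFF instead
    return struct.pack("!HHHH", sport, dport, length, csum) + payload


def udp_checksum_ok(src_ip: bytes, dst_ip: bytes, segment: bytes) -> bool:
    """Verify a received UDP datagram (`segment` = exactly the UDP header + payload)."""
    (stored,) = struct.unpack("!H", segment[6:8])
    if stored == 0:
        return True  # sender did not compute a checksum (allowed over IPv4)
    # Summing everything, including the stored checksum, yields 0xFFFF if intact.
    return ones_complement_sum16(pseudo_header(src_ip, dst_ip, len(segment)) + segment) == 0xFFFF
```

Comparing the result of `udp_checksum_ok` for the same packet captured on the sender's wire side and on the receiver would show whether the bytes really change in transit or only look corrupt to the receiving host.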

Also worth noting: all my testing was performed over LACP-bonded links. I don't remember whether I've seen this happen without LACP in the picture...

-Adam

proxtest · Aug 1, 2015

athompso said:
...
Repeat: removing openvswitch and using standard Linux Bridging (even including standard Linux Bonding) makes the problems disappear completely. The problems ONLY occur when using OVS, and even then only under very specific circumstances (an active TAP interface attached to the OVS bridge).
...
Tcpdump(8) confirms that all the nodes are receiving the multicast packets; I have not compared the received packets at the bit level, although that's an avenue for further investigation.
...

I have had a similar issue many times: the nodes are up and running, but some of them show red in the web GUI.
I don't use OVS; I use Linux bridging only. The network is still working at that point, but cman can no longer start, and I have to restart the affected node along with its VMs. :-(
So I haven't used rgmanager for months, because the other node kicks my still-running node out, along with the VMs running on it! :-(
That's not a great design. Maybe in the future it would be better to check with the storage before one node resets another node the hard way? Databases, filesystems: everything inside the VMs gets damaged. :-(

Luckily for me, I use Ceph as storage, and Ceph keeps running without any errors, so the VMs stay up and I can reboot after work. It's not perfect, but it works for me.
