Nope. It's a bug in OVS, based on both my own testing and Dietmar's testing.
Switched from OVS to Linux Bridging and the problem vanishes completely.
Dietmar previously reported (in various places including this forum) that it only affects certain combinations of kernel versions, NICs and driver versions... I don't have the luxury of verifying that assertion.
Repeat: removing openvswitch and using standard Linux Bridging (even including standard Linux Bonding) makes the problems disappear completely. The problems ONLY occur when using OVS, and even then only under very specific circumstances (an active TAP interface attached to the OVS bridge).
On top of that, I've previously reproduced the problem with a non-multicast-aware switch, IGMP is completely disabled on the current switch, and I've seen this behaviour on both Netgear and Dell switches.
The very precise set of circumstances needed to trigger the problem, as far as I can tell:
1. some particular kernel version (3.10.0-10 causes it for me, but I've seen it in the past too)
2. some particular NIC (I've seen it on tg3s, bnxs, igbs, e1000s, so I'm not 100% convinced it's NIC-specific)
3. some particular driver version (no clue which, see Dietmar's email to ovs-discuss)
4. Multiple PVE nodes in a PVE cluster
5. Open vSwitch (with *or* without OVSIntPorts; this is a new-ish discovery)
6. At least one running VM with a TAP-mode vNIC bound to the OVS bridge
The node(s) with running VM(s) bound to the OVS bridge will report corrupt UDP checksums on the PVECM-related multicast packets coming from a quasi-random subset of peer nodes, but usually not from *all* of the peer nodes (for me, never from all of them).
Then, at some point after the UDP checksum errors start getting logged, the cluster will lose quorum, perfectly happy PVE nodes will turn red in the GUI, and VM operations will fail, but none of the command-line tools can/will show any problem whatsoever!
At this point, I generally see the cluster partition into two discrete sub-clusters in the GUI: subset A's nodes show each other as green and everything else as red, and subset B's nodes show each other as green and everything else as red. Occasionally, *all* the nodes turn red, and I have (twice) seen even the local host I'm talking to turn red in the GUI!
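For anyone who wants to check one of those "corrupt UDP checksum" reports by hand, here is a minimal sketch of the standard UDP-over-IPv4 checksum (RFC 768) in Python. It is not taken from OVS, the kernel, or corosync; the function names are made up for illustration, and it assumes you have already extracted the source IP, destination IP, and the raw UDP header plus payload from a capture.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of data (zero-padded to an even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_checksum(src_ip: str, dst_ip: str, udp_segment: bytes) -> int:
    """Checksum a sender should place in the UDP header (IPv4, RFC 768).

    udp_segment is the 8-byte UDP header plus payload. The checksum
    field (bytes 6-7) is treated as zero for the computation.
    """
    zeroed = udp_segment[:6] + b"\x00\x00" + udp_segment[8:]
    pseudo_header = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, socket.IPPROTO_UDP, len(udp_segment))
    )
    checksum = ~ones_complement_sum(pseudo_header + zeroed) & 0xFFFF
    # A computed value of 0 is transmitted as 0xFFFF; 0 means "no checksum".
    return checksum or 0xFFFF


def checksum_ok(src_ip: str, dst_ip: str, udp_segment: bytes) -> bool:
    """True if the checksum carried in udp_segment matches the recomputed one."""
    (received,) = struct.unpack("!H", udp_segment[6:8])
    if received == 0:
        return True  # sender did not compute a checksum (allowed over IPv4)
    return received == udp_checksum(src_ip, dst_ip, udp_segment)
```

Feeding the same packet as captured on the sending node and on the complaining node through `checksum_ok` would show whether the checksum field or the payload is what changed in transit.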
Tcpdump(8) confirms that all the nodes are receiving the multicast packets; I have not compared the received packets at the bit level, although that's an avenue for further investigation.
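The bit-level comparison mentioned above could be sketched roughly like this. It is only an illustration: `bit_diff` is a hypothetical helper, and it assumes the same packet has been captured on two nodes and reduced to two equal-length byte strings.

```python
from typing import List, Tuple


def bit_diff(a: bytes, b: bytes) -> List[Tuple[int, int]]:
    """Return (byte_offset, bit_index) for every differing bit.

    bit_index 7 is the most significant bit of the byte, 0 the least.
    """
    if len(a) != len(b):
        raise ValueError("captures differ in length: %d vs %d" % (len(a), len(b)))
    diffs = []
    for offset, (x, y) in enumerate(zip(a, b)):
        delta = x ^ y  # set bits mark positions where the two bytes disagree
        for bit in range(7, -1, -1):
            if delta & (1 << bit):
                diffs.append((offset, bit))
    return diffs
```

An empty result means the two captures are identical; a handful of offsets clustered in one place would point at where the packet is being modified.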
Also worth noting: all my testing was performed over LACP-bonded links. I do not remember whether I've seen this happen without LACP in the picture...
-Adam