Multiple Clusters destroyed at the same time

MasterTH

Hi,

Yesterday I ran into a really strange issue. I have three separate clusters running, and yesterday at 20:04 all three of them became corrupted. I asked the datacenter staff, but nothing happened on their side (no network reboots or anything like that). The only thing they could see was a DDoS attack running against the IP address of a virtual machine.
The versions differ: two of the clusters run 2.2 and the third runs 2.3.

Could a DDoS cause such problems?

Kind regards
MasterTH
 

What do you mean by destroyed/corrupted?
 
Maybe your network has issues with IP multicast, e.g. your switches dropped multicast traffic (check the logs there).
 
It's possible that it was a multicast problem. I had a similar problem with multicast because of IGMP snooping on Linux bridges, which blocked all multicast traffic.
One host sent garbage multicast traffic and affected all my different clusters on the same VLAN; each Linux bridge was blocking multicast traffic.

I resolved it with echo 0 > /sys/devices/virtual/net/vmbrX/bridge/multicast_snooping, or by putting the Proxmox IP on the physical interface instead of the bridge.
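
For anyone wanting to check their own bridges before changing anything, here is a minimal sketch built around the sysfs path mentioned above. The function names and the SYSFS_NET override are hypothetical (the override only exists so the script can be tried against a scratch directory); on a real host the write needs root, and a value written to sysfs does not survive a reboot.

```shell
#!/bin/sh
# Sketch only: report and disable IGMP snooping on Linux bridges.
# SYSFS_NET is a hypothetical override for dry runs; by default the
# functions use the sysfs location quoted in the post above.

show_snooping() {
    base="${SYSFS_NET:-/sys/devices/virtual/net}"
    for f in "$base"/*/bridge/multicast_snooping; do
        [ -e "$f" ] || continue                # glob matched nothing: no bridges
        br=${f%/bridge/multicast_snooping}     # strip the trailing path
        printf '%s: multicast_snooping=%s\n' "${br##*/}" "$(cat "$f")"
    done
}

disable_snooping() {
    # $1 = bridge name, e.g. vmbr0 (needs root on a real host)
    base="${SYSFS_NET:-/sys/devices/virtual/net}"
    echo 0 > "$base/$1/bridge/multicast_snooping"
}
```

Running show_snooping first lists each bridge with its current setting; disable_snooping vmbr0 then applies the same echo 0 as above to a single bridge.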
 
Hi,

Same problem here: 2 clusters running PVE v2.2. One production cluster with 5 nodes, one test cluster with 3 nodes.

I upgraded the test cluster (apt-get update ; apt-get dist-upgrade) to the latest version. Everything seemed OK, but cman was out of order: it could be stopped but not restarted (quorum lost, etc.).

Rebooting that cluster solved the problem.

BUT the production cluster is also out of sync, with cman impacted by multicast traffic from the test cluster, WITHOUT any apt-get command being run on it!

All VMs are OK, but there is no cluster stack (no migration, no vzdump, etc.).

A reboot cycle needs to be scheduled and the VMs need to be stopped: not good...

Probably an effect of multicast propagation; I'm not an expert.

Any advice?

Thanks,

Christophe.
 
The DDoS lasted about a minute, then the datacenter blocked the traffic.

That is enough to break corosync cluster communication. You should use a separate network for cluster communication if you expect DoS attacks on the network.