Total cluster failure in the lab

Taledo

Nov 20, 2020
Hey all,
I'm currently testing Ceph and Proxmox HA in a multi-datacenter configuration (over dark fiber, so sub-millisecond latency). I tried to recreate the worst possible scenario: what if all nodes lose the corosync layer? This should never happen, but that's what the lab is for.

I'm using Proxmox 8.2.2 and the latest version of Ceph.

After I removed the corosync link, all nodes rebooted as expected and were left in a no-quorum state. When I restored the corosync link, one of my nodes flat-out refused to reconnect to the cluster. That's not an issue in itself, as I can have it leave and rejoin the cluster. However, this led me to discover another issue: one of the two remaining nodes is flooding the other with corosync packets, causing packet loss and serious cluster instability (I'm seeing about 16,000 packets in a 10-second tcpdump session).
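
To pin down which node is doing the flooding, the capture can be broken down per source address. Below is a minimal sketch, assuming the capture was taken as text with `tcpdump -nn`, that the links use IPv4, and that corosync is on its default UDP port 5405 (adjust if your corosync.conf says otherwise). The function names and the sample addresses are hypothetical, made up for illustration.

```python
import re
from collections import Counter

# Matches the "IP src.port > dst.port:" part of a `tcpdump -nn` text line, e.g.
# 12:00:00.000001 IP 10.0.0.1.5405 > 10.0.0.2.5405: UDP, length 80
LINE_RE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)


def packets_per_source(lines, port=5405):
    """Count packets per source IP where either side uses `port`.

    5405 is assumed to be the corosync port; lines that do not look
    like IPv4 tcpdump output are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue
        src_ip, src_port, _dst_ip, dst_port = match.groups()
        if int(src_port) == port or int(dst_port) == port:
            counts[src_ip] += 1
    return counts


def rate_per_second(count, seconds):
    """Average packet rate over the capture window."""
    return count / seconds


# Hypothetical sample lines standing in for a real capture.
sample = [
    "12:00:00.000001 IP 10.0.0.1.5405 > 10.0.0.2.5405: UDP, length 80",
    "12:00:00.000002 IP 10.0.0.1.5405 > 10.0.0.2.5405: UDP, length 80",
    "12:00:00.000003 IP 10.0.0.2.5405 > 10.0.0.1.5405: UDP, length 80",
]
for ip, count in packets_per_source(sample).most_common():
    print(ip, count)
```

On a real capture, the lines would come from something like `tcpdump -nn -i <interface> udp port 5405` redirected to a file and read back line by line. For scale, 16,000 packets over 10 seconds averages out to 1,600 packets per second; a heavily lopsided per-source count would confirm that one node is responsible for most of it.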

Any idea what exactly is happening here?

Cheers,

Taledo
 
Well, this didn't last long, as the watchdogs did their job... The whole thing rebooted and decided it wanted to behave again. Cluster is now up and running again.

Weird behaviour, though I kind of expected it before unplugging the whole thing.
 
Yes, that's the idea. A complete corosync failure should in theory never happen, but I've learned in this line of work that you don't say "if it happens" but "when it happens". So testing in the lab is the best way to prepare for unplanned network disassembly at 4am on a Saturday :D
 
