4-Node-Ceph-Cluster went down

Still, your problem is that at some point some/all nodes can't see each other via the corosync link(s), losing quorum and making HA reboot some/all nodes, which makes Ceph lose quorum and halts VM I/O. I would even guess that you are using a single corosync link shared with Ceph, and corosync goes mad when Ceph starts rebalancing and network latency starts jittering, making some node(s) restart again, adding insult to injury :)
My plan is to add one additional 2.5Gb Ethernet connection. But is a dedicated connection just for corosync kind of mandatory? That would mean I need two additional connections: one for data transfer and the other for corosync?
 
Corosync requires a reliable and stable (latency-wise) connection among all nodes at all times [1].

If you plan to use HA, you must use at least two independent links (independent NICs and switches!) and at least one of them should be exclusive to corosync traffic for reliable operation. Otherwise you risk full cluster reboots, as you've seen. You can still share the second corosync link with other traffic, say Ceph and VM traffic, as the chances of having both links saturated/broken at the same time are much lower. Also, the higher the capacity of the network, the harder it gets to saturate it and the easier it is to share it with corosync.

[1] https://pve.proxmox.com/wiki/Cluster_Manager#_cluster_network
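For reference, a redundant setup defines two links per node in /etc/pve/corosync.conf. A minimal sketch of a nodelist entry, assuming a dedicated 10.10.10.0/24 corosync network and a shared 192.168.1.0/24 network (node name and addresses are made up for illustration):
Code:
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1    # link 0: dedicated corosync network
    ring1_addr: 192.168.1.11  # link 1: fallback, shared with Ceph/VM traffic
  }
  # ... one node { } entry per cluster member ...
}
Corosync 3 (kronosnet) fails over between the links automatically; see [1] for the supported way to edit that file.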
 
I think I figured out what caused the unexpected shutdown, but some guessing is included.

First: It all happened because of a broken network cable, its retaining clip was missing. But here comes the guessing, as I didn't find evidence:
Because the cable wasn't seated properly, the connection speed might have dropped to around 100Mbit during the shutdown. What makes me assume this is that I was definitely touching that cable when the drama began.

But what confuses me is that the host with the cable problem doesn't start its network automatically on boot.

As a workaround I put this into crontab:
Code:
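# workaround: bring the NIC up and start the bridge on every boot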
@reboot /usr/sbin/ip link set enp0s31f6 up && /usr/sbin/ifup vmbr0

I did also try to boot with different hardware, same behavior.

Now my question: did the broken cable trigger some kind of safety mechanism in this Proxmox installation?
 
Now my question: did the broken cable trigger some kind of safety mechanism in this Proxmox installation?
If I understand it properly:

- The cluster has Ceph and one corosync link shared on the same NIC. There are HA VMs on each host.
- One host was off due to maintenance.
- One host lost its corosync link, so that host lost quorum. As it had HA resources, it would eventually fence itself.
- The two remaining hosts also lost quorum, and the HA manager fenced both as they lost quorum.
- When the link on that one host was restored, Proxmox quorum got re-established and HA started the VMs.

Maybe: if the node taken down for maintenance was off for more than 10 minutes (mon_osd_down_out_interval), Ceph would mark its OSDs as out and a rebalance would start creating replicas on the remaining 3 nodes, increasing network traffic and maybe making corosync traffic jitter too much / increasing latency, making nodes lose quorum so that each one then fenced itself.
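If that guess is right, the usual way to avoid the rebalance during planned maintenance is to set the noout flag before taking the node down. A sketch of the commands (run on any node with Ceph admin rights; the default interval shown is an assumption based on a stock install):
Code:
# how long Ceph waits before marking a down OSD as "out" (default 600 s = 10 min)
ceph config get mon mon_osd_down_out_interval

# before planned maintenance: don't mark down OSDs out (no rebalance)
ceph osd set noout
# ... do the maintenance, bring the node back online, then:
ceph osd unset noout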

Everything falls in the "expected" category :)

But what confuses me is that the host with the cable problem doesn't start its network automatically on boot.
Is it set to autostart on /etc/network/interfaces?
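For comparison, a stock Proxmox VE /etc/network/interfaces usually marks both the physical NIC and the bridge with "auto", so ifupdown brings them up at boot without any crontab workaround. A sketch using the interface names from your post (the addresses are placeholders, not your actual ones):
Code:
auto lo
iface lo inet loopback

auto enp0s31f6
iface enp0s31f6 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.10/24
        gateway 192.168.1.1
        bridge-ports enp0s31f6
        bridge-stp off
        bridge-fd 0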
 
So the conclusion for me is that I enjoy Ceph as storage more than BTRFS or ZFS.

Still waiting for the 2.5G Ethernet switch in my lab to arrive
 
Have you configured anything in HA, e.g. the "ignore" state? If so, it's enough that you have actually activated HA.
 
Have you configured anything in HA, e.g. the "ignore" state? If so, it's enough that you have actually activated HA.
Yes, I put most of my hosts in HA mode, but what do you mean by "ignore" specifically?
 
You can also give a VM the "ignored" state instead of "started" or "stopped". Either way, HA is activated.
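For example, something like this should switch an existing HA resource to the "ignored" state (assuming VMID 100 is already managed by HA):
Code:
ha-manager set vm:100 --state ignored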

But the point is that you can't cleanly operate HA with 4 nodes; there is no clear majority and problems like this are the result.
 
But the point is that you can't cleanly operate HA with 4 nodes; there is no clear majority and problems like this are the result.
That's only true insofar as it doesn't meet the requirements for surviving 2 hosts failing. If only one fails, all is good.
 
Well, if this is your understanding of how to run Ceph and Proxmox in an HA setup anyway, then don't be surprised by such problems.

What you're doing may work. But it's definitely not state of the art.

At that point I'm out.
 
