4-Node-Ceph-Cluster went down

Still, your problem is that at some point some/all nodes can't see each other via the corosync link(s), losing quorum and making HA reboot some/all nodes, which makes Ceph lose quorum and halts VM I/O. I would even guess that you are using a single corosync link shared with Ceph, and corosync goes mad when Ceph starts rebalancing and network latency starts jittering, making some node(s) restart again, adding insult to injury :)
My plan is to add one additional 2.5Gb Ethernet connection. But is a dedicated connection just for corosync kind of mandatory? That would mean I need two additional connections: one for data transfer and the other for corosync?
 
Corosync requires a reliable and stable (latency-wise) connection among all nodes at all times [1].

If you plan to use HA, you must use at least two independent links (independent NICs and switches!) and at least one of them should be exclusive to corosync traffic for reliable operation. Otherwise you risk full cluster reboots, as you've seen. You can still share the second corosync link with other traffic, say Ceph and VM traffic, as the chances of having both links saturated/broken at the same time are much lower. Also, the higher the capacity of the network, the harder it gets to saturate it and the easier it is to share it with corosync.

[1] https://pve.proxmox.com/wiki/Cluster_Manager#_cluster_network
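For reference, a redundant setup defines two links per node in /etc/pve/corosync.conf. A minimal sketch of a nodelist entry, assuming a dedicated 10.10.10.0/24 corosync network and a shared 192.168.1.0/24 network (node name and addresses are made up for illustration):
Code:
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1    # link 0: dedicated corosync network
    ring1_addr: 192.168.1.11  # link 1: fallback, shared with Ceph/VM traffic
  }
  # ... one node { } entry per cluster member ...
}
Corosync 3 (kronosnet) fails over between the links automatically; see [1] for the supported way to edit that file.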
 
I think I figured out what caused the unexpected shutdown, but some guessing is included.

First: It all happened because of a broken network cable, its retaining clip was missing. But here comes the guessing, as I didn't find evidence:
Because the cable wasn't seated properly, the connection speed might have dropped to around 100Mbit during the shutdown. What makes me assume this is that I was definitely touching that cable when the drama began.

But what confuses me is that the host with the cable problem doesn't start its network automatically on boot.

As a workaround I put this into crontab:
Code:
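# workaround: bring the NIC up and start the bridge on every boot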
@reboot /usr/sbin/ip link set enp0s31f6 up && /usr/sbin/ifup vmbr0

I did also try to boot with different hardware, same behavior.

Now my question: did the broken cable trigger some kind of safety mechanism in this Proxmox installation?
 
Now my question: did the broken cable trigger some kind of safety mechanism in this Proxmox installation?
If I understand it properly:

- The cluster has Ceph and one corosync link shared on the same NIC. There are HA VMs on each host.
- One host was off due to maintenance.
- One host lost its corosync link, so that host lost quorum. As it had HA resources, it would eventually fence itself.
- The two remaining hosts also lost quorum, and the HA manager fenced both as they lost quorum.
- When the link on that one host was restored, Proxmox quorum got re-established and HA started the VMs.

Maybe: if the node taken down for maintenance was off for more than 10 minutes (mon_osd_down_out_interval), Ceph would mark its OSDs as out and a rebalance would start creating replicas on the remaining 3 nodes, increasing network traffic and maybe making corosync traffic jitter too much / increasing latency, making nodes lose quorum so that each one then fenced itself.
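If that guess is right, the usual way to avoid the rebalance during planned maintenance is to set the noout flag before taking the node down. A sketch of the commands (run on any node with Ceph admin rights; the default interval shown is an assumption based on a stock install):
Code:
# how long Ceph waits before marking a down OSD as "out" (default 600 s = 10 min)
ceph config get mon mon_osd_down_out_interval

# before planned maintenance: don't mark down OSDs out (no rebalance)
ceph osd set noout
# ... do the maintenance, bring the node back online, then:
ceph osd unset noout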

Everything falls in the "expected" category :)

But what confuses me is that the host with the cable problem doesn't start its network automatically on boot.
Is it set to autostart on /etc/network/interfaces?
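For comparison, a stock Proxmox VE /etc/network/interfaces usually marks both the physical NIC and the bridge with "auto", so ifupdown brings them up at boot without any crontab workaround. A sketch using the interface names from your post (the addresses are placeholders, not your actual ones):
Code:
auto lo
iface lo inet loopback

auto enp0s31f6
iface enp0s31f6 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.10/24
        gateway 192.168.1.1
        bridge-ports enp0s31f6
        bridge-stp off
        bridge-fd 0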
 
So the conclusion for me is that I enjoy Ceph as storage more than BTRFS or ZFS.

Still waiting for the 2.5G Ethernet switch in my lab to arrive
 
Have you configured anything in HA, e.g. the "ignore" state? If so, it's enough that you have actually activated HA.
 
Have you configured anything in HA, e.g. the "ignore" state? If so, it's enough that you have actually activated HA.
Yes, I put most of my hosts in HA mode, but what do you mean by "ignore" specifically?
 
You can also give a VM the "ignored" state instead of "started" or "stopped". Either way, HA is activated.
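For example, something like this should switch an existing HA resource to the "ignored" state (assuming VMID 100 is already managed by HA):
Code:
ha-manager set vm:100 --state ignored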

But the point is that you can't cleanly operate HA with 4 nodes; there is no clear majority and problems like this are the result.
 
But the point is that you can't cleanly operate HA with 4 nodes; there is no clear majority and problems like this are the result.
That's only true insofar as it doesn't meet the requirements for surviving 2 hosts failing. If only one fails, all is good.
 
Well, if this is your understanding of how to run Ceph and Proxmox in an HA setup anyway, then don't be surprised by such problems.

What you're doing may work. But it's definitely not state of the art.

At that point I'm out.
 
