Proxmox Ceph no rebalancing

boero

New Member
Jan 20, 2025
Hi everyone,
I have set up a Proxmox Hyper-Converged Ceph Cluster. The VMs etc. are running as desired, so I am now testing crash scenarios.

Here I run into the following problem: if the storage network interfaces of a node fail (see the current setup below), no rebalancing takes place, the VMs on the affected node are no longer accessible, and migrating them to another node is not possible either.

If all network interfaces (HA and Ceph) fail, the migration works and everything continues to run as desired. If a node is shut down, the migration also works and everything continues to run as desired.

I am currently testing the crash scenario with my node “pve03”. I have attached a few screenshots for an overview.

Does anyone have a solution?

The setup is as follows:
- 3 nodes with 2 OSDs each.
- Full Mesh Routed Setup (with Fallback), two 10 GbE network interfaces each for Ceph
- one 1 GbE network interface each for HA
- PGs for the Ceph pools are configured according to optimal settings as per the pool overview
- Ceph version: 18.2.4
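
For reference, the pool's replica count and PG settings can be checked from any node with the standard tooling; `<poolname>` below is a placeholder, not the actual pool name from my screenshots:
Code:
# list all pools with size, min_size and PG count
pveceph pool ls
# or query a single pool directly
ceph osd pool get <poolname> size
ceph osd pool get <poolname> pg_num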
 

Attachments

  • 2025-01-20 12_52_31-pve01 - Proxmox Virtual Environment und 7 weitere Seiten - Geschäftlich – ...png (127.1 KB)
  • 2025-01-20 12_53_59-pve01 - Proxmox Virtual Environment und 7 weitere Seiten - Geschäftlich – ...png (18.5 KB)
  • 2025-01-20 12_55_23-pve01 - Proxmox Virtual Environment und 7 weitere Seiten - Geschäftlich – ...png (30.8 KB)
That is a rather contrived scenario, where you intentionally removed both NICs. With one gone, everything would keep working. That is what that network setup can protect you from, but not if both are down.

From the POV of Ceph, the node is gone and therefore the OSDs are shown as down and, since they were down long enough, as out. This is only a 3-node cluster, therefore Ceph cannot recover the data to another node, as the remaining two already have replicas. There can only be one replica per node.
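
A minimal way to see that state from one of the remaining nodes, assuming the default tooling (`<poolname>` is a placeholder, and 600 s is simply the Ceph default for marking OSDs out):
Code:
# overall health, degraded PGs and recovery state
ceph -s
# the failed node's OSDs show up as down/out under their host bucket
ceph osd tree
# with a 3/2 pool and failure domain "host", each node holds exactly one replica
ceph osd pool get <poolname> size
# time in seconds after which a down OSD is marked out (default 600)
ceph config get mon mon_osd_down_out_interval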

From the POV of Proxmox VE, the node is still up and running; therefore HA won't recover the VMs to the other nodes. The VMs can no longer read or write to their disk images on Ceph, as the connection to it is completely down.
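
A quick way to confirm that view from one of the healthy nodes, using standard Proxmox VE commands (nothing specific to this cluster assumed):
Code:
# corosync/quorum view: the affected node is still listed as an online member
pvecm status
# HA view: its resources stay "started" there, so no fencing or recovery is triggered
ha-manager status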
 
That is a rather contrived scenario, where you intentionally removed both NICs. With one gone, everything would keep working. That is what that network setup can protect you from, but not if both are down.
Yes, that was done deliberately, for testing. Exactly right: if only one NIC fails, everything continues to run smoothly.

From the POV of Ceph, the node is gone and therefore the OSDs are shown as down and, since they were down long enough, as out. This is only a 3-node cluster, therefore Ceph cannot recover the data to another node, as the remaining two already have replicas. There can only be one replica per node.

Due to the replicas on the other nodes, the VMs should still work, or am I misunderstanding something?

From the POV of Proxmox VE, the node is still up and running; therefore HA won't recover the VMs to the other nodes. The VMs can no longer read or write to their disk images on Ceph, as the connection to it is completely down.

What is the best practice for this scenario?

Thanks for your reply!
 
Due to the replicas on the other nodes, the VMs should still work, or am I misunderstanding something?
The VMs act as Ceph clients and need access to the Ceph Public network. With both NICs down, that access is most likely gone, so from the POV of the guest OS the disk simply stops responding to any I/O.
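
For context, the QEMU/librbd clients first contact the monitors, whose addresses live on the Ceph Public network; they can be listed with a standard Ceph command (no cluster-specific assumptions):
Code:
# monitor addresses are bound to the public_network; if a client cannot reach
# them (or the OSDs), all RBD I/O from its VMs stalls
ceph mon dump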

What is the best practice for this scenario?
Well, the VMs cannot access their disk anymore. Therefore a hard stop will be needed. Once powered off, you can try to do an offline migration to one of the other nodes. If that fails, you can always manually move the VM configs, circumventing all safety checks:
Code:
mv /etc/pve/nodes/{source}/qemu-server/{vmid}.conf /etc/pve/nodes/{target}/qemu-server/
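
A possible sequence, with `<vmid>` and `<target>` as placeholders (hard stop first, then offline migration; the mv above only as a last resort):
Code:
# hard stop the VM whose disk I/O is stuck
qm stop <vmid>
# offline migration to a healthy node
qm migrate <vmid> <target>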
 
The VMs act as Ceph clients and need access to the Ceph Public network. With both NICs down, that access is most likely gone, so from the POV of the guest OS the disk simply stops responding to any I/O.
Okay, so for more fault tolerance I could use a third, separate NIC for the public_network? Currently public_network and cluster_network run over the same NIC / IP address.
Well, the VMs cannot access their disk anymore. Therefore a hard stop will be needed. Once powered off, you can try to do an offline migration to one of the other nodes. If that fails, you can always manually move the VM configs, circumventing all safety checks:
Code:
mv /etc/pve/nodes/{source}/qemu-server/{vmid}.conf /etc/pve/nodes/{target}/qemu-server/
I will try that!
 
Okay, so for more fault tolerance I could use a third, separate NIC for the public_network? Currently public_network and cluster_network run over the same NIC / IP address.
Will it be fast and redundant? Keep in mind that the Ceph Public network is the main Ceph network. The Ceph Cluster network is optional and can be placed on a different physical network to take load away.
https://docs.ceph.com/en/latest/rados/configuration/network-config-ref/
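
As an illustration only (the subnets below are made up, not taken from this thread), both networks are defined in /etc/pve/ceph.conf and can be split like this:
Code:
# /etc/pve/ceph.conf (excerpt) - example subnets, adjust to your own setup
[global]
    # monitors + client traffic, the "main" Ceph network
    public_network = 10.10.10.0/24
    # optional: OSD replication/heartbeat traffic
    cluster_network = 10.10.20.0/24
Note that the monitors bind to addresses in the public_network, so changing it later typically means recreating or re-addressing the monitors.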

The Full-Mesh routed with fallback option helps against the loss of one network cable in the mesh. What do you want to protect against? If both NICs are on the same PCI card, that card itself could fail; in that case, rather add or move one of the ports to a NIC on a different PCI card.
 
Will it be fast and redundant? Keep in mind, that the Ceph Public network is the main Ceph network. The Ceph Cluster network is optional and can be placed on a different physical network to take load away.
https://docs.ceph.com/en/latest/rados/configuration/network-config-ref/
Thanks for the information!
The Full-Mesh routed with fallback option helps against the loss of one network cable in the mesh. What do you want to protect against? If both NICs are on the same PCI card, that card itself could fail; in that case, rather add or move one of the ports to a NIC on a different PCI card.
Makes sense, thank you!


When I stop the VMs and migrate them offline, it works perfectly fine. I have all the information I need. Thanks for your help!
 