Proxmox / Ceph / Backups & Replica Policy

fstrankowski

Hello everyone!

We've recently upgraded our backbone to 50G and have some interesting findings in our 3-node cluster. We're running the latest Proxmox 8.3 with Ceph 18.2.
Ceph VM-Pool is configured with 3x replication over all 3 nodes (so one copy resides on each node).

When we run backups (both LXC and KVM), Ceph reads the VM image from the block device / placement group that has been set as primary. This primary may reside on either the local server or one of the other two.

To prevent this behaviour, we've now set rbd_read_from_replica_policy to localize. The default policy prefers the primary placement groups; the localize setting prefers the replica closest to the server the VM resides on.
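To make the difference between the two policies concrete, here is a toy model in Python. It is only a sketch: the Osd class and pick_read_osd function are made up for illustration, "closest" is reduced to "same host", and the fallback to the primary is a simplification. Real Ceph decides this via the CRUSH map, so don't read this as how librbd actually implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Osd:
    """Hypothetical stand-in for an OSD: just an id and the host it lives on."""
    osd_id: int
    host: str


def pick_read_osd(acting_set, client_host, policy="default"):
    """Return the OSD a read is sent to in this toy model.

    acting_set[0] is treated as the primary copy of the placement group.
    """
    if policy == "localize":
        # Prefer a replica on the same host as the client (the node running the VM).
        for osd in acting_set:
            if osd.host == client_host:
                return osd
    # "default" policy, or no replica on the client's host: read from the primary.
    return acting_set[0]


# One placement group with 3x replication, one copy per node; primary is on node2.
acting = [Osd(4, "node2"), Osd(1, "node1"), Osd(7, "node3")]

print(pick_read_osd(acting, "node1", "default").host)
print(pick_read_osd(acting, "node1", "localize").host)
```

With the VM on node1, the default policy goes over the network to node2, while localize stays on node1.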

For a 3-node cluster with 3x replication this eliminates network usage during backups (all reads are served locally). On our bigger clusters (20-50 nodes) we see noticeably lower network usage during backups.
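A rough back-of-the-envelope estimate of why the effect shrinks on bigger clusters, as a Python sketch. It assumes replicas are spread uniformly with at most one copy per node and that a backup reads all objects of an image equally; remote_read_fraction is a hypothetical helper, and these are not measured numbers from our clusters.

```python
from fractions import Fraction


def remote_read_fraction(nodes, replicas=3, policy="default"):
    """Estimated fraction of reads that have to cross the network.

    Assumes uniform placement with at most one replica per node.
    """
    if not 1 <= replicas <= nodes:
        raise ValueError("need 1 <= replicas <= nodes")
    # default: only the primary counts; localize: any of the replicas may be local.
    usable_copies = replicas if policy == "localize" else 1
    return 1 - Fraction(usable_copies, nodes)


for n in (3, 20, 50):
    default = remote_read_fraction(n)
    localize = remote_read_fraction(n, policy="localize")
    print(f"{n:>2} nodes: default {float(default):.0%} remote, "
          f"localize {float(localize):.0%} remote")
```

Under these assumptions a 3-node cluster goes from roughly two thirds of reads being remote to none, while at 20 nodes it only drops from 19/20 to 17/20.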

Question: Why is this setting set to default and not localize? @fabian (sorry for tagging you directly here but we're doing awesome playing ping-pong together) ;-)

Cheerio

Florian
 
The default one is probably better at distributing the load across disks, but I am not a Ceph expert. @aaron ? ;)
 
Please file a feature request at https://bugzilla.proxmox.com, ideally with some numbers that you have seen in your cluster(s).
We can then think about either making this the default or easy to enable/disable from the Proxmox VE tooling.
 