Fleecing on Ceph RBD storage

mbosma · May 2, 2024

I've tried to find recommendations on using fleecing with Ceph storage but haven't found an answer yet.

We're running a fresh PVE 8.2 cluster and a PBS 3.2 server.
The cluster uses Ceph as block storage on NVMe disks, connected via a 2x10G LACP bond.
The same link also carries the traffic to the Proxmox Backup Server, which runs on spinning rust.

It feels like fleecing would still benefit the backups, since the temporary data would be written to faster disks.
However, Ceph's 3x replication multiplies the fleecing traffic, which could saturate the 10G link.
We're limited by our switches, so upgrading to more bandwidth isn't an option.

Is fleecing still recommended given these constraints, or on Ceph in general for that matter?
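To put rough numbers on the replication concern, here is a minimal Python sketch. It assumes that every fleeced byte is written once to the fleecing image and read back once by the backup job, that a write to a pool with `size` replicas crosses the network `size` times (client to primary OSD, then primary to each other replica), and that a read crosses it once. It ignores traffic that stays on the host when the primary OSD happens to be local. The function names and the 50 GiB figure are hypothetical.

```python
"""Rough estimate of extra network traffic caused by fleecing on a replicated Ceph pool.

Simplifying assumptions (not taken from Proxmox or Ceph documentation):
- every fleeced byte is written to the fleecing image once and read back once;
- a write to a pool with `replicas` copies crosses the network `replicas` times
  (client -> primary OSD, then primary -> each of the other replicas);
- a read crosses the network once (primary OSD -> client).
"""


def fleecing_network_bytes(fleeced_bytes: int, replicas: int) -> int:
    """Bytes moved over the network for fleecing on a Ceph pool with `replicas` copies."""
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    writes = fleeced_bytes * replicas  # client->primary plus primary->other replicas
    reads = fleeced_bytes              # backup job reads the saved blocks back
    return writes + reads


def local_fleecing_network_bytes(fleeced_bytes: int) -> int:
    """Fleecing to a disk inside the node itself does not touch the network."""
    return 0


if __name__ == "__main__":
    GIB = 1024 ** 3
    fleeced = 50 * GIB  # hypothetical amount of fleeced data in one backup window
    for label, size in [("Ceph size=3", 3), ("Ceph size=1", 1)]:
        gib = fleecing_network_bytes(fleeced, size) / GIB
        print(f"{label}: {gib:.0f} GiB extra network traffic")
    print(f"local NVMe: {local_fleecing_network_bytes(fleeced) / GIB:.0f} GiB")
```

Under these assumptions, 50 GiB of fleeced data becomes 200 GiB on the wire with size=3, 100 GiB with size=1, and nothing with a local disk, all competing with the upload to PBS on the same bond.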

spirit · May 2, 2024

Maybe you could create a pool without replication, dedicated to fleecing?

Personally (I'm also using Ceph), I'm looking into adding some local NVMe disks to my nodes, dedicated to fleecing.

bbgeek17 · May 2, 2024

I think @spirit is on point. Adding a dedicated local NVMe is the best approach. Why over-replicate temporary (fleecing) data and potentially overload production access? Keep the fleecing data off the Ceph network.

Keep in mind that fleecing will not be used at all times, only when the backup server cannot keep up or becomes temporarily unavailable in the middle of the backup.

Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox

alexskysilk · May 2, 2024

I've never heard the term fleecing (at least not in a storage context that doesn't involve sales), but it seems to me that there wouldn't really be any benefit for guests using Ceph anyway; doesn't the backup stream from a frozen checkpoint?

bbgeek17 · May 2, 2024

alexskysilk said:
doesn't the backup stream from a frozen checkpoint?

The design of the current PVE/PBS backup is completely independent of the underlying storage (to the chagrin of storage people, yours truly included). PVE does not rely on storage snapshots but uses QEMU snapshots.

In the case of QEMU snapshots, a filter is inserted between the VM and its disk. During a backup, when a write arrives for a block that has not been backed up yet, the write is held until the original block has been sent to PBS. The primary issue is that when PBS is slow or becomes unavailable, the effect from the application's point of view is similar to a disk failure.
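The copy-before-write behaviour described above can be sketched as a toy Python model. This is a hypothetical simplification, not QEMU's or Proxmox's code: the class name, the fixed per-block upload time and the block-by-block backup loop are all assumptions. It only illustrates why a guest write to a block that has not been backed up yet costs a full upload to PBS.

```python
"""Toy model of copy-before-write backup without fleecing (hypothetical simplification).

The backup sends blocks to PBS one at a time. A guest write to a block that
has not been backed up yet is held until that block's original content has
been sent to PBS.
"""
from dataclasses import dataclass, field


@dataclass
class CbwBackup:
    disk: list[bytes]
    upload_seconds: float  # assumed time to send one block to PBS
    backed_up: set[int] = field(default_factory=set)
    uploaded: dict[int, bytes] = field(default_factory=dict)

    def _send_to_pbs(self, idx: int) -> float:
        """Send the original block to the backup target and return the time spent."""
        self.uploaded[idx] = self.disk[idx]
        self.backed_up.add(idx)
        return self.upload_seconds

    def guest_write(self, idx: int, data: bytes) -> float:
        """Apply a guest write and return how long the guest had to wait."""
        waited = 0.0
        if idx not in self.backed_up:
            # The write stalls until the old block has reached PBS.
            waited = self._send_to_pbs(idx)
        self.disk[idx] = data
        return waited

    def backup_step(self) -> None:
        """Background job: back up the next block that is still pending."""
        for idx in range(len(self.disk)):
            if idx not in self.backed_up:
                self._send_to_pbs(idx)
                return
```

If PBS becomes unreachable, `upload_seconds` is effectively unbounded, and every write to a not-yet-backed-up block hangs, which is what the guest perceives as a failing disk.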

Fleecing addresses this challenge by saving the not-yet-backed-up block to other storage for the duration of the backup. Essentially, it is a reimplementation of a snapshot. Yes, being able to utilize storage snapshots/clones would be much more efficient, and more in line with what Veeam does in the VMware world.
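Fleecing can be illustrated with a small toy model in Python. It is a hypothetical simplification, not QEMU's implementation: the class name, the fixed per-block fleecing time and the dictionary standing in for the fleecing image are all assumptions. On a guest write to a block that has not been backed up, the original content is saved to a fast fleecing store and the write proceeds; the backup job later sends the saved copy to PBS at its own pace.

```python
"""Toy model of fleecing (hypothetical simplification, not QEMU's code).

On a guest write to a block that has not been backed up yet, the original
content is copied to a fleecing store (e.g. an image on a local NVMe) and the
write proceeds. The backup job later sends the saved copy to PBS.
"""
from dataclasses import dataclass, field


@dataclass
class FleecingBackup:
    disk: list[bytes]
    fleece_seconds: float  # assumed time to write one block to the fleecing image
    fleece: dict[int, bytes] = field(default_factory=dict)
    backed_up: set[int] = field(default_factory=set)
    pbs: dict[int, bytes] = field(default_factory=dict)

    def guest_write(self, idx: int, data: bytes) -> float:
        """Apply a guest write and return how long the guest had to wait."""
        waited = 0.0
        if idx not in self.backed_up and idx not in self.fleece:
            self.fleece[idx] = self.disk[idx]  # preserve the point-in-time content
            waited = self.fleece_seconds       # guest waits only for the fast local copy
        self.disk[idx] = data
        return waited

    def backup_step(self) -> bool:
        """Send the next pending block to PBS; return False once everything is done."""
        for idx in range(len(self.disk)):
            if idx not in self.backed_up:
                # Prefer the fleeced copy: the live disk may already hold newer data.
                self.pbs[idx] = self.fleece.pop(idx, self.disk[idx])
                self.backed_up.add(idx)
                return True
        return False
```

The guest's write latency now depends on the fleecing storage rather than on PBS, which is why the speed and location of that storage matter so much in this thread.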

Keep in mind that even though Proxmox is a major consumer of fleecing, its development is being driven by the QEMU project and it is widely applicable.


alexskysilk · May 2, 2024

bbgeek17 said:
PVE does not rely on storage snapshots but uses QEMU snapshots.

Seems like I should have been aware of this... In our infrastructure we implemented our own backup mechanism that connects directly to Ceph and performs snap-copy-remove; at the time it was implemented, vzdump was such an unfunny joke that it wasn't even considered for production use. Similar to Veeam, it has no appreciable effect on the operational performance of the guests.

I had initially thought about dropping it in favor of PBS, since PBS has built-in differential functionality, but I guess I won't.

mbosma · May 3, 2024

I'll look into adding an NVMe disk as well, @spirit.
The servers all have PM981 disks installed, since they are repurposed VMware servers that used them as boot disks.
I'm not sure I'd like to use those because of their low write endurance. Would you reckon the amount of writes will be low enough to use them, @bbgeek17?

bbgeek17 · May 3, 2024

mbosma said:
I'm not sure I'd like to use those because of their low write endurance. Would you reckon the amount of writes will be low enough to use them, @bbgeek17?

Without knowing anything about your use case, I'd only be making a wild guess.
You can:
a) create a snapshot just before your normal backup window starts
b) look at the snapshot size when the backup normally ends
c) repeat for a week/month
d) average the results
That should give you a fairly accurate idea of your rate of change during that exact time slot.
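Steps a) to d) can be turned into a back-of-the-envelope endurance check with a short Python sketch. It assumes the measured snapshot size is a rough, probably pessimistic stand-in for the data fleeced per backup run, that each fleeced byte is written to the disk once, and it ignores write amplification and any other data on the disk. The `tbw_rating` of 300 TB and the sample measurements are placeholders, not PM981 datasheet values or real data; substitute the rated TBW for your drive's capacity and your own snapshot sizes.

```python
"""Back-of-the-envelope endurance check for a fleecing disk (steps a-d).

Assumptions: snapshot size per backup window approximates the fleeced data,
each fleeced byte is written once, no write amplification, and `tbw_rating`
(in TB, 10**12 bytes) is a placeholder for the drive's rated endurance.
"""
from statistics import mean


def average_change(snapshot_sizes_gib: list[float]) -> float:
    """Step d): average snapshot size observed at the end of each backup window."""
    if not snapshot_sizes_gib:
        raise ValueError("need at least one measurement")
    return mean(snapshot_sizes_gib)


def years_of_endurance(avg_gib_per_run: float, runs_per_day: int, tbw_rating: float) -> float:
    """Years until the rated TBW is used up by fleecing writes alone."""
    bytes_per_day = avg_gib_per_run * 1024 ** 3 * runs_per_day
    return tbw_rating * 10 ** 12 / bytes_per_day / 365


if __name__ == "__main__":
    measured = [12.0, 9.5, 14.0, 11.0, 13.5]  # hypothetical snapshot sizes in GiB
    avg = average_change(measured)
    years = years_of_endurance(avg, runs_per_day=1, tbw_rating=300)  # placeholder TBW
    print(f"average change per backup: {avg:.1f} GiB, endurance ~{years:.0f} years")
```

If several VMs on the same node are backed up into the same fleecing disk, add up their per-VM averages before running the estimate.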

Good luck
