Fleecing on Ceph RBD storage

mbosma

Dec 3, 2018
I tried to find some recommendations on using fleecing with Ceph storage but haven't found an answer yet.

We're running a fresh PVE cluster and PBS server on versions 8.2 and 3.2.
This cluster uses Ceph as block storage with NVMe disks on a 2x10G LACP bond.
The same link is also used for the Proxmox Backup Server, which runs on spinning rust.

It feels like fleecing would still benefit the backups, as we'll be writing to faster disks.
However, Ceph will generate more traffic because of the 3x replication, which could saturate the 10G link.
We are limited by our switches, so we can't upgrade to more bandwidth.
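To get a feel for the replication concern, here is a back-of-the-envelope sketch. It assumes a simple model in which every fleeced byte is written once to the primary OSD, forwarded to the remaining replicas, and read back once by the backup job, while ignoring OSDs that happen to be local to the node. The 50 GiB figure is hypothetical, and the bond is treated as a single 10 Gbit/s path for simplicity.

```python
# Rough upper-bound model (an assumption, not a measurement) of the extra
# traffic on the shared link when fleecing images live on a replicated Ceph pool.


def fleecing_link_bytes(fleeced_bytes: int, replicas: int = 3) -> int:
    """Extra bytes moved over the network for `fleeced_bytes` of fleeced data.

    Assumed model, ignoring OSDs that happen to be local to the node:
      * 1 copy: the node writes the original block to the primary OSD
      * replicas - 1 copies: the primary OSD forwards it to the other replicas
      * 1 copy: the backup job later reads the block back from the fleecing image
    """
    write_to_primary = fleeced_bytes
    replication = fleeced_bytes * (replicas - 1)
    read_back = fleeced_bytes
    return write_to_primary + replication + read_back


def seconds_on_link(num_bytes: int, link_gbit: float = 10.0) -> float:
    """Seconds a single link of `link_gbit` Gbit/s is fully busy moving num_bytes."""
    return num_bytes * 8 / (link_gbit * 1e9)


if __name__ == "__main__":
    # Hypothetical: 50 GiB of not-yet-backed-up blocks get overwritten during the backup.
    fleeced = 50 * 2**30
    extra = fleecing_link_bytes(fleeced, replicas=3)
    print(f"extra link traffic: {extra / 2**30:.0f} GiB")
    print(f"roughly {seconds_on_link(extra) / 60:.1f} min of a saturated 10G link")
```

Under this model the fleecing path costs about four times the fleeced data on the wire with 3x replication, on top of the normal backup stream to PBS.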

Is fleecing still recommended given these constraints, or on Ceph in general for that matter?
 
I think @spirit is on point. Adding a dedicated local NVMe is the best approach. Why over-replicate temporary (fleecing) data and potentially overload production access? Keep the fleecing data off the Ceph network.

Keep in mind that fleecing will not be used at all times, only when the backup server cannot keep up or becomes temporarily unavailable in the middle of the backup.
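A minimal sketch of how the dedicated-local-disk advice could look on the command line. It only builds the argument list (nothing is executed and no Proxmox API is called) and assumes the `--fleecing enabled=1,storage=<storage ID>` option that vzdump gained with PVE 8.2; please double-check it against `man vzdump` on your version. VM 100 and the storage IDs `pbs` and `local-nvme` are hypothetical.

```python
import shlex
from typing import List, Optional


def vzdump_cmd(vmid: int, backup_storage: str,
               fleecing_storage: Optional[str] = None) -> List[str]:
    """Build a vzdump command line for a snapshot-mode VM backup.

    If fleecing_storage is given, fleecing is enabled and its temporary images
    are placed on that storage (e.g. a local NVMe) instead of the Ceph pool.
    """
    cmd = ["vzdump", str(vmid), "--storage", backup_storage, "--mode", "snapshot"]
    if fleecing_storage is not None:
        # Property string for vzdump's --fleecing option (PVE 8.2+, assumed syntax).
        cmd += ["--fleecing", f"enabled=1,storage={fleecing_storage}"]
    return cmd


if __name__ == "__main__":
    # Hypothetical IDs: VM 100, PBS storage "pbs", local NVMe storage "local-nvme".
    print(shlex.join(vzdump_cmd(100, "pbs", "local-nvme")))
```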


Blockbridge: Ultra low latency all-NVMe shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
I've never heard of the term fleecing (at least not in a storage context; only in sales :p), but it seems to me that there wouldn't really be any benefit for guests using Ceph anyway; doesn't the backup stream from a frozen checkpoint?
 
doesn't the backup stream from a frozen checkpoint?
The design of the current PVE/PBS backup is completely independent of the underlying storage (to the chagrin of storage people, yours truly included). PVE does not rely on storage snapshots but uses QEMU snapshots.

In the case of QEMU snapshots, a filter is inserted between the VM and its disk. During a backup, when a write arrives for a block that has not been backed up yet, the write is held up until the original block has been sent to PBS. The primary issue is that when PBS is slow or becomes unavailable, the effect is similar to a disk failure from the application's point of view.
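To make the mechanism concrete, here is a toy model of the copy-before-write behaviour described above. It is not QEMU's implementation: blocks are list entries, the backup target is an in-memory dict, and an exception stands in for a guest write that stalls or errors while PBS is unavailable. All class names are made up for illustration.

```python
class BackupTargetUnavailable(Exception):
    """Stands in for a PBS that is too slow or unreachable."""


class BackupTarget:
    """Toy backup target; `available` can be flipped to simulate an outage."""

    def __init__(self) -> None:
        self.available = True
        self.received: dict[int, bytes] = {}

    def send(self, block: int, data: bytes) -> None:
        if not self.available:
            raise BackupTargetUnavailable(f"cannot send block {block}")
        self.received[block] = data


class CopyBeforeWriteDisk:
    """Toy version of the filter sitting between the VM and its disk."""

    def __init__(self, blocks: list[bytes], target: BackupTarget) -> None:
        self.blocks = blocks
        self.target = target
        self.pending = set(range(len(blocks)))  # blocks not yet backed up

    def guest_write(self, block: int, data: bytes) -> None:
        if block in self.pending:
            # The guest write has to wait until the ORIGINAL data reached the target.
            self.target.send(block, self.blocks[block])
            self.pending.discard(block)
        self.blocks[block] = data

    def backup_step(self) -> None:
        """Background backup job: copy the lowest still-pending block."""
        if self.pending:
            block = min(self.pending)
            self.target.send(block, self.blocks[block])
            self.pending.discard(block)


if __name__ == "__main__":
    target = BackupTarget()
    disk = CopyBeforeWriteDisk([b"a", b"b", b"c"], target)
    disk.guest_write(1, b"B")  # original b"b" goes to the target first
    target.available = False   # PBS stalls in the middle of the backup
    try:
        disk.guest_write(2, b"C")
    except BackupTargetUnavailable as err:
        print("guest write failed:", err)
```

The second guest write never lands on the disk, which is the "looks like a disk failure" effect from the guest's point of view.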

Fleecing addresses this challenge by saving the not-yet-backed-up block to storage elsewhere for the duration of the backup. Essentially, it is a reimplementation of a snapshot. Yes, being able to utilize storage snapshots/clones would be much more efficient, and more in line with what Veeam is doing in the VMware world.
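The same kind of toy model with fleecing added: the original block is parked in a separate store (standing in for the fleecing image) instead of being pushed to PBS synchronously, so guest writes no longer depend on the backup target being reachable. Again a simplified sketch, not QEMU's code; the class and attribute names are hypothetical.

```python
class FleecingDisk:
    """Toy copy-before-write disk that parks original blocks in a fleecing image."""

    def __init__(self, blocks: list[bytes]) -> None:
        self.blocks = blocks
        self.pending = set(range(len(blocks)))   # blocks not yet backed up
        self.fleecing: dict[int, bytes] = {}     # stand-in for the fleecing image
        self.backed_up: dict[int, bytes] = {}    # stand-in for what PBS received

    def guest_write(self, block: int, data: bytes) -> None:
        if block in self.pending and block not in self.fleecing:
            # Preserve the original on fleecing storage instead of waiting on PBS.
            self.fleecing[block] = self.blocks[block]
        self.blocks[block] = data

    def backup_step(self, target_available: bool = True) -> bool:
        """Send one pending block to the target; return True if progress was made."""
        if not self.pending or not target_available:
            return False
        block = min(self.pending)
        # Use the preserved original if the guest has overwritten the block since.
        data = self.fleecing.pop(block, self.blocks[block])
        self.backed_up[block] = data
        self.pending.discard(block)
        return True


if __name__ == "__main__":
    disk = FleecingDisk([b"a", b"b", b"c"])
    disk.backup_step(target_available=False)  # PBS unavailable: no progress...
    disk.guest_write(1, b"B")                 # ...but the guest write still completes
    while disk.backup_step():                 # PBS is back: drain the remaining blocks
        pass
    print(disk.backed_up)
```

In this model, the size of `fleecing` at any moment is roughly what occupies the fleecing storage, which is why the rate of change during the backup window determines how much gets written there.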

Keep in mind that even though Proxmox is a major consumer of fleecing, the development is being driven by the QEMU project and is widely applicable.


 
PVE does not rely on storage snapshots but uses QEMU snapshots.
Seems like I should have been aware of this... In our infrastructure we implemented our own backup mechanism that talks directly to Ceph and performs snap-copy-remove; at the time it was implemented, vzdump was such an unfunny joke that it wasn't even considered for production use. Similar to Veeam, it has no appreciable effect on the operational performance of the guests.
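For comparison, the snap-copy-remove pattern mentioned here can be sketched with the standard `rbd` CLI (`rbd snap create`, `rbd export`, `rbd snap rm`). This is a hypothetical illustration, not the poster's actual tooling: pool, image, snapshot name and destination path are placeholders, it defaults to a dry run that only records the commands, and without freezing the guest filesystem the snapshot is only crash-consistent.

```python
import shlex
import subprocess


def rbd_snap_copy_remove(pool: str, image: str, snap: str, dest: str,
                         dry_run: bool = True) -> list[str]:
    """Snapshot an RBD image, export the snapshot, then always remove the snapshot.

    Returns the command lines that were (or, with dry_run, would be) executed.
    """
    spec = f"{pool}/{image}@{snap}"
    executed: list[str] = []

    def rbd(*args: str) -> None:
        cmd = ["rbd", *args]
        executed.append(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

    rbd("snap", "create", spec)      # point-in-time snapshot; the guest keeps running
    try:
        rbd("export", spec, dest)    # copy the frozen snapshot off the cluster
    finally:
        rbd("snap", "rm", spec)      # remove the snapshot even if the export failed
    return executed


if __name__ == "__main__":
    # Placeholder names; dry_run defaults to True, so nothing is executed.
    for line in rbd_snap_copy_remove("rbd", "vm-100-disk-0", "nightly",
                                     "/backup/vm-100-disk-0.raw"):
        print(line)
```

The `try`/`finally` keeps a failed export from leaving stale snapshots piling up on the pool.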

I had initially thought about dumping it in favor of PBS, since PBS has built-in differential functionality, but I guess I won't.
 
I'll look into adding an NVMe disk as well, @spirit.
The servers all have PM981 disks installed, since they are repurposed VMware servers that used them as boot disks.
I'm not quite sure I'd like to use those because of their low write endurance. Would you reckon the amount of writes will be low enough to use those, @bbgeek17?
 
I'm not quite sure I'd like to use those because of their low write endurance. Would you reckon the amount of writes will be low enough to use those, @bbgeek17?
Without knowing anything about your use case, I'd only be giving you a wild guess.
You can:
a) create a snapshot just before your normal backup window starts
b) look at the snapshot size when the backup normally ends
c) repeat for a week/month
d) average the results
That should give you a reasonably accurate idea of your rate of change during that exact time slot.
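The measurement steps above can be turned into a rough endurance estimate with a bit of arithmetic. A sketch under loud assumptions: each snapshot's size is taken as a proxy for what fleecing would write in that backup window, write amplification and all other writes to the disk are ignored, and both the snapshot sizes and the 300 TBW rating are hypothetical placeholders (look up the real rating for your drive's capacity).

```python
from statistics import mean


def average_change_gib(snapshot_sizes_gib: list[float]) -> float:
    """Steps c) and d): average snapshot size over the sampled backup windows."""
    return mean(snapshot_sizes_gib)


def endurance_years(tbw_rating_tb: float, gib_per_window: float,
                    windows_per_day: float = 1.0) -> float:
    """Years until the rated TBW is used up by fleecing writes alone.

    Assumes each snapshot's size approximates what fleecing writes per window
    and ignores write amplification and all other writes to the disk.
    """
    tb_per_day = gib_per_window * windows_per_day * 2**30 / 1e12
    return tbw_rating_tb / tb_per_day / 365


if __name__ == "__main__":
    # Hypothetical week of measured snapshot sizes (GiB) and a hypothetical TBW rating.
    sizes = [12.0, 15.5, 9.0, 20.0, 13.5, 11.0, 14.0]
    avg = average_change_gib(sizes)
    print(f"average change per backup window: {avg:.1f} GiB")
    print(f"years until a 300 TBW rating is reached: {endurance_years(300, avg):.0f}")
```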

Good luck


 
