High Volume VM Backup

deepcloud

Hi,

Congrats on a great release, PBS 2.1 rocks.

We have a customer who is planning 500-1000 VMs across many hosts, on Ceph backed by enterprise NVMe drives. With storage capacity expected to exceed 200-300 TB of NVMe, a daily incremental backup could be very time-consuming if the PBS storage is HDD-based (either ZFS or Ceph across many nodes). This is a valid concern.

I am thinking of using NVMe SSDs for the backup storage too, to mitigate this issue. But the customer can't keep buying NVMe to hold a long retention (7 daily, 4 weekly, 12 monthly, 3 yearly). So here is my question:

Can we have a two-stage backup, where the long retention policy is maintained on HDD but we keep an intermediary NVMe-based PBS backup server?

Basically, the NVMe PBS backup server takes the backups of the Proxmox VE instances, and those backups are moved to the HDD-based backup server in the background.

This way, the NVMe PBS backup server holds only the most recent (yesterday's) backup, and the HDD-based backup server holds the remaining backups as per the long-term retention policy.

Regards

DCorp
 
Short answer: Yes.

You can have a primary PBS instance with high-performance drives but relatively few backups and short retention per VM, and one or more secondary PBS instances with larger storage, more backups and longer retention. I would actually recommend having 2 or 3 PBS instances in different locations.
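To illustrate how differently the two tiers fill up, here is a rough sketch that models keep-daily/weekly/monthly/yearly pruning on a list of one-backup-per-day snapshots. It is a simplified approximation written for this thread, not PBS's own prune code; the function name `prune_keep`, the five-year history, and the end date are all assumptions chosen only for the example.

```python
from datetime import date, timedelta


def prune_keep(snapshots, daily=0, weekly=0, monthly=0, yearly=0):
    """Return the snapshot dates a keep-daily/weekly/monthly/yearly policy
    would retain. Simplified model: each rule keeps the newest snapshot of
    each period, and only looks at periods not already covered by an
    earlier rule."""
    ordered = sorted(snapshots, reverse=True)  # newest first
    kept, removed = set(), set()
    rules = [
        (daily, lambda d: d.isoformat()),
        (weekly, lambda d: d.isocalendar()[:2]),  # (ISO year, ISO week)
        (monthly, lambda d: (d.year, d.month)),
        (yearly, lambda d: d.year),
    ]
    for limit, period_of in rules:
        covered = {period_of(d) for d in kept}  # periods kept by earlier rules
        selected = set()
        for d in ordered:
            if d in kept or d in removed:
                continue
            period = period_of(d)
            if period in covered:
                continue
            if period in selected:
                removed.add(d)  # an older snapshot of a period already kept
            elif len(selected) >= limit:
                break  # this rule has used up its budget
            else:
                selected.add(period)
                kept.add(d)
    return kept


# Hypothetical history: one snapshot per day for about five years.
end = date(2021, 12, 31)
history = [end - timedelta(days=i) for i in range(1830)]

primary = prune_keep(history, daily=1)  # fast, small tier
secondary = prune_keep(history, daily=7, weekly=4, monthly=12, yearly=3)

print("primary keeps:", len(primary))
print("secondary keeps:", len(secondary))
```

The point is only that the fast tier needs room for a snapshot or two per VM, while the archival tier holds up to 7 + 4 + 12 + 3 = 26 per VM. Thanks to deduplication, the real disk usage grows much more slowly than the snapshot count.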

In my experience, the sync operation that pulls data from a primary PBS instance to a secondary one requires fewer IOPS and is less dependent on network latency than the primary backup creation process. At least, I have had no issues pulling data from the primary instance to remote locations, though that of course depends on the amount of new data per day.
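As a back-of-the-envelope check on the "new data per day" point, the sketch below estimates how long a sync would take for a given amount of new data and sustained throughput. The helper `transfer_hours`, the 2% daily change rate, and the throughput figures are made-up assumptions for illustration, not measurements. Compression and deduplication would normally reduce the amount actually transferred.

```python
def transfer_hours(new_data_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to move new_data_tb terabytes at a sustained
    throughput_mb_s megabytes per second (decimal units)."""
    seconds = new_data_tb * 1_000_000 / throughput_mb_s
    return seconds / 3600


# Hypothetical figures: 300 TB of guest data, 2% of it changing per day.
total_tb = 300
daily_change_rate = 0.02
new_data_tb = total_tb * daily_change_rate

# Assumed sustained sync throughputs in MB/s (roughly 1 GbE and 10 GbE class).
for throughput in (100, 1000):
    hours = transfer_hours(new_data_tb, throughput)
    print(f"{new_data_tb:.1f} TB at {throughput} MB/s: {hours:.1f} h")
```

If the estimate for your own change rate fits comfortably inside the window between two backup runs, the slower secondary tier should be able to keep up.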

I have also never seen more than 5000 IOPS during any single PBS operation, so NVMe drives may be overkill once you also consider the price. I have been using SATA SSDs to great effect for both primary and secondary PBS instances.

With an installation that large, perhaps using Ceph would be interesting for your secondary (large/long-retention) instance? Ceph has great read performance (for a network-distributed system), so the verification process shouldn't have any issues with throughput.
 
Yes, using sync and tuned retention policies on both ends (less retention on the fast, small primary datastore/PBS host, more retention on the slower, big archival datastore/PBS host). Sync is a lot more forgiving than the actual backup: if interrupted, it can pick up where it left off unless a GC has run in the meantime, and of course it is also not in the regular I/O path of your guest, with all the stuff that might be going on at the same time.
 
