Split backup jobs for large-ish cluster

Hello.
I run a cluster with 7 nodes, currently hosting 60 VMs and a few LXC containers. Since we are moving from VMware, the number of VMs will grow. For now the goal is about 200-250 VMs, but it will keep growing. I have 6 more hosts potentially joining.
I occasionally get timeouts from the PBS (it hits random VMs each time):

INFO: Starting Backup of VM 159 (qemu)
INFO: Backup started at 2024-04-03 21:13:50
INFO: status = running
INFO: VM Name: blabla
INFO: include disk 'scsi0' 'storage:159/vm-159-disk-0.qcow2' 100G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/159/2024-04-03T19:13:50Z'
ERROR: VM 159 qmp command 'backup' failed - got timeout
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 159 failed - VM 159 qmp command 'backup' failed - got timeout
INFO: Failed at 2024-04-03 21:17:30

I wonder if it's possible and/or smart to split the backup jobs, so node 1 backs up at 21:00, node 2 at 22:00 and so on? Would that hurt dirty-bitmap and dedup if VMs are migrated from one host to another? Any other drawbacks?
Is it possible to exclude VMs if I go this route? VMs will not be on the same host at all times; live migrations will take place. For example, if I exclude VM 106 on node 1, then I guess it won't be excluded once it is migrated to node 2...

Or should I just investigate why it's timing out...

The PBS is a 24 x Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz (2 Sockets) machine with 280GB RAM. The storage used for backups is an NFS-mounted Synology NAS.
PBS version: 3.1-2

Best regards
--
Markus
 
NFS will not perform properly for PBS, especially with this many VMs to back up [1]. I'm pretty sure that's why you are getting those timeouts. Sooner rather than later you should use local drives for PBS to get proper performance.

Meanwhile:

- You can create a separate backup job for each host and run them at different times. Live migration will hurt neither dirty-bitmap nor dedup, as long as you back up to the same storage/PBS datastore. The backup window will need to be bigger and you will have to estimate the run times manually.
- You can exclude VMs, but AFAIK you will have to exclude them on every host where that VM may run. If you don't use resource pools [2] for any other purpose, you could create a backup job based on a resource pool, e.g. "VMs2backup", and simply remove from that pool any VM you don't want backed up.
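
To make the two options above a bit more concrete, here is a rough sketch in Python that only prints the `pvesh create /cluster/backup` commands you could run, it does not execute anything. The node names, the storage ID "pbs-datastore" and the helper function names are placeholders I made up. The option names (`--node`, `--all`, `--exclude`, `--pool`, `--schedule`, `--storage`, `--mode`) are what I understand the PVE backup job API to accept, so please check them against `pvesh usage /cluster/backup -v` on your version before using any of it.

```python
import shlex


def staggered_backup_commands(nodes, storage, first_hour=21, step_hours=1, exclude=()):
    """Build one backup-job command per node, each starting step_hours later.

    The same exclude list is put on every job, so an excluded VM stays
    excluded no matter which node it has been migrated to.
    """
    commands = []
    for index, node in enumerate(nodes):
        hour = (first_hour + index * step_hours) % 24  # wrap past midnight
        args = [
            "pvesh", "create", "/cluster/backup",
            "--node", node,            # limit the job to guests on this node
            "--all", "1",              # back up every guest on that node
            "--schedule", f"{hour:02d}:00",
            "--storage", storage,      # placeholder storage ID (assumption)
            "--mode", "snapshot",
        ]
        if exclude:
            args += ["--exclude", ",".join(str(vmid) for vmid in exclude)]
        commands.append(shlex.join(args))
    return commands


def pool_backup_command(pool, storage, schedule="21:00"):
    """Build a single cluster-wide job that backs up the members of one pool."""
    args = [
        "pvesh", "create", "/cluster/backup",
        "--pool", pool,
        "--schedule", schedule,
        "--storage", storage,
        "--mode", "snapshot",
    ]
    return shlex.join(args)


if __name__ == "__main__":
    # Option 1: one job per node, one hour apart, VM 106 excluded everywhere.
    for command in staggered_backup_commands(
        ["node1", "node2", "node3"], "pbs-datastore", exclude=[106]
    ):
        print(command)
    # Option 2: one pool-based job; remove a VM from the pool to skip it.
    print(pool_backup_command("VMs2backup", "pbs-datastore"))
```

With the pool variant you don't need per-node exclude lists at all, since pool membership follows the VM when it migrates. The one-hour step is just taken from the example in the question; you would still have to size it from your real job durations.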


[1] https://pbs.proxmox.com/docs/system-requirements.html
[2] https://pve.proxmox.com/wiki/User_Management#pveum_resource_pools
 
