Sequential backup for VMs/CTs

Matthieu Le Corre

We are trying to back up our whole cluster to a PBS server.

This was working quite well, but for the last two months some backups have started to fail (QEMU guest agent timeout, QMP command timeout, ...).
It seems that, for some unknown reason, the storage below the PBS datastore is not fast enough (it is shared with another backup system).

But the real problem is that PVE starts backing up all the nodes (24 nodes) at the same time!
This is too much for our system, and some backups fail.

Is there a way to limit the number of backups launched at the same time?
Maybe an option like the one for bulk migration?

We can of course split the backup into 24 parts, but this is far from optimal.
A vzdump hook script could maybe play the waiting-room role, but once again this is not that nice...
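For reference, such a waiting-room hook could look roughly like this. This is a minimal sketch, not tested against a real cluster: the shared mount point `/mnt/pve/shared` and the sleep interval are invented, and it assumes `mkdir` on that shared filesystem is atomic across nodes.

```shell
#!/bin/sh
# Hypothetical vzdump hook script acting as a cluster-wide "waiting room".
# Assumption: /mnt/pve/shared is a storage mounted on every node, and
# mkdir on it is atomic, so only one node can hold the lock at a time.
LOCKDIR=/mnt/pve/shared/vzdump-cluster.lock
phase="$1"   # vzdump passes the current phase as the first argument

case "$phase" in
  job-start)
    # Wait until no other node is backing up, then take the lock
    while ! mkdir "$LOCKDIR" 2>/dev/null; do
      sleep 60
    done
    ;;
  job-end|job-abort)
    # Release the lock so the next node's job can proceed
    rmdir "$LOCKDIR" 2>/dev/null
    ;;
esac
```

The script would be wired up via vzdump's hook-script option. Note that a node crashing mid-backup would leave the lock directory behind, so real-world use would need some stale-lock handling.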
 
there is no built-in cross-node coordination for backup jobs. you can of course manually tune your start times to distribute the load, or set bandwidth limits (bwlimit). in the end, it sounds like the root cause is that your backup system isn't powerful enough, though?
 
Some kind of queue or lock would be great. I have different customers with the same problem: when you have big clusters with a lot of nodes, a lot of VMs, and big backups, you also need a fast network for the backup server if you have 24 nodes sending backups at the same time.
It is really difficult to plan different schedules for specific VMs/nodes.

https://bugzilla.proxmox.com/show_bug.cgi?id=3086
 
yeah, maybe I should have added a 'yet' to my post above ;) it should be possible to adapt the protocol to have server-side "backup slots" and queue jobs once all are full, and that would be an interesting feature indeed.
 
I can't agree more about a server-side solution; PBS is really great, but it is missing a scheduling system for backups.

For now I have split the job into 24 parts, and it seems the issue is gone!
However, the total job length used to be around 2 hours, and now it takes 8 hours, as there is a lot of idle time between the different nodes' jobs.
 
Has any progress been made towards this?

We are saturating a 20Gbit link with backups from the cluster and need to limit the number of concurrent nodes.
 
Hi,

Any news about this?
With a cluster it is hard to build a global backup policy: running all backups on all nodes at the same time is not easy...
So far I have only found a workaround: create a job per node and run the nodes one after another.
I hope you can add the possibility to back up all nodes with a single job, with a concurrency limit, without saturating the network.

PS: Sorry to bother you @fabian, do you think I should open a ticket through the PBS subscription to move this subject forward, or is it not worth it?

Best Regards
 
Hello,
I haven't seen any such addition in PBS 2.3, and from what I've tested the behavior is the same.
This causes significant issues on the backup side as soon as we have a cluster, or several clusters.
The subject is becoming more and more pressing on my side.
Do you have any information on the progress of this topic?
PS: I have already checked ticket 3086 (no news).

Best Regards
 
there hasn't been any progress, otherwise the bug tracker would have been updated ;)
 
Any news on the subject?

We are still suffering from this. Queuing so that only one node backs up at a time across the whole cluster is long awaited...
 
Same for me; it seems the (my) PBS server cannot cope with too many incoming streams from different servers.

What would help is a per-server backup scheme that starts on the next server once the previous server has finished its job.

I tried labeling every node by hand and putting its guests in a pool named after the node, but that didn't work out, as VMs get moved around. It also makes the pooling mechanism useless.

E.g. a "maximum number of parallel nodes" option in the backup definition screen would help.

Alternatively: maybe someone could create a script that lists the VMs and CTs per node and then creates a backup script backing up the ones on node 1 first, then node 2, then node 3, etc. I think this should be possible, as there is script support.

I also have multiple backup locations (2 Synology NFS NAS, 1 PBS), so a round-robin scheme that uses each storage location in turn would help me keep the number of backup definitions to a minimum.
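Until a built-in option exists, a "maximum number of parallel nodes" can be approximated from a management host with `xargs -P`, which caps how many commands run concurrently. A sketch only: the node names, root SSH access from a single box, and the storage ID are all assumptions.

```shell
#!/bin/sh
# Run vzdump on every node, but on at most MAX_PARALLEL nodes at once.
# Node names and the storage ID below are placeholders.
MAX_PARALLEL=4

printf '%s\n' node1 node2 node3 node4 node5 node6 node7 node8 |
  xargs -P "$MAX_PARALLEL" -I{} \
    ssh root@{} 'vzdump --all --storage pbs-datastore'
```

`xargs` starts the next node's backup as soon as one of the `MAX_PARALLEL` slots frees up, so the link never sees more than that many streams at once.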
 
Alternatively: maybe someone could create a script that lists the VMs and CTs per node and then creates a backup script backing up the ones on node 1 first, then node 2, then node 3, etc. I think this should be possible, as there is script support.
you can do a stupid bash script doing:

ssh root@server1 'vzdump --all ...'
ssh root@server2 'vzdump --all ...'
ssh root@server3 'vzdump --all ...'
 
you can do a stupid bash script doing:

ssh root@server1 'vzdump --all ...'
ssh root@server2 'vzdump --all ...'
ssh root@server3 'vzdump --all ...'
Of course, stupid me. I will make a couple of cron jobs. Thanks.

Nevertheless, better support for a proper backup strategy in the Proxmox VE and/or PBS UI would be beneficial for many users, e.g. a way to create the famous 3-2-1 type of backups.
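As a stop-gap for the round-robin idea mentioned earlier in the thread, the target storage could be rotated by date from a cron job. A sketch under invented assumptions: the storage IDs are placeholders and would have to match real entries in your storage configuration.

```shell
#!/bin/sh
# Rotate the backup target daily: day-of-year N uses storage (N mod 3).
# Storage IDs are placeholders (two NFS NAS boxes and one PBS datastore).
STORAGE_1=nfs-syno1
STORAGE_2=nfs-syno2
STORAGE_3=pbs-main

# Strip leading zeros from %j so the shell does not parse it as octal
doy=$(date +%j | sed 's/^0*//')
idx=$(( doy % 3 + 1 ))
eval "storage=\$STORAGE_$idx"

vzdump --all --storage "$storage"
```

Run daily, each location is used in turn, which gets close to a 3-2-1 layout with a single job definition per node.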
 