Feature Request: Backup - Advanced VM selection

hellfire

Renowned Member
Aug 17, 2016
Hi,

I'd like to share a feature request and discuss it here. If this leads to the opinion that this is a valid feature request, I'll create an enhancement-request in the bug tracker.

Use Case:
  • I have a Proxmox VE cluster here (Ceph storage).
  • The virtual machines (qemu) are dynamically assigned to nodes and are sometimes moved around.
  • The backup is done via PBS.
  • The backup network is separate from the public service network and is heavily used at night by this cluster and other systems for backups. It's a bottleneck.
  • Because of that, the cluster backup is split into n backup jobs, one per node, where n is the number of nodes. The jobs are scheduled so that only one runs at a time.
  • There are some virtual machines that should definitely not be backed up by Proxmox, so simply selecting "all VMs to be backed up" is not an option. I have to use "include only selected VMs" or "all except selected VMs" instead.
Problems:
  • Problem 1: If the VMs are moved around and end up on different nodes, the backup selection method in use (backup all with exclusions) is no longer appropriate, because I cannot exclude VM IDs that are not currently located on the node.
  • Problem 2: There are several administrators, and I'd like the backup configuration to always be correct even if one administrator forgets to reconfigure the backup job to include new VM(s).
Possible Solutions:
  • Solution 1: Be able to select all virtual machines existing in the entire cluster even if a specific node is selected and most of the virtual machines are not running on that node at the moment. That could be done with a checkbox "show virtual machines of other nodes too". This may be a selectable option in "advanced" view mode. At backup job run time, non-existent VM IDs would simply be skipped (I think this should already be the case). This option is the simplest and easiest to use.
  • Solution 2: Be able to select virtual machines for inclusion/exclusion based on an extended regex. Such a flexible inclusion/exclusion mechanism should satisfy all needs. This option is the most flexible and could even cover VMs that are currently not in the cluster at all.
  • Solution 3: Use Tags (one tag or a combination of more tags) for Backups.
  • Solution 4: Use advanced filtering for VM selection (Tag == #tagname AND|OR Regex of vmname is *.ldap.domain.com AND|OR nodename is "node1" ... )
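To make Solutions 2 to 4 a bit more concrete, here is a minimal sketch of what such a selection rule could look like if it were evaluated at backup job run time. This is not how Proxmox VE works today: the `Guest` class, the `select_for_backup` function, the tag names and the sample guests are all hypothetical and only illustrate the idea. Criteria are combined with AND here; an OR combination would need a small extension.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Guest:
    """Hypothetical view of a guest: ID, name, current node and tags."""
    vmid: int
    name: str
    node: str
    tags: set = field(default_factory=set)


def select_for_backup(guests, include_tags=(), exclude_tags=(),
                      name_regex=None, node=None):
    """Return the sorted VMIDs matching every given criterion (logical AND)."""
    pattern = re.compile(name_regex) if name_regex else None
    include, exclude = set(include_tags), set(exclude_tags)
    selected = []
    for g in guests:
        if node is not None and g.node != node:
            continue  # guest currently lives on another node
        if include and not (g.tags & include):
            continue  # none of the required tags present
        if g.tags & exclude:
            continue  # carries a tag that forbids backup
        if pattern and not pattern.search(g.name):
            continue  # name does not match the regex
        selected.append(g.vmid)
    return sorted(selected)


# Illustrative inventory only; not real guests.
guests = [
    Guest(101, "a.ldap.domain.com", "node1", {"backup"}),
    Guest(102, "b.ldap.domain.com", "node2", {"backup"}),
    Guest(103, "scratch", "node1", {"nobackup"}),
    Guest(104, "web01", "node1", {"backup"}),
]

print(select_for_backup(guests, include_tags=["backup"], node="node1"))
print(select_for_backup(guests, name_regex=r"\.ldap\.domain\.com$"))
print(select_for_backup(guests, exclude_tags=["nobackup"]))
```

Because the rule is evaluated against wherever the guests currently are, a VM that migrates to another node or is newly created with the right tag would be picked up without anyone editing the job, which is the point of Problems 1 and 2.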
Current workaround:
  • Edit jobs.cfg manually or by script and put in a line "exclude <#vmid1> ... <#vmidN>" for every backup job that needs it. (Tested. Works fine.) If I do that, the backup job can no longer be changed in the web UI without losing the VM selection.
What do you think?
 
Problem 1: If the vms are moved around and are located at different nodes the used Backup-Selection Method (backup all with exclusion) is no longer appropriate. (I can not exclude virtual machine ids which are currently not located on the node)
What do you mean by "moved around"? Normally VM 123 is backed up regardless of where it is in the cluster. Backup is normally NOT node-specific. Isn't that the whole point of a VM running in a cluster?
 
@LnxBil

As I wrote, the backup is explicitly configured per node because Proxmox VE backs up nodes in parallel, which I do not want. I want it to be done strictly sequentially, because the load that parallel backups impose on the network is too much for me. That parallel/sequential issue is already a proposed feature request.
 
@LnxBil

As I wrote, Backup is configured explicitly node-specific, because of the behaviour of Proxmox VE to backup nodes in parallel, which I do not want. I want it sequentially, because the load of parallel backup imposed on the network is too much for me. That parallel/sequential thing is already a proposed feature request.
Okay, I understand it now. Some other ideas:
  • why don't you limit the backup throughput so that it can be parallel
  • buy better (or more) network equipment
  • switch to lacp-bonding with multiple overlay vlan networks. We're running 4x25 GBit in LACP
  • in general QoS?
 
why don't you limit the backup throughput so that it can be parallel

I consider this a slightly less efficient way to do the same. (double jobs at half speed seems worse than half jobs at full speed)
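The "half speed" intuition can be checked with a small idealized model. This is only a sketch under loud assumptions: a single shared link with a fixed bandwidth, parallel jobs sharing it equally, and no per-job overhead or PBS-side processing. The function names and the numbers are made up for illustration and are not measurements from this cluster.

```python
def completion_times_sequential(sizes_gb, link_gb_per_h):
    """Jobs run one after another, each using the full link."""
    t, done = 0.0, []
    for size in sizes_gb:
        t += size / link_gb_per_h
        done.append(t)
    return done


def completion_times_parallel(sizes_gb, link_gb_per_h):
    """All jobs start together and share the link equally."""
    t, done = 0.0, []
    active = len(sizes_gb)
    transferred = 0.0  # amount each still-active job has moved so far
    for size in sorted(sizes_gb):
        # Time until the smallest remaining job finishes at the current share.
        t += (size - transferred) * active / link_gb_per_h
        transferred = size
        done.append(t)
        active -= 1
    return done


sizes = [100, 100]   # illustrative job sizes in GB
link = 100           # illustrative shared bandwidth in GB per hour

seq = completion_times_sequential(sizes, link)
par = completion_times_parallel(sizes, link)
print("sequential:", seq, "mean:", sum(seq) / len(seq))
print("parallel:  ", par, "mean:", sum(par) / len(par))
```

In this model both variants finish the last job after 2 hours, so the total wall-clock time is identical. The sequential run finishes the first job after 1 hour (mean 1.5 hours), the parallel run finishes both after 2 hours (mean 2.0 hours). So under these assumptions sequential is not faster overall, it only completes individual jobs earlier.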

  • buy better (or more) network equipment
  • switch to lacp-bonding with multiple overlay vlan networks. We're running 4x25 GBit in LACP

currently no/little budget (time/money).

  • in general QoS?

Uuuh, I'd really like to avoid that complexity. ;-)

But nevertheless, thanks for the effort to write your thoughts!
 
As I wrote, Backup is configured explicitly node-specific, because of the behaviour of Proxmox VE to backup nodes in parallel, which I do not want.

I consider this a slightly less efficient way to do the same. (double jobs at half speed seems worse than half jobs at full speed)
Do you have numbers to back this up? Even if true, do the backups complete within their backup window? It's certainly possible that there are more concrete reasons to avoid parallel backups, but a vague "efficiency" doesn't seem like a valid one.

You have a finite window "X" to complete a backup, which consists of "Y" VMs with "Z" data. The "problem" is how to make sure that the Z data of all Y VMs can be transferred within X.

1. What difference does it make how many nodes you have, or how many of them are operating in parallel? The bottleneck is the destination storage, not the source. This isn't likely to be a big problem with PBS anyway since it's a differential backup.
2. Are you able to complete your backups within your specified window today? How long does it take?
3. Left to its own devices, how long does the backup normally take to complete?
 
I consider this a slightly less efficient way to do the same. (double jobs at half speed seems worse than half jobs at full speed)
Do you have numbers to back this up?

Nope. Just a vague thought on efficiency.

1. What difference does it make how many nodes you have, or how many of them are operating in parallel? The bottleneck is the destination storage, not the source. This isn't likely to be a big problem with PBS anyway since it's a differential backup.

Now that I have changed to PBS with incremental backups, even in a very inefficient setup (NFS datastore), backup times are no longer a problem at all.

2. Are you able to complete your backups within your specified window today? how long does it take?

1-3 hours with incremental backups (estimated). The initial backups were only completed today, so incremental backups have only just started running.

3. Left to its own devices, how long does the backup normally take to complete?

Before PBS it took about 9 hours for normal full backups to NFS-Storage.

"All" can be used; then exclude the vDisks of each VM which doesn't need a PVE backup.

Ah! Great! Did not know that! That's a very good option!
 
@LnxBil

As I wrote, the backup is explicitly configured per node because Proxmox VE backs up nodes in parallel, which I do not want. I want it to be done strictly sequentially, because the load that parallel backups impose on the network is too much for me. That parallel/sequential issue is already a proposed feature request.
See:

https://bugzilla.proxmox.com/show_bug.cgi?id=3086
 
"All" can be used; then exclude the vDisks of each VM which doesn't need a PVE backup.
Be careful if you ever do an in-place restore: the current data on the vDisks can be deleted. IIRC, there is not yet an option for an in-place restore to keep existing vDisks.
 
Think about the request from the devs perspective.

You have a request for a feature that doesn't appear to have a stated benefit except for a "vague thought on efficiency." What priority would you place on such a request?
 
Now having changed to PBS - even in a very inefficient way (NFS-Datastore) and incremental backups, times are no longer problematic at all.
Hmm. So the storage you're using is remote from PBS? You now have a compound bottleneck: the available bandwidth between the nodes and PBS, and the available bandwidth between PBS and your storage. Regardless of whether you send a single stream or multiple streams, the job will be limited to the available bandwidth of the slowest/busiest link.

Moreover, when using PBS, traffic is one consideration. Inspection, deduplication, and compression are another. You'd get better throughput if you let PBS handle multiple streams.
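As a rough sketch of the "slowest link" argument combined with the backup-window question: the end-to-end rate of a chain of links is at best the minimum of the individual rates, and the backup fits if the data volume divided by that rate is within the window. The function names and all numbers below are illustrative assumptions, not measurements, and the model ignores deduplication, compression and anything PBS does on its side.

```python
def effective_throughput(link_rates_mb_s):
    """Best-case end-to-end rate of links in series: the slowest one."""
    return min(link_rates_mb_s)


def hours_needed(data_gb, link_rates_mb_s):
    """Hours to move data_gb over the chain (1 GB taken as 1000 MB)."""
    seconds = data_gb * 1000 / effective_throughput(link_rates_mb_s)
    return seconds / 3600


def fits_window(data_gb, link_rates_mb_s, window_h):
    """True if the transfer completes within the backup window."""
    return hours_needed(data_gb, link_rates_mb_s) <= window_h


# Illustrative values: nodes -> PBS at 1000 MB/s, PBS -> NFS at 100 MB/s.
links = [1000, 100]
print(effective_throughput(links))      # limited by the PBS -> NFS hop
print(hours_needed(360, links))         # hours for 360 GB of changed data
print(fits_window(360, links, 3))       # does it fit a 3 hour window?
```

With these made-up numbers the fast node-to-PBS link does not help: the 100 MB/s hop sets the pace, 360 GB takes 1 hour, and it fits a 3 hour window. Whether one stream or several are sent does not change that ceiling in this simple model.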
 
