Scale out PBS / Possibility to "Cluster" Proxmox Backup Server

crz

Member
Sep 18, 2021
Germany
Hello everyone,
As far as I can tell, there is no good way to scale out your PBS instances, other than getting multiple servers, setting up separate backup storages/backup jobs for each machine, and manually assigning your backups to different PBS instances.

What I propose is the following:

You create a "cluster" of multiple Proxmox Backup Servers, in which one acts as the "Master" and is the target for all backup jobs.
The "cluster" part is essentially just for management; no backup data needs to go through it.
The master then either stores the backup itself or tells the PVE, "hey, please talk to the other PBS on IP so and so".
The logic behind that should check which node holds the namespace and then either relay that information or, if the namespace is on the master itself, handle the backup.
From the PVE side, only one Proxmox Backup Server is added as a storage.
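
To make the lookup-and-hand-off idea a bit more concrete, here is a minimal sketch in Python. It is not how PBS works today; the `Master` and `Redirect` names, the in-memory dictionaries, and the example addresses are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Redirect:
    """Hypothetical answer telling the PVE which PBS node to contact directly."""
    node: str
    address: str


class Master:
    """Hypothetical management node: it tracks metadata, not backup data."""

    def __init__(self, name: str, address: str) -> None:
        self.name = name
        self.nodes: Dict[str, str] = {name: address}  # node name -> address
        self.namespace_owner: Dict[str, str] = {}      # namespace -> node name

    def add_node(self, name: str, address: str) -> None:
        self.nodes[name] = address

    def assign(self, namespace: str, node: str) -> None:
        if node not in self.nodes:
            raise KeyError(f"unknown node: {node}")
        self.namespace_owner[namespace] = node

    def resolve(self, namespace: str) -> Optional[Redirect]:
        """Return None if the master holds the namespace itself,
        otherwise a Redirect pointing at the node that holds it."""
        owner = self.namespace_owner[namespace]
        if owner == self.name:
            return None  # master handles the backup locally
        return Redirect(node=owner, address=self.nodes[owner])


master = Master("pbs1", "10.0.0.1")
master.add_node("pbs2", "10.0.0.2")
master.assign("customer-a", "pbs1")
master.assign("customer-b", "pbs2")
```

With this setup, `master.resolve("customer-a")` returns `None` (handled on the master), while `master.resolve("customer-b")` returns a `Redirect` to `pbs2`, so the backup data itself never passes through the master.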


Additional features needed on the PBS side:
- When you create a new namespace, the default should be the node with the most free space, but there should also be a dropdown so you can select the node manually.
- Move a namespace to a new node. (Maybe use sync jobs in the background, and delete on the source after verifying that the sync is complete.)
- Handing over the connection info to the PVE if it needs to talk to a different node. (So basically the master should not act as a proxy, because then the network connection of that one node becomes a bottleneck; it should rather just hand off the connection info of the desired node, and the PVE then talks directly to the other PBS.)
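
The first two points could look roughly like the following Python sketch. Everything here is an assumption for illustration: `pick_node`, `move_namespace`, and the `sync`/`verify`/`delete` callbacks are hypothetical stand-ins, not existing PBS functions.

```python
from typing import Callable, Dict, Optional


def pick_node(free_bytes: Dict[str, int], manual: Optional[str] = None) -> str:
    """Choose the node for a new namespace: the manual selection if given,
    otherwise the node with the most free space."""
    if not free_bytes:
        raise ValueError("no nodes available")
    if manual is not None:
        if manual not in free_bytes:
            raise ValueError(f"unknown node: {manual}")
        return manual
    return max(free_bytes, key=lambda node: free_bytes[node])


def move_namespace(
    namespace: str,
    source: str,
    target: str,
    owner: Dict[str, str],
    sync: Callable[[str, str, str], None],
    verify: Callable[[str, str], bool],
    delete: Callable[[str, str], None],
) -> bool:
    """Sync to the target, verify there, and only then switch the owner and
    delete on the source. Returns False (source untouched) if verify fails."""
    if owner.get(namespace) != source:
        raise ValueError(f"{namespace} is not on {source}")
    sync(namespace, source, target)
    if not verify(namespace, target):
        return False
    owner[namespace] = target   # new backups now go to the target
    delete(namespace, source)   # remove the old copy last
    return True
```

The ordering is the important part: the source copy is only deleted after the verify on the target has succeeded, so a failed sync never loses data.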

Additional features needed on the PVE side:
- It would be nice if I could create a new namespace from the PVE side and then assign VMs to it, which then in turn get backed up as part of the backup job.



Why do we need this?

We have multiple clusters, a few hundred single nodes, and a few bigger (10+ nodes) clusters coming.
From those devices we want to back up the VMs of our customers, and it becomes a pain to handle multiple, even dozens of, PBS nodes manually and to keep track of which customer is on which PBS and so on.
We also want to use namespaces so that we can enforce each customer's contractually agreed-upon quota.
Scaling your backup storage becomes much easier, without the hassle of network storage (for example, creating a Ceph cluster just for backups) and with the benefits of local ZFS storage on each PBS node.


I'm sure there are a lot of things I missed, and that there are hurdles to think through, but maybe we can work on it together and refine the proposal.
If you are in favor of such a thing, please let everyone know, so the devs can see if this is worth working on.
If you have anything to add/correct or improve, please share your thoughts.


Best regards,
Chris
 
PBS is primarily for backing up VMs; exactly how much VM storage do you have? A single server is quite capable of hosting at least 10 disk shelves, no problem, with redundant individual connections to each shelf. So using bog-standard 24x2.5" SAS shelves stuffed full of 2.4TB 10k spinners, that's 500TB of usable backup space that can, if you know what you are doing, be arranged to be completely shelf-level redundant. If you really know what you are doing, you can make it SAS card and even PCIe riser redundant.

If you switch to SAS-based SSDs, then 15.6TB drives are readily available; that would give you 3.5PB of PBS backup storage.
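
As a rough check of those figures, the raw capacity can be worked out as below. This is only a back-of-the-envelope sketch: `raw_capacity_tb` is a made-up helper, and the "usable fraction" is simply the quoted usable number divided by the raw number, not a statement about any particular RAID or ZFS layout.

```python
def raw_capacity_tb(shelves: int, bays_per_shelf: int, drive_tb: float) -> float:
    """Raw capacity in TB, before any redundancy or filesystem overhead."""
    return shelves * bays_per_shelf * drive_tb


# Figures quoted above: 10 shelves of 24 bays each.
hdd_raw = raw_capacity_tb(10, 24, 2.4)    # 2.4TB 10k spinners
ssd_raw = raw_capacity_tb(10, 24, 15.6)   # 15.6TB SAS SSDs

# Fraction of raw capacity implied by the quoted usable numbers.
hdd_usable_fraction = 500 / hdd_raw       # 500TB usable
ssd_usable_fraction = 3500 / ssd_raw      # 3.5PB usable

print(f"HDD: {hdd_raw:.0f} TB raw, usable fraction {hdd_usable_fraction:.2f}")
print(f"SSD: {ssd_raw:.0f} TB raw, usable fraction {ssd_usable_fraction:.2f}")
```

So the quoted numbers correspond to roughly 576TB and 3744TB of raw capacity, with the usable figures leaving some of that for redundancy.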