PBS Tuning

DerDanilo

Renowned Member
Jan 21, 2017
Just watched the YouTube video of the PBS tests to get a quick overview of PBS. Will test the beta soon as well, just no time currently.

This backup solution is already very good and one that many people have been waiting for. It solves many issues and will save a lot of time if it works really well for file-level backups too. It could possibly even replace certain other backup solutions which are quite expensive and the opposite of open source.

I am not sure that I understand the PBS filesystem entirely yet. Would it make sense to build some flash disks into an HDD backup server to speed up certain file/chunk/index IO of the PBS stores? Yes, one could simply use ZFS cache disks, but this is not about caching. So, is there a part of the PBS file structure that would greatly benefit from flash disks and allow much faster indexing/searching/restoring/...? If so, could it possibly be built into PBS so that one is able to add e.g. a PBS DB store which uses flash disks?

This idea is based on the fact that there are file backup jobs with a huge number of files. Speaking about multiple millions of small files (500Mio+).
Still not sure if the PBS file client is even able to handle such a huge number of files (yet) properly. If it can handle this properly and with decent performance, it would be truly awesome.

It would also be nice if one were able to set speed limits for the backup client via an option, and likewise options such as "use x many cores to run encryption" (like with pigz) or "limit RAM usage to x GB".

I think it would also be really nice if there were a module for e.g. Ansible that one could easily use to configure/run PBS client backups, even if this is something that might only come at a later point in time.
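In the meantime, something like the wrapper script below is probably what such a role/module would end up templating and scheduling anyway (just a rough, untested sketch; the repository, file paths and fingerprint are made-up placeholders):

Code:
#!/bin/sh
# sketch of a client backup wrapper that an Ansible role could template out
# and run via cron or a systemd timer -- all values below are placeholders
export PBS_REPOSITORY="backup@pbs@pbs.example.com:store1"
export PBS_PASSWORD="$(cat /etc/pbs-client.pw)"   # keep the secret out of the playbook
export PBS_FINGERPRINT="aa:bb:cc:..."             # pin the server certificate

# back up the root filesystem and /etc as separate pxar archives
exec proxmox-backup-client backup root.pxar:/ etc.pxar:/etc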

Looking forward to testing PBS soon and hope that this project gets the positive feedback it deserves! :)
 
the main/first performance bottleneck on spinning disks will likely be metadata access, as PBS stores the backup contents in chunks inside directories in the datastore:

Code:
$ find .. | head -n 30
/backuptest/.chunks/0000
/backuptest/.chunks/0000/000000118aff7351d24769359bf05f8ab8009c9d95aad15dfc4d8147ad558efe
/backuptest/.chunks/0000/0000041dbd7a24da6b8883a0b37b8c0797577903b84566fa4af3344ac0d75ec4
/backuptest/.chunks/0000/000007dbaf9039e032b6f1b3086e1a8af1d9734960d2d3b11bce542abe6ed17d
/backuptest/.chunks/0000/00000fb5ea5750803bfaa28d8c0c6406eb69afefee4e8fc855daab0e70ae4424
/backuptest/.chunks/0000/000021e2ec7741f74f104cb88e901ce16293277be2ea81c37d53996fde72e72f
/backuptest/.chunks/0001
/backuptest/.chunks/0001/000103dfa48b1553a8e97db0f3bbd1168b7245ee2757911c2f210cea8a352f2e
/backuptest/.chunks/0001/00010fb7c52230479ec40b4589ed2730b600ab25f8ea8a12e59737562d3e625d
/backuptest/.chunks/0001/0001130002b69693ab79b0e86a87e87e20d9705b662fccea60fc30f357e3c6d9
/backuptest/.chunks/0001/00011b45212c45d51e533f96f726f70c72b2535033e6e7917b7471583e57b168
/backuptest/.chunks/0001/0001258ffff42c7a8847073549b301058bc30748b603b761a746a0a77573b847
/backuptest/.chunks/0002
/backuptest/.chunks/0002/000200a72cd58ae1741d6c455ba081a52046ff3fc67caa85431bfb3a7b70af50
/backuptest/.chunks/0002/000201b8b7cdc74860fc1436e9c987d558cc82df94c729303b8e4f85fbfaa012
/backuptest/.chunks/0002/0002110340e8bbc002e1e649d8bf5ac7a68b1106c1a7e72d6a8cacd8e1350629
/backuptest/.chunks/0002/00021b0fe92fe0092c592c9d5ca75bca8ab25fa126465813b99958b0704327da
/backuptest/.chunks/0002/00021c7d2254ab1ef9d0585089be194e35d1826d4396a87c0a39837c00663173
/backuptest/.chunks/0003
/backuptest/.chunks/0003/000300c26a043870ba2f64c1bdcf4e4cb10532fa952fca805ec189b4a4ae7bd2
/backuptest/.chunks/0003/000307ea6b677b46e6cda67c93b4e9036cd562053c66dc8baca77d7e87a2eedf
/backuptest/.chunks/0003/000327b97088160efbeb170ccab1ba936665bc6f46f8518c9287892e830b8202
/backuptest/.chunks/0003/00033a8b98019e56ec9f61540248524091a89803175a5fe7672417b69e1347d4
/backuptest/.chunks/0003/00033b22b926f7b40931b0e9cefbce8c8063ad4e41c09077ff39ea34e243b44f

depending on the amount of data and the chosen chunking parameters, you'll quickly end up with lots of chunks. PBS tries to be smart and minimize costly operations (e.g. currently only the logically used space is displayed, and not the actually consumed space per backup/backup group, as the latter is hard to keep track of efficiently).
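to put a rough number on "lots" (back-of-the-envelope only, assuming the default ~4 MiB fixed/average chunk size and the 65536 .chunks/XXXX prefix directories):

Code:
# 1 TiB of unique data at ~4 MiB per chunk (1 TiB = 1048576 MiB)
$ echo $(( 1024 * 1024 / 4 ))
262144
# spread over the 65536 prefix directories (0000..ffff)
$ echo $(( 262144 / 65536 ))
4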

when using ZFS, adding a (fast ;)) mirrored special vdev for storing metadata definitely speeds up operations that operate on a whole datastore like garbage collection. having enough RAM to keep metadata accesses from even hitting the disk is also a good idea.
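for reference, a sketch of how that could look on an existing pool (pool/dataset and device names are just examples; keep in mind that a special vdev becomes critical for the whole pool, hence the mirror):

Code:
# add a mirrored special vdev (metadata) to an existing pool "backup"
$ zpool add backup special mirror /dev/nvme0n1 /dev/nvme1n1
# optionally let small blocks end up on the special vdev as well
$ zfs set special_small_blocks=4K backup/pbs-store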

container/host backups will have a bit more chunks per amount of data, as the chunker works dynamically and not with a fixed chunk size. GC needs to scan all chunks to determine whether they can be deleted. verify needs to scan all chunks and actually read & checksum the data within. pruning is cheap, as it only removes snapshots logically; the actual chunk cleanup is deferred to the garbage collection.
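for completeness, the corresponding CLI entry points (datastore and group names are just examples):

Code:
# trigger / watch garbage collection for a datastore
$ proxmox-backup-manager garbage-collection start store1
$ proxmox-backup-manager garbage-collection status store1
# prune old snapshots of a group (cheap; the chunks are only freed by the next GC run)
$ proxmox-backup-client prune host/myclient --keep-daily 7 --keep-weekly 4 --repository root@pam@localhost:store1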
 
So there is (currently) no way to generally optimize PBS storage unless the storage backend supports it?

Could this possibly be built in somehow, so that PBS maintains some sort of index/cache on flash to speed things up? I am thinking of e.g. a Ceph-based backend (RBD), since CephFS does not handle trillions of files well and efficiently at the moment, and maybe never will.
So having this could be quite important to not lose performance with growing storage backends. Yes, one could simply use multiple RBD images and integrate them into PBS, but RBD images can be extremely huge, so this is actually unnecessary.

Maybe someone has another idea on how to optimize PBS.
 
