Tuxis launches free Proxmox Backup Server BETA service

Currently, we run on a Proxmox VPS with the following specs:
4x KVM vCore (E5-2620 v3 @ 2.4 GHz underneath)
16GB RAM
Zpool pbs:
- 2TB Ceph RBD disk on dadup (Spinning disks, 25km away)
- 20GB Ceph RBD disk on SSD as 'special device'
- 20GB Ceph RBD disk on SSD as 'log device' (unused due to the way PBS issues writes)
- 100GB Ceph RBD disk on SSD as 'cache device' (mostly useless, because roughly 1TB of data is present that is only read periodically, so by the time you need it again it has been pushed out of the cache)
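
For reference, a pool like that can be put together roughly like this; the device names below are only examples (inside the VM the RBD disks just show up as ordinary block devices):

Code:
# data disk plus SSD-backed special, log and cache vdevs (device names are examples)
zpool create pbs /dev/sdb \
    special /dev/sdc \
    log /dev/sdd \
    cache /dev/sde
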
hi, i have a question:

why do you use zfs on top of ceph?
my guess is that in this case, zfs will only hurt your performance and you do not gain anything

for compression/dedup: we already include compression and deduplication
for checksums: that is already handled by ceph

so the only thing that's left are the 'special' device and 'log device', though i would argue that in this case
giving the vm more ram (for the page cache) and using something like ext4/xfs will get you more performance than zfs on top of ceph
(note: i did not do any benchmarks, those are just my thoughts based on my experience with ceph/zfs)

did you do any benchmarking without zfs?
 
I like ZFS for its simplicity in growing, the possibility of putting fast disks in front of a slow disk, the filesystem per user including quota, and compression (although not very useful in this case indeed).

Also, the way PBS is built, I don't think anyone will be happy if you have a lot of users/backups with all those files in an ext4 filesystem. Even though Ceph does do checksumming, it does not checksum the filesystem. ZFS does checksumming, and PBS still runs verifies to check the checksummed chunks.

So ZFS is very easy to run with a lot of different datastores/users, that is the main reason. It also scales better than ext4/xfs.
 
I like ZFS for its simplicity in growing, the possibility of putting fast disks in front of a slow disk, the filesystem per user including quota, and compression (although not very useful in this case indeed).
i agree that zfs is very nice, thats the reason we recommend using it, but with local hardware

Also, the way PBS is built, I don't think anyone will be happy if you have a lot of users/backups with all those files in an ext4 filesystem.
why not?

Even though Ceph does do checksumming, it does not checksum the filesystem. ZFS does checksumming, and PBS still runs verifies to check the checksummed chunks.
i do not see the point in having checksums on the filesystem level. you now have 3 checksums (that all have to be calculated):
block (ceph)
fs (zfs)
file (pbs)

zfs checksumming is only advantageous if you have a zpool where files with wrong checksums can be healed by the redundancy
otherwise the pbs file checksums are enough to detect something like bitrot, and ceph's checksums take care of keeping it consistent

So ZFS is very easy to run with a lot of different datastores/users, that is the main reason. It also scales better than ext4/xfs.
in such a case i'd use lvm + 1 lv per datastore + ext4 (maybe xfs?)
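
something like this, with the volume group and names purely as an example:

Code:
# one LV per datastore, formatted with ext4
lvcreate -L 500G -n store1 pbs-vg
mkfs.ext4 /dev/pbs-vg/store1
mkdir -p /mnt/store1
mount /dev/pbs-vg/store1 /mnt/store1
proxmox-backup-manager datastore create store1 /mnt/store1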

i did throw a mini benchmark together:

"toy" ceph cluster with 4 virtual nodes (1 osd per node on nvme)
(it will not be fast but the relative difference is interesting)

pbs is a vm with 4 cores, 16 gb ram

1st datastore is zfs on a ceph disk without compression
2nd datastore is plain ext4 on a ceph disk with no further tuning

i backed up a random vm with a 30gb disk and a ~28GiB directory (like a container)

                 ~30GiB VM    ~28GiB Directory
ZFS on Ceph      ~60MiB/s     ~30MiB/s
Ext4 on Ceph     ~220MiB/s    ~120MiB/s

so it seems that in such a setup ext4 is much faster than zfs (although the absolute values are irrelevant, the relative difference is important)
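
for reference, the directory case can be reproduced (and timed) with the plain client, repository and path being placeholders of course:

Code:
time proxmox-backup-client backup data.pxar:/path/to/dir \
    --repository root@pam@pbs.example.com:store-zfs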

my general point is that using a feature-rich storage/fs such as ceph/zfs/qcow2/etc. has a performance penalty, and they
should not be stacked (e.g. zfs on qcow2, or the reverse, is also *very* slow), especially if one already has all the features of the other
 
Testing LVM instead of ZFS is a good tip! If I switch the current datastores to LVM/EXT4, would an rsync be sufficient?
 
would an rsync be sufficient?
should be (just make sure you sync the '.chunks' folder), alternatively, you can add your local ip (or localhost) as a remote and use a sync job
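
for example, something like this (paths are placeholders; the trailing slashes matter so hidden directories like '.chunks' are copied as well):

Code:
# -a keeps ownership/permissions intact, which the datastore relies on
rsync -aH --info=progress2 /mnt/zfs-datastore/ /mnt/ext4-datastore/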
 
did it work? how does the performance look?
 
It's still syncing...

A few new users on the lvm, but they're not really busy yet.

It does seem LVM/ext4 is less efficient in terms of space usage, so I need to expand the thin-pool later this morning...
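
For my own notes, growing things later is just a matter of something like the following (the VG/LV names are examples):

Code:
# grow the thin pool itself
lvextend -L +500G pbs-vg/thinpool
# grow one datastore LV and its ext4 filesystem in one go
lvextend -r -L +200G pbs-vg/store1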
 
Hi,

I thought ZFS was mandatory in order to use compression and dedup...
Seems not. :p I misunderstood something.
 
Hi,

I thought ZFS was mandatory in order to use compression and dedup...
Seems not. :p I misunderstood something.
No. :)

The cool (and risky) thing about PBS is Proxmox's choice to use chunks. This makes dedup and compression pretty easy and also makes sure that you don't need to rely on any filesystem. (We are also planning to test with CephFS directly, for example.)
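
As a toy illustration of the principle (this is not PBS's actual code, just a sketch): address every chunk by its hash and store each digest only once, and dedup plus per-chunk compression fall out almost for free.

Code:
# toy sketch only: fixed 4MiB chunks, content-addressed by SHA-256
mkdir -p store
split -b 4M disk.img chunk.
for c in chunk.*; do
    d=$(sha256sum "$c" | cut -d' ' -f1)
    # store each unique chunk exactly once, compressed
    [ -e "store/$d.zst" ] || zstd -q "$c" -o "store/$d.zst"
done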
 
So a PBS datastore on EXT4 instead of ZFS should be faster, I guess.
Then I don't know why ZFS is recommended for local storage, maybe for syncjobs?
 
ZFS on local storage will probably perform comparably to LVM/ext4. I think sync jobs are just HTTP over port 8007, so no ZFS requirements there, I think?

It would be cool if LVM management were included in PBS as well.
 
So a PBS datastore on EXT4 instead of ZFS should be faster, I guess.
Then I don't know why ZFS is recommended for local storage, maybe for syncjobs?

the main reason is that ZFS makes it easy to do reliable software raid with cache/metadata tiering via special/log/cache devices. so you can put a big array of spinning disks and a couple of fast NVME devices to good use :)
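
for example, a pool like the following (disks/partitions being placeholders) gives you a redundant bulk tier on spinning disks with metadata on fast mirrored NVME:

Code:
zpool create tank \
    mirror /dev/sda /dev/sdb \
    mirror /dev/sdc /dev/sdd \
    special mirror /dev/nvme0n1p1 /dev/nvme1n1p1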
 
Yes, very cool. Please note though that the log device is mostly unused due to the way PBS writes. Caching is not very useful for these amounts of data that are rarely read. By the time you need the data again, some other data has probably pushed the old data out of the cache.

The special device is very useful! But, when you have the special device on SSD/NVME, you don't need to cache metadata anymore.

Please correct me when I'm wrong!
 
cache devices might make sense if you have a lot of backups and run a primary/off-site setup with sync jobs. a few hundred GB of cache might mean that the new chunks from today are still available there for syncing at night, for example. but special vdevs for sure have the most effect, since PBS does a lot of metadata access.
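
adding such a cache device to an existing pool is a one-liner (device name is a placeholder), and since L2ARC holds no unique data it can be removed again at any time:

Code:
zpool add tank cache /dev/nvme0n1p2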
 
did it work? how does the performance look?
So, syncing is done. Somehow (I've never seen this in my life) rsync did not sync all files properly. That wasn't intentional, but it might serve as a nice test to see how PBS handles failed verification...

Performance seems better, but there are still two datastores syncing from ZFS which might slow things down a bit.
 
but might serve as a nice test to see how PBS handles failed verification...
Absolutely horrible. Unreadable errors on client side

Code:
command 'lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- /usr/bin/proxmox-backup-client backup '--crypt-mode=encrypt' '--keyfd=11' pct.conf:/var/tmp/vzdumptmp20837/etc/vzdump/pct.conf fw.conf:/var/tmp/vzdumptmp20837/etc/vzdump/pct.fw root.pxar:/var/tmp/vzdumptmp20837 --include-dev /var/tmp/vzdumptmp20837/. --skip-lost-and-found --backup-type ct --backup-id 112 --backup-time 1602289095 --repository DB0623@pbs@pbs.tuxis.nl:DB0623_proxmox' failed: exit code 255

Code:
102: 2020-10-10 02:15:42 INFO: Error: parse_rfc_3339 failed - wrong length at line 1 column 60

On the server side however:
Code:
2020-10-10T09:57:45+02:00: can't verify chunk, load failed - store 'DB0623_proxmox', unable to load chunk '9c8f3533157797a2b1212118b5fcd53a3ac047bb15769121c49724a4894e76b3' - No such file or directory (os error 2)
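
Whoever has shell access on the PBS host can check directly whether such a chunk file is missing on disk; assuming the usual datastore layout where chunks live under '.chunks/<first four hex digits>/<full digest>' (the datastore path here is a placeholder):

Code:
ls -l /path/to/DB0623_proxmox/.chunks/9c8f/9c8f3533157797a2b1212118b5fcd53a3ac047bb15769121c49724a4894e76b3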


--------

Interestingly, I am unable to perform any further backups right now, even after removing all old ones; they just fail with
Code:
102: 2020-10-10 02:15:42 INFO: Error: parse_rfc_3339 failed - wrong length at line 1 column 60
while not giving any useful output on the PBS side (not even a task log; the only task logs I can see are for aborted VM backups, and the CTs that failed with parse_rfc_3339 seem invisible to PBS)

And the tasks are named as 1970-01-01.

--------

Aha! It's a client that destroys backups.

Downgrading to proxmox-backup-client=0.8.16-1 (from 0.9.0-2) fixes the issue with creating new backups. This is absolutely horrible, a single client update has destroyed all backup chains and made it impossible to create new ones. What's the deal with it?

For what it's worth, client reports
server version: 0.9.0
 
@eider your keyfile contains timestamps in a wrongly serialized format that the new client does not understand. could you post the 'created' and 'modified' entries of your keyfile (for PVE's autogenerated one, that is '/etc/pve/priv/storage/STORAGEID.enc')? please don't post the rest of that file as it contains the encryption key! this is not related to anything server-side, and does not affect the status of existing backups in any way (the client just cannot interact with them until you fix the timestamp).

the server-side error needs a closer look, IIRC missing chunks should be handled as an error and the verification should proceed, but fail at the end (we want to try to verify as much as possible!). could you post the full log?
 
