ZFS special device - how much speed increase on verify jobs?

Jul 26, 2021
Hi there,

some time ago I had to change our PBS backup server from spinning disks to SSDs because the verify jobs on all the data stores needed too much time, so they overlapped and ended in a never-ending queue. After running some tests with ZFS special devices, this could maybe be a solution to go back to spinning disks, because they are much cheaper.

My question: Does anybody have experience with how much a special device on SSD, in combination with spinning disks, will increase the performance of verify jobs?

Thanks & Best Regards
Martin
 
It gave me no peace, so I just set up a server with 2x 2TB NVMe and 10x 16TB HDD and figured out what happens with the spinning disks + special device. I ran a backup with approximately 1TB of data and ran a verify job afterwards. These are the results:

Server with 6x 4TB NVMe: speed 2018.49/4278.22 MiB/s -> duration of job: 5 min
Server with the special device: speed 79.50/168.53 MiB/s -> job stopped at 40% after 30 min
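
For reference, a pool layout like the one I tested would be created roughly like this (just a sketch; pool and device names are examples, not my exact commands):

# HDD pool with an NVMe mirror as a special device
zpool create backup raidz2 /dev/sd[a-j] \
    special mirror /dev/nvme0n1 /dev/nvme1n1
# optionally store small data blocks on the NVMe mirror as well
zfs set special_small_blocks=4K backup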

So that configuration does not seem to significantly speed up a verify job and is too slow for my use case. My conclusion: as Proxmox itself pointed out, PBS is designed to run on SSD/NVMe. If one has much time and few jobs on the PBS it could be OK, but overall I can't recommend using spinning drives. It may also be OK for a PBS that only runs sync jobs, copying already-verified backups to a local disk.

If someone has additional suggestions to speed up this scenario - please let me know.
 
It speeds up everything, as the HDDs are no longer hit by a lot of small random IO, because the metadata doesn't have to be stored on the HDDs anymore. So it speeds up sync writes + async writes + reads. But you are right that it helps most with GC tasks, which it speeds up by orders of magnitude.
But for verify/backup/restore I also got a factor 2-3 speed increase after adding the special device SSDs.
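
If you want to check that the metadata really ends up on the special vdev, something like this should show it (a sketch; the pool name is an example):

zpool list -v backup
# the 'special' mirror line has its own ALLOC/FREE counters; it should fill
# with metadata while the HDD vdevs take the bulk chunk data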
 
> Can you reduce the frequency of the verify jobs? E.g. only run verify once a month?
Yes, sure you can! From daily to "never".

The recommendations vary massively. As usual, it depends on your own expectations and requirements.

I for myself have re-verify scheduled at a six-to-eight-week interval on my main PBS. To be prepared for a completely destroyed datastore (or the complete PBS host's hardware) I have additional backups (just the simpler dump files, no PBS) stored on a remote server in a different building.
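
Such a schedule is just a verification job; in /etc/proxmox-backup/verification.cfg it looks roughly like this (a sketch; the ID, datastore and calendar event are examples):

verification: re-verify
	datastore store1
	schedule sat *-*-8..14 02:00
	outdated-after 42
	ignore-verified true

With ignore-verified plus outdated-after, snapshots that already passed verification are only re-checked once they are older than the given number of days.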

"Noone wants Backups. Everbody wants Restore!"
 
Appreciate the reply! I was just asking because you mentioned you did the hardware upgrade to expensive NVMe because the verifications "needed too much time, so they overlapped". Not running them as often could also have solved the overlapping problem, right?

And secondly I noticed that you are also running ZFS!

I just read a post here on the forum asking for a new feature with lighter verification, where PBS only checks that the metadata is OK but doesn't actually verify that the chunk checksums still match (or the file contents, if it's not encrypted). See this: https://forum.proxmox.com/threads/pbs-on-zfs-and-verification-job.121917/

I am wondering whether you do any ZFS scrubs in addition to the PBS verify jobs?
ZFS scrubs have some fine-tunable knobs which you can use to make them run in the background.
Is that something that you have considered, or where you can give me some advice if I want to do something similar?

[edit]: oh I just noticed you're not the original poster! My bad!
 
> [edit]: oh I just noticed you're not the original poster! My bad!
No problem :)

ZFS scrub and PBS verify basically work on two different levels. While scrubbing tests whether data being read from a pool produces the same checksum as before, PBS does this on the application layer with the data chunks.

In my understanding, and from a user/admin standpoint, on a pure PBS-on-ZFS both actions have the same goal: confirm that the data you read today is exactly the same you wrote yesterday.

I did not tune scrubbing, so no recommendation from me. I run it at the default frequency, which is once a month.
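
On Debian-based systems like Proxmox that monthly default comes from a cron snippet, and the result can be checked per pool (a sketch; the pool name is an example):

# where the default scrub schedule is defined (second Sunday of the month)
cat /etc/cron.d/zfsutils-linux
# last scrub result, duration and any repaired errors
zpool status backup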
 
For cost reasons, I'm using multiple targets of 18TB HDDs in hardware RAID1 mirrors for backups.
So each backup target has a maximum of 18TB of space.
Backups select "resource pool labels" assigned to each machine, so I can decide to which target each machine is backed up.

Like this, multiple verifies (or backups) can run in parallel to multiple mirrored 18TB targets, balancing disk IO, as long as the controllers are fast enough.
I have to admit that verify also takes 1-2 days over the weekend, depending on the disk usage.
According to the web statistics, verification read speed is on average around 150 to 200 MB/s (about 250 max).
For RAID5 SATA SSDs on the same server the average read speed is between 200-300 MB/s (about 400 max).

So my setup is much slower than "NVMe: speed 2018.49/4278.22 MiB/s" but also faster than "ZFS special device and HDDs 79.50/168.53 MiB/s" mentioned above.
  1. What is the advantage of using ZFS (consuming memory) instead of ext4 on a RAID controller for backup space?
    PBS dedup also seems to work on my ext4 drives, e.g. a Deduplication Factor of 36.15 is displayed.
    Speed does not seem to be a reason for ZFS on HDDs...
  2. Would ZFS compression add significantly, as backups are already compressed?
 
> Would ZFS compression add significantly, as backups are already compressed?
No, it will not.
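
You can check this yourself on an existing datastore; since PBS already compresses its chunks with zstd, the pool-level ratio stays close to 1.00x (a sketch; the dataset name is an example):

zfs get compression,compressratio backup/datastore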

> What is the advantage of using ZFS (consuming memory) instead of ext4 on a RAID controller for backup space?
Proxmox Backup Server indeed has a lot of features that make some of the ZFS features redundant.

The biggest advantage of ZFS over your current setup is that ZFS mirrors and RAID-Z are much more capable of correcting errors.
RAID-Z needs to find (number of disks - 1) good blocks per (128K) stripe. Stripes are also called parity groups. So for each individual stripe, as long as it finds good data on enough of its disks, it can recover. It can mix and match: e.g. for one corrupt stripe it can take good data from hard disks A+B+C, but for another stripe where hard disk C has corrupt data it can get good data from e.g. hard disks A+B+D.
The same goes for mirrors: ZFS will recover a corrupt file as long as it finds a good copy of each individual affected block on any of its disks.
Most vendors' modern enterprise HDDs claim at most one undetected, corrupt 4KB sector per 10^15 bits read in their datasheets. There is a real chance that one of your HDDs returns bad data on some sector, and that error will happily travel up through your hardware RAID layer. But the chance that two HDDs fail on the same 4KB sector and affect the same RAID-Z stripe is extremely small.
It's also convenient, because it will not just report the error to you, it will also recover from it.
Classic RAID keeps no per-block checksums and works at a much coarser, disk-based granularity. Proxmox Backup Server will still detect any errors during verification, but as far as I know it will not automatically fix them for you. I don't want to play data forensics and try to pick different blocks from different disks by hand, in the hope of getting back a file that passes verification on an ext4 datastore.
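
The self-healing part is easy to demonstrate with a throwaway file-backed mirror (a sketch; do not run this on a production pool):

# build a tiny mirror out of two sparse files and put some data on it
truncate -s 1G /tmp/d1 /tmp/d2
zpool create demo mirror /tmp/d1 /tmp/d2
dd if=/dev/urandom of=/demo/blob bs=1M count=500
# corrupt one side of the mirror, well past the vdev labels at the front
dd if=/dev/urandom of=/tmp/d1 bs=1M seek=100 count=10 conv=notrunc
# the scrub finds the checksum mismatches and repairs them from the intact copy
zpool scrub demo
zpool status -v demo
zpool destroy demo && rm /tmp/d1 /tmp/d2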
Without wanting to start a religious argument: hardware RAID is legacy technology that was invented before multicore CPUs were common, and it has very few legitimate use cases on new systems.

Plus, ext4 has other weird quirks that show its age, like a file size limit of only 16TB! That limit also applies if the file is sparse and only a few MB big on disk; ext4 still can't handle it. Today that can already be too small for a single disk image file, for example. I don't know if this is relevant for PBS, but I wouldn't use ext4 as mass storage for a datastore.
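
That limit is easy to reproduce (a sketch; the mount points are examples):

# on ext4 this fails even though the file would be completely sparse:
truncate -s 17T /mnt/ext4-datastore/huge.img   # -> "File too large"
# the same call on a ZFS dataset simply succeeds
truncate -s 17T /backup/huge.img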
 
ext4 + RAID card is a really good solution, for Proxmox as well as for PBS. I would also recommend having a spare RAID card and testing RAID degradation and rebuild procedures, just in case.
RAID + ext4 is always faster than any HDD-based ZFS RAID, so this is a good solution.
 
Poor me, I was hoping that a professional LSI MegaRAID would read from both mirrored disks and alert in case of differing data...

My thoughts:
If you really want to kill SSDs, use them for your daily backups...
And to make things "better", use disks from the same batch: if one SSD fails, the others might too.
And they do not only fail with "write problems", but also with "you cannot read anything anymore" failures.

Anyway:
PBS offers backup remotes, so at least an HDD failure on the first backup server is not so severe. Good solution!

Admitted:
"...corrupt 4KB sector per 10^15 bits read in their datasheets" is a problem which might be best addressed using RAID5 or 6, as even disks rated at 10^16 or 10^17 bits read are not perfect, but expensive.
Not sure what RAID5 read accuracy is compared to ZFS.

Found some interesting (old) links regarding HDD read failures:
From my experience over the last 20 years, I never had problems with corrupt HDD read data if the drive was OK.
But maybe I just didn't detect them. Found some on SD cards, but never on HDDs...

So ZFS on SSD with ECC RAM might be the best solution.

Or use a tape drive as a second destination.
 
Since this is a backup solution, you usually use what you have, and then upgrade if you need better. So if you have consumer SSDs connected to a RAID card, use them.
 
> It speeds up everything, as the HDDs are no longer hit by a lot of small random IO,
> because the metadata doesn't have to be stored on the HDDs anymore

Unfortunately, no, @Dunuin.

I have set zfs_arc_meta_balance=50000 in ZFS to favor metadata and don't see a really noticeable improvement in verify performance.

The chunks of 4MB size are still "small" enough to cause lots of seeks on disk and thus have a large impact on IO read performance in PBS.
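
For completeness, this is the knob I mean; it is a runtime module parameter of OpenZFS 2.2+ (a sketch):

# weight ARC eviction towards keeping metadata (default is 500)
echo 50000 > /sys/module/zfs/parameters/zfs_arc_meta_balance
# check how much of the ARC actually holds metadata
arc_summary | grep -i meta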
 
If you want to get a clue why your verify performance is low, do a test like this:

# stream the whole datastore through tar once and measure the raw read throughput
tar cf - ./your-datastore | pv >/dev/null

On my array I get <= 50 MB/s.
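
This streams every chunk file through tar once and throws the bytes away, so pv shows the raw read throughput the chunk store can sustain. A verify job can never be faster than that number, since on top of reading it also has to hash every chunk.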
 
