Metadata cache on ZFS - How much does it help for PBS?

mgaudette112

Member
Dec 21, 2023
Hi,

Question about storage support for PBS: this is from the PBS website, under recommended specs (bold is mine, for emphasis):
  • Backup storage:
    • Use only SSDs, for best results
    • If HDDs are used: Using a metadata cache is highly recommended, for example, add a ZFS special device mirror.

On another server that needed to back up about 6TB of data, I tried both HDDs on their own and SSDs. The HDDs were OK but almost too slow for daily backups (verify jobs took 12 hours), so I decided to shell out the money for SSDs.

I have another PBS to figure out, this time backing up about 20TB of data. If I were to use ZFS with a special vdev, as "recommended" above, would I get close-to-SSD performance, or is it barely better than HDD?

I am trying to figure out where HDD+special vdev lies in the performance spectrum compared to SSD-only and HDD-only.
 
Hi,
the metadata cache will improve performance for operations where the PBS server only needs to access file metadata, such as the atime during garbage collection. Verify jobs, on the other hand, need to read and hash the data stored in the chunk store, so the metadata cache will not help much there. For those operations you will not get the same performance as with SSD-only storage.
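
As a rough back-of-envelope check, here is a small sketch using the numbers from the first post (about 6TB, verify took about 12 hours on HDDs). It assumes verify time is bounded by HDD read throughput and scales linearly with the amount of data. That is an assumption, not a measurement, and the function names are made up for illustration.

```python
# Rough estimate only: assumes verify is limited by HDD read throughput and
# scales linearly with the amount of stored data. Deduplication,
# fragmentation and pool layout will change the real numbers.

TB = 10**12  # decimal terabyte, in bytes


def implied_throughput_mb_s(data_tb: float, hours: float) -> float:
    """Average read rate (MB/s) implied by verifying data_tb in the given hours."""
    return data_tb * TB / (hours * 3600) / 10**6


def estimated_verify_hours(data_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to read data_tb once at throughput_mb_s."""
    return data_tb * TB / (throughput_mb_s * 10**6) / 3600


# Figures from the first post: about 6 TB, verify took about 12 hours on HDDs.
rate = implied_throughput_mb_s(6, 12)
hours_20tb = estimated_verify_hours(20, rate)

print(f"implied rate: {rate:.0f} MB/s")
print(f"estimated verify time for 20 TB: {hours_20tb:.0f} h")
```

Under these assumptions a full verify of 20TB on the same kind of HDD setup would take well over a day, with or without a special vdev, since the special vdev only speeds up the metadata part.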
 

Thank you for the clarification
 
Is an SSD cache for HDDs still required?
Was it required at any time???

A separate Second-Level-Cache will hold some data for repeated read access. This does not really happen for PBS - as far as I understand the mechanism.

Please note that (First-Level-) ARC is present in any case. And metadata is stored there - in RAM. (Actually you may "zfs set primarycache=metadata mypbsdataset" to specifically allow only metadata.)

On the other hand, a special device will help by drastically reducing the physical head movement the HDDs need.

(( There are some other parameters available to optimize ARC usage, look at "grep dnode /proc/spl/kstat/zfs/arcstats" for one example I had to increase on my systems. ))
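
For anyone who wants to look at these counters from a script instead of grep, here is a minimal sketch. It assumes the usual three-column kstat layout (name, type, value) of the arcstats file. The sample text and its values are made up so the sketch also runs on a machine without ZFS, and `dnode_stats` is just an illustrative helper name.

```python
from pathlib import Path

ARCSTATS = Path("/proc/spl/kstat/zfs/arcstats")  # path quoted in the post above


def dnode_stats(text: str) -> dict[str, int]:
    """Return every arcstats counter whose name contains 'dnode'."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows look like: <name> <type> <value>; header rows do not match.
        if len(parts) == 3 and "dnode" in parts[0] and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


# Made-up sample in the kstat layout, only so the sketch runs without ZFS.
SAMPLE = """\
name                            type data
hits                            4    1000
dnode_size                      4    2048
arc_dnode_limit                 4    4096
"""

text = ARCSTATS.read_text() if ARCSTATS.exists() else SAMPLE
for name, value in dnode_stats(text).items():
    print(f"{name}: {value}")
```

Which counters show up depends on the OpenZFS version, so treat the names in the sample as examples only.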
 
As I understand it, a metadata cache can improve GC speed because the metadata can be read from an SSD.
A verify or restore process still needs to read the data itself from the HDDs, so the SSD cache brings no improvement in that case.

GC was optimized in PBS 3.4, so perhaps GC speed is also okay on an HDD RAID 10?
 
GC was optimized, but will still benefit from fast metadata access, just like before.