Been using PBS on several installations without any issues so far. On one of my PBS installations I was having issues with spinning disks (using ZFS), so I went on to add a special device to the pool in order to speed up metadata processing (GC is metadata heavy).
To describe this better: this particular machine has PBS installed alongside PVE. Storage is ZFS on 4 SAS spinning drives in RAID-10. This is the storage:
Bash:
# zpool status pool16
  pool: pool16
 state: ONLINE
  scan: resilvered 5.30M in 00:00:01 with 0 errors on Mon Sep 30 17:01:36 2024
config:

        NAME                                             STATE     READ WRITE CKSUM
        pool16                                           ONLINE       0     0     0
          mirror-0                                       ONLINE       0     0     0
            scsi-35000cca2a0140e94                       ONLINE       0     0     0
            scsi-35000cca2a0140f48                       ONLINE       0     0     0
          mirror-1                                       ONLINE       0     0     0
            scsi-35000cca2a0141a30                       ONLINE       0     0     0
            scsi-35000cca2a014014c                       ONLINE       0     0     0
        special
          mirror-2                                       ONLINE       0     0     0
            ata-SSDSC2KG240G8R_PHYG9485038L240AGN-part1  ONLINE       0     0     0
            ata-SSDSC2KG240G8R_PHYG9485044L240AGN-part1  ONLINE       0     0     0

errors: No known data errors
And this is the ZFS dataset that I keep my PBS storage on:
Bash:
# zfs get all pool16/pbs-storage
NAME PROPERTY VALUE SOURCE
pool16/pbs-storage type filesystem -
pool16/pbs-storage creation Sat Nov 25 13:54 2023 -
pool16/pbs-storage used 806G -
pool16/pbs-storage available 7.21T -
pool16/pbs-storage referenced 806G -
pool16/pbs-storage compressratio 1.20x -
pool16/pbs-storage mounted yes -
pool16/pbs-storage quota none default
pool16/pbs-storage reservation none default
pool16/pbs-storage recordsize 1M local
pool16/pbs-storage mountpoint /pool16/pbs-storage default
pool16/pbs-storage sharenfs off default
pool16/pbs-storage checksum on default
pool16/pbs-storage compression lz4 local
pool16/pbs-storage atime on local
pool16/pbs-storage devices on default
pool16/pbs-storage exec on default
pool16/pbs-storage setuid on default
pool16/pbs-storage readonly off default
pool16/pbs-storage zoned off default
pool16/pbs-storage snapdir hidden default
pool16/pbs-storage aclmode discard default
pool16/pbs-storage aclinherit restricted default
pool16/pbs-storage createtxg 17184 -
pool16/pbs-storage canmount on default
pool16/pbs-storage xattr sa local
pool16/pbs-storage copies 1 default
pool16/pbs-storage version 5 -
pool16/pbs-storage utf8only off -
pool16/pbs-storage normalization none -
pool16/pbs-storage casesensitivity sensitive -
pool16/pbs-storage vscan off default
pool16/pbs-storage nbmand off default
pool16/pbs-storage sharesmb off default
pool16/pbs-storage refquota 8T local
pool16/pbs-storage refreservation none default
pool16/pbs-storage guid 13360060514386266996 -
pool16/pbs-storage primarycache metadata local
pool16/pbs-storage secondarycache none inherited from pool16
pool16/pbs-storage usedbysnapshots 0B -
pool16/pbs-storage usedbydataset 806G -
pool16/pbs-storage usedbychildren 0B -
pool16/pbs-storage usedbyrefreservation 0B -
pool16/pbs-storage logbias latency default
pool16/pbs-storage objsetid 11253 -
pool16/pbs-storage dedup off default
pool16/pbs-storage mlslabel none default
pool16/pbs-storage sync standard default
pool16/pbs-storage dnodesize auto local
pool16/pbs-storage refcompressratio 1.20x -
pool16/pbs-storage written 806G -
pool16/pbs-storage logicalused 972G -
pool16/pbs-storage logicalreferenced 972G -
pool16/pbs-storage volmode default default
pool16/pbs-storage filesystem_limit none default
pool16/pbs-storage snapshot_limit none default
pool16/pbs-storage filesystem_count none default
pool16/pbs-storage snapshot_count none default
pool16/pbs-storage snapdev hidden default
pool16/pbs-storage acltype off default
pool16/pbs-storage context none default
pool16/pbs-storage fscontext none default
pool16/pbs-storage defcontext none default
pool16/pbs-storage rootcontext none default
pool16/pbs-storage relatime on local
pool16/pbs-storage redundant_metadata most local
pool16/pbs-storage overlay on default
pool16/pbs-storage encryption off default
pool16/pbs-storage keylocation none default
pool16/pbs-storage keyformat none default
pool16/pbs-storage pbkdf2iters 0 default
pool16/pbs-storage special_small_blocks 0 default
pool16/pbs-storage prefetch all default
This ZFS pool didn't have a special device until last week, when I added two Intel DC-series SSDs and attached them as a mirrored special vdev.
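For anyone wanting to do the same, adding the mirrored special vdev was essentially one command along these lines (a sketch - verify the device paths under /dev/disk/by-id/ on your own system before running anything like this):
Bash:
# attach a mirrored special vdev built from the first SSD partitions
# (device names below are from my pool and are shown for illustration only)
zpool add pool16 special mirror \
    /dev/disk/by-id/ata-SSDSC2KG240G8R_PHYG9485038L240AGN-part1 \
    /dev/disk/by-id/ata-SSDSC2KG240G8R_PHYG9485044L240AGN-part1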
Then I went on to do... well, I used a script to "rebalance" the contents of the PBS dataset, rewriting the files in place so that their metadata ended up on the SSDs. I made sure proxmox-backup.service, proxmox-backup-proxy.service and proxmox-backup-daily-update.service were stopped before I started the script. I should probably have stopped cron as well, but didn't. As far as I could tell, all went fine, although it took a VERY long time. My SSDs were filled with metadata and zpool status was good. Then I started those services back up.
Backups were all there and verification was running fine - all looked good. I made NO CHANGES to backup schedules, retention, sync jobs, pruning, GC, etc. I noticed GC was running MUCH faster than before: what used to take hours was now taking minutes.
I used this script to rebalance the files in the dataset: https://github.com/markusressel/zfs-inplace-rebalancing. I had already used it in the past, albeit not on PBS datastore files but on my file share dir, and it did the job without any issues. Chunks are just files, so... I used it here as well.
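For completeness, the whole rebalancing run boiled down to something like this (script name taken from the repo above; treat it as a sketch rather than an exact transcript of what I typed):
Bash:
# stop PBS so no chunks change while the files are rewritten in place
systemctl stop proxmox-backup.service proxmox-backup-proxy.service proxmox-backup-daily-update.service

# rewrite every file so the new copies get their metadata onto the special vdev
./zfs-inplace-rebalancing.sh /pool16/pbs-storage

# bring PBS back up afterwards
systemctl start proxmox-backup.service proxmox-backup-proxy.service proxmox-backup-daily-update.service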
I then noticed that while verify jobs were running, my PBS storage was getting a lot of writes. I figured the verify job must be updating something, and my best guess was that it was marking something inside the "ns" subdir of the PBS datastore, because that's the only place such writes could be happening. To try to speed things up on my PBS I did the following. Since I hadn't used the whole SSD space for the special device (I partitioned the SSDs and gave one partition to ZFS), I made another small partition (10 GB) on each and built another zpool mirror on top of them. This one:
Bash:
# zpool status small_sdd_mirror
  pool: small_sdd_mirror
 state: ONLINE
config:

        NAME                                             STATE     READ WRITE CKSUM
        small_sdd_mirror                                 ONLINE       0     0     0
          mirror-0                                       ONLINE       0     0     0
            ata-SSDSC2KG240G8R_PHYG9485038L240AGN-part9  ONLINE       0     0     0
            ata-SSDSC2KG240G8R_PHYG9485044L240AGN-part9  ONLINE       0     0     0

errors: No known data errors
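Creating that pool was a one-liner along these lines (again a sketch - the partition paths are the ones from my system):
Bash:
# 10 GB mirror on the spare SSD partitions
zpool create small_sdd_mirror mirror \
    /dev/disk/by-id/ata-SSDSC2KG240G8R_PHYG9485038L240AGN-part9 \
    /dev/disk/by-id/ata-SSDSC2KG240G8R_PHYG9485044L240AGN-part9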
And on top of it I created a new ZFS dataset that looks like this:
Bash:
# zfs get all small_sdd_mirror/pbs-ns
NAME PROPERTY VALUE SOURCE
small_sdd_mirror/pbs-ns type filesystem -
small_sdd_mirror/pbs-ns creation Wed Oct 2 17:25 2024 -
small_sdd_mirror/pbs-ns used 11.2M -
small_sdd_mirror/pbs-ns available 9.19G -
small_sdd_mirror/pbs-ns referenced 11.2M -
small_sdd_mirror/pbs-ns compressratio 1.71x -
small_sdd_mirror/pbs-ns mounted yes -
small_sdd_mirror/pbs-ns quota none default
small_sdd_mirror/pbs-ns reservation none default
small_sdd_mirror/pbs-ns recordsize 1M local
small_sdd_mirror/pbs-ns mountpoint /small_sdd_mirror/pbs-ns default
small_sdd_mirror/pbs-ns sharenfs off default
small_sdd_mirror/pbs-ns checksum on default
small_sdd_mirror/pbs-ns compression on default
small_sdd_mirror/pbs-ns atime on default
small_sdd_mirror/pbs-ns devices on default
small_sdd_mirror/pbs-ns exec on default
small_sdd_mirror/pbs-ns setuid on default
small_sdd_mirror/pbs-ns readonly off default
small_sdd_mirror/pbs-ns zoned off default
small_sdd_mirror/pbs-ns snapdir hidden default
small_sdd_mirror/pbs-ns aclmode discard default
small_sdd_mirror/pbs-ns aclinherit restricted default
small_sdd_mirror/pbs-ns createtxg 67 -
small_sdd_mirror/pbs-ns canmount on default
small_sdd_mirror/pbs-ns xattr on default
small_sdd_mirror/pbs-ns copies 1 default
small_sdd_mirror/pbs-ns version 5 -
small_sdd_mirror/pbs-ns utf8only off -
small_sdd_mirror/pbs-ns normalization none -
small_sdd_mirror/pbs-ns casesensitivity sensitive -
small_sdd_mirror/pbs-ns vscan off default
small_sdd_mirror/pbs-ns nbmand off default
small_sdd_mirror/pbs-ns sharesmb off default
small_sdd_mirror/pbs-ns refquota none default
small_sdd_mirror/pbs-ns refreservation none default
small_sdd_mirror/pbs-ns guid 8725473865594968389 -
small_sdd_mirror/pbs-ns primarycache all default
small_sdd_mirror/pbs-ns secondarycache all default
small_sdd_mirror/pbs-ns usedbysnapshots 0B -
small_sdd_mirror/pbs-ns usedbydataset 11.2M -
small_sdd_mirror/pbs-ns usedbychildren 0B -
small_sdd_mirror/pbs-ns usedbyrefreservation 0B -
small_sdd_mirror/pbs-ns logbias latency default
small_sdd_mirror/pbs-ns objsetid 643 -
small_sdd_mirror/pbs-ns dedup off default
small_sdd_mirror/pbs-ns mlslabel none default
small_sdd_mirror/pbs-ns sync standard default
small_sdd_mirror/pbs-ns dnodesize legacy default
small_sdd_mirror/pbs-ns refcompressratio 1.71x -
small_sdd_mirror/pbs-ns written 11.2M -
small_sdd_mirror/pbs-ns logicalused 17.9M -
small_sdd_mirror/pbs-ns logicalreferenced 17.9M -
small_sdd_mirror/pbs-ns volmode default default
small_sdd_mirror/pbs-ns filesystem_limit none default
small_sdd_mirror/pbs-ns snapshot_limit none default
small_sdd_mirror/pbs-ns filesystem_count none default
small_sdd_mirror/pbs-ns snapshot_count none default
small_sdd_mirror/pbs-ns snapdev hidden default
small_sdd_mirror/pbs-ns acltype off default
small_sdd_mirror/pbs-ns context none default
small_sdd_mirror/pbs-ns fscontext none default
small_sdd_mirror/pbs-ns defcontext none default
small_sdd_mirror/pbs-ns rootcontext none default
small_sdd_mirror/pbs-ns relatime on default
small_sdd_mirror/pbs-ns redundant_metadata all default
small_sdd_mirror/pbs-ns overlay on default
small_sdd_mirror/pbs-ns encryption off default
small_sdd_mirror/pbs-ns keylocation none default
small_sdd_mirror/pbs-ns keyformat none default
small_sdd_mirror/pbs-ns pbkdf2iters 0 default
small_sdd_mirror/pbs-ns special_small_blocks 0 default
small_sdd_mirror/pbs-ns prefetch all default
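As the output shows, only recordsize is set locally, so the dataset was created with something as simple as:
Bash:
# small dataset to hold the PBS "ns" tree, everything else left at defaults
zfs create -o recordsize=1M small_sdd_mirror/pbs-ns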
I then (roughly as sketched below):
- moved the contents of the "ns" subdirectory from the spinning disks to this SSD pool
- deleted the now-empty ns subdir
- and made a symlink from the pbs-storage/ns path to this SSD dataset
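Those three steps looked roughly like this (a sketch - PBS was stopped for the move, and ownership/permissions were preserved with cp -a):
Bash:
# stop PBS so nothing touches the namespace tree mid-move
systemctl stop proxmox-backup.service proxmox-backup-proxy.service

# copy the ns tree to the SSD dataset, then swap the original directory for a symlink
cp -a /pool16/pbs-storage/ns/. /small_sdd_mirror/pbs-ns/
rm -rf /pool16/pbs-storage/ns
ln -s /small_sdd_mirror/pbs-ns /pool16/pbs-storage/ns

systemctl start proxmox-backup.service proxmox-backup-proxy.service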