TLDR;
- Is it possible to increase the timeout in PVE when listing PBS backups? It seems to time out at ~25 seconds, and then there is no way to get the list of backups to do a restore, neither from the storage view nor from within the VM itself.
- Is there a way to really tell ZFS to always keep metadata in ARC? (afaik no, but maybe someone has some trick for this).
Long version:
I have a PBS with 8x14TB HDDs + a special device. The PVE cluster has ~600 VMs and there are ~8000 snapshots in the namespace used by PVE. The special device got filled over 75%, and due to the zfs_special_class_metadata_reserve_pct default of 25%, past that point only metadata got stored on the special device. Now listing the snapshots takes between 40 and 90 seconds, depending on how loaded the PBS is (GC, sync jobs, verify). Even listing the backups from PBS itself with
/usr/bin/proxmox-backup-client snapshot list --repository 'user@pbs@localhost:8007:DATASTORE' --ns NAMESPACE
takes as long as doing it from PVE with pvesm list pbs_STORAGENAME.
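A minimal way to measure both code paths and to check the special device fill level, using the placeholder names from above (run the first command on the PBS host and the second on a PVE node):

time /usr/bin/proxmox-backup-client snapshot list --repository 'user@pbs@localhost:8007:DATASTORE' --ns NAMESPACE
time pvesm list pbs_STORAGENAME
zpool list -v   # ALLOC vs SIZE per vdev shows how full the special device is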
This same issue would also affect a full-HDD PBS without a special device, as well as overloaded PBS servers that may take long to respond to such requests.
Things that I'm already aware of:
- Have reduced zfs_special_class_metadata_reserve_pct to 10 to allow new small blocks to be allocated on the special device (see the sketch after this list).
- Small blocks already stored on HDD will remain there until the snapshots using them are eventually purged, so even if we expand the special device, listings will stay slow for some time.
- Tried to force ZFS to keep metadata longer in ARC with zfs_arc_meta_balance, but no value above the default of 500 made any difference.
- Tried to set ARC to cache metadata only (primarycache=metadata); it reduced the times by ~20%, but it's still too slow (and it would make other PBS operations way slower).
- Have no place to set up an L2ARC with secondarycache=metadata, but given that primarycache=metadata didn't help much, I don't think it would help here either.
Thanks!
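For completeness, a sketch of the tunables mentioned in the list above. The module parameters live under /sys/module/zfs/parameters and reset on reboot unless persisted via /etc/modprobe.d; tank/datastore is a placeholder dataset name, and the zfs_arc_meta_balance value is just one example of "above the default 500":

echo 10 > /sys/module/zfs/parameters/zfs_special_class_metadata_reserve_pct
echo 5000 > /sys/module/zfs/parameters/zfs_arc_meta_balance   # higher values are supposed to favor metadata in ARC
zfs set primarycache=metadata tank/datastore   # cache only metadata for the datastore dataset
zfs set primarycache=all tank/datastore        # revert after testing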