Verify jobs - Terrible IO performance

harmonyp

Nov 26, 2020
I have about 100 virtual machines that I back up daily, and I experience terrible IO performance during verifications, which take 10+ hours.

The disks are 4x WUH721414AL5201 (https://www.serversupply.com/HARD D...PM/WESTERN DIGITAL/WUH721414AL5201_313352.htm) in RAID 10 (hardware RAID).
Hardware RAID controller: LSI MegaRAID SAS 9361-8i

Is this because of the sector size of the disk? Should it be set smaller?

Code:
Sector Size:  512
Logical Sector Size:  512
 
I don't recall what it's set to. How do I check whether the hardware RAID volume is formatted as ext4 or XFS?

Are there also any posts about ZFS vs. hardware RAID for PBS? I presume hardware RAID is better; maybe @Dunuin knows?
 
PBS needs high IOPS performance. The benefit of ZFS would be that you can accelerate it using SSDs to store the metadata. That won't help much with verify tasks, but it still helps a bit, because the HDDs are hit by less IO: all the metadata that is read from or written to the SSDs doesn't have to be read from or written to the HDDs.

In general, HDDs shouldn't be used with PBS, at least not if you store a lot of backups. And if you still do it, it's highly recommended to also use SSDs for storing the metadata.

Try df -Th to see a list of all filesystems and their types.
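For a single path, such as the directory a PBS datastore sits in, a minimal sketch like this prints just the filesystem type. It assumes GNU coreutils `df` (standard on Debian-based PBS installs); the function name and the example path are made up for illustration.

```shell
# Print only the filesystem type (e.g. ext4, xfs, zfs) of the filesystem
# that holds the given path. Relies on the GNU df --output option.
fs_type() {
  # df prints a "Type" header line first; keep the last line, strip padding
  df --output=fstype "$1" | tail -n 1 | tr -d ' '
}

# Example with a hypothetical datastore path:
#   fs_type /mnt/datastore/backup
```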
 
SSD storage costs too much, so I will see if ZFS speeds things up. I am in the process of building my own storage server. Would having more drives reduce the time taken to verify backups (presuming the disks are the same speed)?

Also, is there an optimal ZFS RAID type and ashift/block size for PBS storage on ZFS?
 
For ZFS you also need a special vdev. So you need SSDs for ZFS as well; otherwise performance will be even worse than now.
 
Are you talking about a pool with a cache (L2ARC)?

I plan on getting 2 M.2 NVMe drives for that in RAID 1. The question is how big they need to be. If 250GB is enough, that would be great; I will get Gen4 drives.
 
No, not L2ARC but the "special" vdev. See here for how to calculate the special device size: https://forum.level1techs.com/t/zfs-metadata-special-device-z/159954
Usually it should be around 0.4% of the size of the HDDs.
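Turning that rule of thumb into numbers is simple arithmetic. A minimal sketch with a made-up function name; the 0.4% default comes from the post above, and the real metadata size depends on what is stored.

```shell
# Rule-of-thumb special vdev size as a fraction of HDD capacity.
# Decimal units: capacity in TB in, estimate in GB out.
# The fraction is given in tenths of a percent (default 4 = 0.4 %).
special_vdev_estimate_gb() {
  local hdd_tb="$1"
  local tenths_pct="${2:-4}"
  # hdd_tb * 1000 (GB per TB) * tenths_pct / 1000: the factors of 1000 cancel
  echo $(( hdd_tb * tenths_pct ))
}
```

For example, `special_vdev_estimate_gb 56` gives 224 (GB) and `special_vdev_estimate_gb 112` gives 448 (GB). If special_small_blocks is also set, small data blocks land on the special vdev too, so plan for more than this estimate.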
IOPS performance only increases with the number of disks when you stripe, so adding more disks won't help with any raidz1/2/3. A striped mirror (aka raid10) will do that and is highly recommended when using PBS. Ashift depends on your disks' physical sector size; usually ashift=12 should be fine.
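Choosing ashift is just log2 of the physical sector size. Drives can report 512-byte logical sectors while using 4096-byte physical sectors, so it is the physical value that matters. A small sketch with an illustrative function name:

```shell
# ashift is the base-2 logarithm of the sector size ZFS should assume:
# 512 -> 9, 4096 -> 12, 8192 -> 13.
ashift_for_sector_size() {
  local size="$1"
  local bits=0
  # Only positive powers of two are valid sector sizes
  if [ "$size" -le 0 ] || [ $(( size & (size - 1) )) -ne 0 ]; then
    echo "not a power of two: $size" >&2
    return 1
  fi
  while [ "$size" -gt 1 ]; do
    size=$(( size / 2 ))
    bits=$(( bits + 1 ))
  done
  echo "$bits"
}

# Physical vs. logical sector size of each disk (util-linux lsblk):
#   lsblk -d -o NAME,PHY-SEC,LOG-SEC
```

`ashift_for_sector_size 4096` prints 12 and `ashift_for_sector_size 512` prints 9.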
PBS primarily stores data bigger than 1MB, so it might be a good idea to increase the recordsize from the default 128K to 1M for less overhead.
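To see where the overhead comes from, count how many records (each with its own block pointer in the metadata) one chunk file occupies at a given recordsize. A sketch with an illustrative function name; the 4 MiB chunk size is an assumption for illustration, and real chunk files are usually smaller after ZSTD compression.

```shell
# Number of ZFS records a file of a given size is stored in, for a given
# recordsize. Both sizes in KiB.
records_per_file() {
  local file_kib="$1"
  local record_kib="$2"
  # Ceiling division: a partial last record still needs its own block pointer
  echo $(( (file_kib + record_kib - 1) / record_kib ))
}

# Assumed 4 MiB (4096 KiB) chunk, before compression:
#   records_per_file 4096 128    -> 32
#   records_per_file 4096 1024   -> 4
```

Going from 128K to 1M cuts such a file from 32 records to 4, so ZFS has eight times fewer block pointers to track for it.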
 
Where is a guide on setting up this special vdev? I don't see it mentioned on the Proxmox ZFS page.
 
@Dunuin @Neobin How much data does the special vdev write? I am going to dedicate two disks to it; I just don't know if I should go with a drive like the PM9A1, which is cheaper but has a lower total endurance (600TBW), or a high-endurance drive. I'm trying to weigh the pros and cons.
 
Hard to tell. On the one hand, the host isn't writing much, since the data-to-metadata ratio is only about 250:1. On the other hand, metadata should always be quite small and written synchronously, so this might cause a lot of write amplification. Unfortunately, all my special devices are in my TrueNAS server, and none of the TrueNAS disks are monitored by my Zabbix server, so I can't look at graphs to see how much is actually written to the NAND.

Maybe someone else has monitored them and can tell you real numbers.
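One way to get real numbers is the NVMe SMART counter "Data Units Written", which counts units of 1000 × 512 bytes (512,000 bytes). A sketch with illustrative function names that converts the raw counter and compares it to a rated endurance such as the 600 TBW mentioned above:

```shell
# NVMe "Data Units Written": one unit = 1000 * 512 bytes = 512,000 bytes.
# Convert a raw counter value to whole decimal TB written.
data_units_to_tb() {
  echo $(( $1 * 512000 / 1000000000000 ))
}

# Percentage of a rated endurance (TBW) used, from a raw counter value.
tbw_used_pct() {
  local units="$1"
  local rated_tbw="$2"
  echo $(( units * 512000 * 100 / (rated_tbw * 1000000000000) ))
}

# The counter can be read with, for example:
#   smartctl -A /dev/nvme0      (line "Data Units Written")
#   nvme smart-log /dev/nvme0   (field data_units_written)
```

Reading the counter twice, a week apart for example, and converting the difference gives the actual write rate on a special vdev SSD.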
 
I have the following devices:

Code:
/dev/nvme0n1
/dev/nvme1n1
/dev/sdb
/dev/sdc
/dev/sdd
/dev/sda
/dev/sdf
/dev/sdg
/dev/sde
/dev/sdh

I'm going to run the following. I just want to make sure: is that all I need to do to create the pool?
Code:
sudo zpool create -f zfs -o ashift=12 mirror /dev/sdb /dev/sdc /dev/sdd /dev/sda /dev/sdf /dev/sdg /dev/sde /dev/sdh special mirror /dev/nvme0n1 /dev/nvme1n1
zfs set special_small_blocks=4K zfs
zfs set compression=lz4 zfs

Do you think this is optimal?
 
harmonyp said:
Code:
sudo zpool create -f zfs -o ashift=12 /dev/sdb /dev/sdc /dev/sdd /dev/sda /dev/sdf /dev/sdg /dev/sde /dev/sdh special mirror /dev/nvme0n1 /dev/nvme1n1
That will create a stripe (raid0). If a single disk dies, you will lose all data on that pool, and there is no bit rot protection: ZFS can still detect corrupted blocks, but without redundancy it can't repair them.

If those sda to sdh are HDDs, I would highly recommend using a striped mirror (raid10):
zpool create -f zfs -o ashift=12 mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd mirror /dev/sde /dev/sdf mirror /dev/sdg /dev/sdh special mirror /dev/nvme0n1 /dev/nvme1n1

And I wouldn't use /dev/sda and so on to add them to the pool. Use /dev/disk/by-id/... instead.
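Pairing disks by hand is easy to get wrong, so here is a small sketch that only prints a striped-mirror command for review. The function name is made up; it keeps ashift=12 from the command above and puts the options before the pool name, as in the `zpool create` synopsis. It leaves out -f, so zpool will refuse devices that appear to be in use instead of overwriting them.

```shell
# Print (not run) a zpool create command for striped 2-way mirrors
# (the ZFS equivalent of RAID 10) plus a mirrored special vdev.
# Usage: striped_mirror_cmd POOL SSD1 SSD2 HDD1 HDD2 [HDD3 HDD4 ...]
striped_mirror_cmd() {
  if [ "$#" -lt 5 ] || [ $(( ($# - 3) % 2 )) -ne 0 ]; then
    echo "usage: striped_mirror_cmd POOL SSD1 SSD2 HDD1 HDD2 [HDD...]" >&2
    return 1
  fi
  local pool="$1" ssd1="$2" ssd2="$3"
  shift 3
  local cmd="zpool create -o ashift=12 $pool"
  while [ "$#" -gt 0 ]; do
    cmd="$cmd mirror $1 $2"   # each consecutive pair becomes one mirror vdev
    shift 2
  done
  echo "$cmd special mirror $ssd1 $ssd2"
}
```

For example, with hypothetical by-id names, `striped_mirror_cmd zfs /dev/disk/by-id/nvme-SSD_A /dev/disk/by-id/nvme-SSD_B /dev/disk/by-id/ata-HDD_1 /dev/disk/by-id/ata-HDD_2` prints one HDD mirror plus the special mirror; passing all eight HDDs gives four mirrors.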
 
I changed that to mirror, which is RAID 1. What is the correct command for RAID 10? Are the other commands good for PBS too?
 
LZ4 should be fine. It won't help with chunks, as these are already ZSTD-compressed by the PVE client, but it could save a few MBs/GBs by compressing the index files. Whether "special_small_blocks=4K" makes sense depends on your SSDs: they need to be big enough to store this additional data. If they get full, metadata will spill over to the HDDs, making everything slow again.
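Since a full special vdev means metadata spills over to the HDDs, it's worth watching how full it is. A sketch with an illustrative function name and an arbitrary 75% warning threshold; feed it the special mirror's ALLOC and SIZE from `zpool list -v` (adding `-p` prints exact byte values, so both numbers share a unit).

```shell
# Report how full the special vdev is. alloc and size must use the same
# unit, e.g. exact byte values from `zpool list -v -p`.
special_vdev_check() {
  local alloc="$1"
  local size="$2"
  local limit="${3:-75}"   # warning threshold in percent (arbitrary default)
  local pct=$(( alloc * 100 / size ))
  if [ "$pct" -ge "$limit" ]; then
    echo "WARN ${pct}%"
  else
    echo "OK ${pct}%"
  fi
}
```

`special_vdev_check 400 500` prints `WARN 80%`, and `special_vdev_check 100 500` prints `OK 20%`.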
 
To create RAID-10 I will run

Code:
sudo zpool create -f zfs -o ashift=12 mirror /dev/sdb /dev/sdc /dev/sdd /dev/sda mirror /dev/sdf /dev/sdg /dev/sde /dev/sdh special mirror /dev/nvme0n1 /dev/nvme1n1
zfs set special_small_blocks=4K zfs
zfs set compression=lz4 zfs


The sda drives are 14TB and the NVMe drives are only 500GB. Don't I need to run the following for L2ARC or ZIL?
Code:
zpool create -f -o ashift=12 <pool> <device> cache <cache_device>
zpool create -f -o ashift=12 <pool> <device> log <log_device>

Note: for some reason, the ZFS pool is only 28GB after running the above.
 
That will create two striped quad-mirrors, not 4 striped normal (2-disk) mirrors. I don't think you want that, as you will lose 75% of your raw capacity.
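The capacity difference is simple arithmetic, since each mirror vdev stores one disk's worth of data however many disks it has. A sketch with illustrative function names, ignoring the special vdev and ZFS overhead, using the 8 × 14 TB disks from this thread:

```shell
# Usable capacity when equal disks are split into mirror vdevs of a given
# width (disks per mirror). Ignores the special vdev and ZFS overhead.
mirror_usable_tb() {
  local disks="$1" width="$2" disk_tb="$3"
  echo $(( disks / width * disk_tb ))
}

# Share of raw capacity spent on redundancy, in whole percent.
mirror_overhead_pct() {
  echo $(( 100 - 100 / $1 ))
}

# Two 4-way mirrors:  mirror_usable_tb 8 4 14 -> 28, overhead 75 %
# Four 2-way mirrors: mirror_usable_tb 8 2 14 -> 56, overhead 50 %
```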
I would also set the recordsize to 1M or even 4M. The latter requires enabling a pool feature first.
SLOG and L2ARC shouldn't help much, and I wouldn't put the SLOG on the same disks as the special devices because of the heavy wear.
 
OK, thanks for the info. Any idea why the command below creates the pool with only 26TB? I'm not sure how to confirm whether it's RAID 10.

Code:
sudo zpool create -f zfs -o ashift=12 mirror /dev/sdb /dev/sdc /dev/sdd /dev/sda mirror /dev/sdf /dev/sdg /dev/sde /dev/sdh special mirror /dev/nvme0n1 /dev/nvme1n1

This way appears to work, though. Is it the only way other than through the GUI?

Code:
sudo zpool create -f zfs -o ashift=12 mirror /dev/sda /dev/sdb special mirror /dev/nvme0n1 /dev/nvme1n1
sudo zpool add zfs mirror /dev/sdc /dev/sdd
sudo zpool add zfs mirror /dev/sde /dev/sdf
sudo zpool add zfs mirror /dev/sdg /dev/sdh

Also, how do I set the recordsize? Is it the block size?
 
I already explained that. The above command won't create a normal raid10. It creates two striped quad mirrors.
And I also wrote the correct command for a raid10 some posts above:
If those sda to sdh are HDDs I would highly recommend to use a striped mirror (raid10):
zpool create -f zfs -o ashift=12 mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd mirror /dev/sde /dev/sdf mirror /dev/sdg /dev/sdh special mirror /dev/nvme0n1 /dev/nvme1n1
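To confirm the layout, `zpool status` lists each mirror vdev (`mirror-0`, `mirror-1`, and so on) with its member disks indented below it. A rough sketch with an illustrative function name that summarizes this; it parses the human-readable output and assumes the usual indentation, so treat it as a convenience rather than a stable interface.

```shell
# Read `zpool status` output on stdin and print each mirror vdev with the
# number of disks indented beneath it, e.g. "mirror-0 2".
mirror_widths() {
  awk '
    { match($0, /^[ \t]*/); ind = RLENGTH }        # indentation of this line
    $1 ~ /^mirror-[0-9]+$/ {                        # start of a mirror vdev
      if (name != "") print name, n
      name = $1; mind = ind; n = 0; next
    }
    name != "" && NF > 0 && ind > mind { n++; next }        # member disk
    name != "" && ind <= mind { print name, n; name = "" }  # mirror ended
    END { if (name != "") print name, n }
  '
}

# Usage: zpool status zfs | mirror_widths
```

A raid10 of eight HDDs should show four lines ending in 2 (plus one for the special mirror); lines ending in 4 mean quad mirrors.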


The recordsize is a per-dataset property, so set it on the dataset used as the datastore:
zfs set recordsize=1M YourPoolName/DatasetUsedAsDatastore
 