z_wr_iss high CPU usage and high CPU load

harmonyp

I have created an NVMe RAIDZ ZFS pool and have noticed it uses a lot more CPU power than a similar setup using RAID-10.

Cloning a 40GB template causes the server's load to skyrocket, and you can really feel the lag when trying to run anything else during this process.

At peaks it's using almost half of my server's CPU power (AMD EPYC 7502P) for short bursts. This feels really excessive.

Is there anything I can do to reduce this load? Also I presume RAID0 would give the best performance?
 
I have created an NVMe RAIDZ ZFS pool and have noticed it uses a lot more CPU power than a similar setup using RAID-10.
That's normal. RAID10 just writes data to multiple disks without any big computations. For raidz it is way more complex: parity data has to be computed for every write, so a raidz will always be more CPU-heavy.
Cloning a 40GB template causes the server's load to skyrocket, and you can really feel the lag when trying to run anything else during this process.

At peaks it's using almost half of my server's CPU power (AMD EPYC 7502P) for short bursts. This feels really excessive.
Did you optimize your volblocksize? If you are just using the default 8K volblocksize you are wasting a lot of capacity and losing performance due to padding overhead.
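You can check what your existing zvols are using, roughly like this (the dataset name is just an example, yours will differ):

Code:
# list all zvols with their current volblocksize
zfs list -t volume -o name,volblocksize
# or check a single virtual disk
zfs get volblocksize rpool/data/vm-100-disk-0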
Is there anything I can do to reduce this load? Also I presume RAID0 would give the best performance?
Yes, but then you have no redundancy and ZFS won't be able to heal itself if data gets corrupted.
 
Did you optimize your volblocksize? If you are just using the default 8K volblocksize you are wasting a lot of capacity and losing performance due to padding overhead.
No, I have not. What do you suggest for a mix of Windows/Linux VMs on SAMSUNG MZQLB1T9HAJR-00007 drives in RAIDZ (3 drives)? And the same for RAID10 (4 drives)?
 
No, I have not. What do you suggest for a mix of Windows/Linux VMs on SAMSUNG MZQLB1T9HAJR-00007 drives in RAIDZ (3 drives)? And the same for RAID10 (4 drives)?
If you use a pool that was created with ashift=12, the best should be an 8K volblocksize for a raid10 of 4 drives or a 16K volblocksize for a raidz1 of 3 disks.
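If I remember right you can change that "Block Size" either in the GUI (Datacenter -> Storage -> your ZFS storage) or on the CLI, roughly like this (the storage name is just an example):

Code:
# set the volblocksize used for newly created virtual disks on this storage
pvesm set local-zfs --blocksize 16k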
 
If you use a pool that was created with ashift=12, the best should be an 8K volblocksize for a raid10 of 4 drives or a 16K volblocksize for a raidz1 of 3 disks.
Ok. I've got a problem: both ZFS storages in my cluster have the same name.

[screenshot of the cluster's ZFS storages showing the same name]


If I change them manually will the cluster storage setting change it back?
 
Also keep in mind that you can't simply change the volblocksize of existing zvols. The volblocksize can only be set at creation, so you will need to destroy and recreate all virtual disks for a change of the "Block Size" of the pool to take effect. The best way to do this is to shut down all VMs, back them up, and overwrite them by restoring them from the backup. If you then start the VMs they should be using the new volblocksize and the pool should be around 33% emptier.
You don't need to replace LXCs because they use the 128K recordsize instead of the volblocksize.
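As a rough sketch of that backup/restore cycle (VMID, storage names and the archive path are just examples, adjust them to your setup):

Code:
# stop the VM and back it up (storage "backup-store" is an example)
qm shutdown 100
vzdump 100 --storage backup-store --mode stop
# restore over the same VMID so the zvol gets recreated with the new volblocksize
# (archive path is an example, use the file vzdump actually produced)
qmrestore /var/lib/vz/dump/vzdump-qemu-100-2021_01_01-00_00_00.vma.zst 100 --force --storage local-zfs
qm start 100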
 
Also keep in mind that you can't simply change the volblocksize of existing zvols. The volblocksize can only be set at creation, so you will need to destroy and recreate all virtual disks for a change of the "Block Size" of the pool to take effect. The best way to do this is to shut down all VMs, back them up, and overwrite them by restoring them from the backup. If you then start the VMs they should be using the new volblocksize and the pool should be around 33% emptier.
You don't need to replace LXCs because they use the 128K recordsize instead of the volblocksize.
Ok, thanks a lot for the info. Can I ask why you suggest 8K for RAID-10 and 16K for RAIDZ1? How are you calculating it?

If I change this now, new virtual machines will use the new value, correct? I can backup/restore the existing ones at a later date.

Also, as in my last message, there are multiple nodes in our cluster with this setting (8K), and there is no way to individually set the Block Size in the GUI for each node. If I manually adjust this, will the GUI setting change it back at a later date?
 
Ok, thanks a lot for the info. Can I ask why you suggest 8K for RAID-10 and 16K for RAIDZ1? How are you calculating it?
For striped pools you want the block size of your pool (so 4K if an ashift of 12 is used) multiplied by the number of data-bearing disks. So if you have two mirrors of 2 SSDs each striped together with an ashift of 12, that is 2 * 4K = 8K.
For raidz it is more complex. Here is a table that shows the parity+padding loss for different volblocksizes for raidz1/2/3 with 3 to 24 disks. And here is a blog post by a leading ZFS engineer explaining how raidz works and how the values in that spreadsheet are calculated.
So a raidz1 of 3 disks with an ashift of 12 and a volblocksize of 8K will lose 50% of the raw capacity (33% to parity + 17% to padding overhead). With a volblocksize of 16K you only lose 33% of the raw capacity because there is no padding overhead.
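To make those raidz1 numbers concrete, here is the rough arithmetic behind them (assuming the usual rule that raidz pads every allocation up to a multiple of parity+1 sectors):

Code:
# raidz1 of 3 disks, ashift=12 -> 4K sectors, allocations padded to a multiple of (parity+1) = 2 sectors
# 8K volblocksize : 2 data + 1 parity = 3 sectors -> padded to 4 sectors (16K on disk) -> 50% of raw capacity lost
# 16K volblocksize: 4 data + 2 parity = 6 sectors -> already a multiple of 2 (24K on disk) -> 33% of raw capacity lost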

If I change this now, new virtual machines will use the new value, correct?
Yes.
Also, as in my last message, there are multiple nodes in our cluster with this setting (8K), and there is no way to individually set the Block Size in the GUI for each node. If I manually adjust this, will the GUI setting change it back at a later date?
I don't know. Maybe the staff can answer this.
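For what it's worth, the storage definition itself is shared cluster-wide in /etc/pve/storage.cfg, and a zfspool entry looks roughly like this (names and values are just an example). If you really need different block sizes per node, separate entries restricted with the "nodes" option might be a workaround:

Code:
zfspool: local-zfs
        pool rpool/data
        content images,rootdir
        blocksize 16k
        sparse 1
        nodes node1,node2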
 
For striped pools you want the block size of your pool (so 4K if an ashift of 12 is used) multiplied by the number of data-bearing disks. So if you have two mirrors of 2 SSDs each striped together with an ashift of 12, that is 2 * 4K = 8K.
For raidz it is more complex. Here is a table that shows the parity+padding loss for different volblocksizes for raidz1/2/3 with 3 to 24 disks. And here is a blog post by a leading ZFS engineer explaining how raidz works and how the values in that spreadsheet are calculated.
So a raidz1 of 3 disks with an ashift of 12 and a volblocksize of 8K will lose 50% of the raw capacity (33% to parity + 17% to padding overhead). With a volblocksize of 16K you only lose 33% of the raw capacity because there is no padding overhead.


Yes.

I don't know. Maybe the staff can answer this.
Ok thanks for your help.

Will this also adjust if I just migrate the machine, rather than doing a backup/restore?
 
Ok thanks for your help.

Will this also adjust if I just migrate the machine, rather than doing a backup/restore?
Not sure, but I guess that would work too. As long as it deletes the old zvol and creates a new one, that should work.
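If you try it, something roughly like this should move the VM together with its local disks to another node and recreate the disk on the target storage (VMID, node and storage names are just examples, and I would double check the resulting volblocksize afterwards):

Code:
qm migrate 100 othernode --online --with-local-disks --targetstorage local-zfs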
 
