Extremely low ZFS performance

fsc

Member
Oct 4, 2021
Hi, I have an HP ProLiant machine running Proxmox and I am getting very low read/write speeds on my ZFS pool. These problems are even capable of locking up all my VMs.

My server: ProLiant DL360e Gen8
  • 2 x Intel(R) Xeon(R) CPU E5-2470 v2 @ 2.40GHz
  • 72 GB ECC RAM @ 1333 MHz
  • 3 x SanDisk SSD Plus SATA III 1TB (535 MB/s) on a SmartArray P420 (HBA mode) <<<---- configured with ZFS as RAID 5 (RAIDZ1)
  • 1 x Commercial SATA SSD for the Proxmox host
When I copy large files (each of the example files is approximately 2.5-3 GB), the copy starts at a relatively high speed of about 120 MB/s, drops to 50 MB/s after a few seconds, and ends up crawling along at 1 MB/s. This is really bad and I can't tell what's going on.

[Screenshot: transfer speed dropping during the copy]

This causes the system to become extremely slow, and eventually all virtual machines lock up due to the increased IO delay and load.

[Screenshot: IO delay and server load increasing]

Even when I limit the disk speed of the virtual machines to 50 MB/s, nothing changes; the speed still starts to drop rapidly.
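For reference, the per-disk bandwidth limit mentioned above can also be set from the host CLI. This is only a sketch: the VM ID 100 and the volume name are assumptions, so check "qm config 100" for the real disk entry first.

Code:
# Sketch only: cap read/write bandwidth of an existing disk to 50 MB/s.
# VM ID 100 and the volume "Main_VM_Storage:vm-100-disk-0" are placeholders taken
# from this thread's pool name; verify with "qm config 100" before running.
qm set 100 --scsi0 Main_VM_Storage:vm-100-disk-0,mbps_rd=50,mbps_wr=50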

The ZFS pool configuration is as follows:

Code:
root@proxmox-1:~# zpool get all
NAME             PROPERTY                       VALUE                          SOURCE
Main_VM_Storage  size                           2.72T                          -
Main_VM_Storage  capacity                       30%                            -
Main_VM_Storage  altroot                        -                              default
Main_VM_Storage  health                         ONLINE                         -
Main_VM_Storage  guid                           12320134737522795333           -
Main_VM_Storage  version                        -                              default
Main_VM_Storage  bootfs                         -                              default
Main_VM_Storage  delegation                     on                             default
Main_VM_Storage  autoreplace                    off                            default
Main_VM_Storage  cachefile                      -                              default
Main_VM_Storage  failmode                       wait                           default
Main_VM_Storage  listsnapshots                  off                            default
Main_VM_Storage  autoexpand                     off                            default
Main_VM_Storage  dedupratio                     1.00x                          -
Main_VM_Storage  free                           1.89T                          -
Main_VM_Storage  allocated                      844G                           -
Main_VM_Storage  readonly                       off                            -
Main_VM_Storage  ashift                         12                             local
Main_VM_Storage  comment                        -                              default
Main_VM_Storage  expandsize                     -                              -
Main_VM_Storage  freeing                        0                              -
Main_VM_Storage  fragmentation                  31%                            -
Main_VM_Storage  leaked                         0                              -
Main_VM_Storage  multihost                      off                            default
Main_VM_Storage  checkpoint                     -                              -
Main_VM_Storage  load_guid                      3251330473448672712            -
Main_VM_Storage  autotrim                       off                            default
Main_VM_Storage  feature@async_destroy          enabled                        local
Main_VM_Storage  feature@empty_bpobj            active                         local
Main_VM_Storage  feature@lz4_compress           active                         local
Main_VM_Storage  feature@multi_vdev_crash_dump  enabled                        local
Main_VM_Storage  feature@spacemap_histogram     active                         local
Main_VM_Storage  feature@enabled_txg            active                         local
Main_VM_Storage  feature@hole_birth             active                         local
Main_VM_Storage  feature@extensible_dataset     active                         local
Main_VM_Storage  feature@embedded_data          active                         local
Main_VM_Storage  feature@bookmarks              enabled                        local
Main_VM_Storage  feature@filesystem_limits      enabled                        local
Main_VM_Storage  feature@large_blocks           enabled                        local
Main_VM_Storage  feature@large_dnode            enabled                        local
Main_VM_Storage  feature@sha512                 enabled                        local
Main_VM_Storage  feature@skein                  enabled                        local
Main_VM_Storage  feature@edonr                  enabled                        local
Main_VM_Storage  feature@userobj_accounting     active                         local
Main_VM_Storage  feature@encryption             enabled                        local
Main_VM_Storage  feature@project_quota          active                         local
Main_VM_Storage  feature@device_removal         enabled                        local
Main_VM_Storage  feature@obsolete_counts        enabled                        local
Main_VM_Storage  feature@zpool_checkpoint       enabled                        local
Main_VM_Storage  feature@spacemap_v2            active                         local
Main_VM_Storage  feature@allocation_classes     enabled                        local
Main_VM_Storage  feature@resilver_defer         enabled                        local
Main_VM_Storage  feature@bookmark_v2            enabled                        local
Main_VM_Storage  feature@redaction_bookmarks    enabled                        local
Main_VM_Storage  feature@redacted_datasets      enabled                        local
Main_VM_Storage  feature@bookmark_written       enabled                        local
Main_VM_Storage  feature@log_spacemap           active                         local
Main_VM_Storage  feature@livelist               enabled                        local
Main_VM_Storage  feature@device_rebuild         enabled                        local
Main_VM_Storage  feature@zstd_compress          enabled                        local

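For context, the output above only shows pool properties, not the vdev layout. The RAIDZ1 topology of the three SSDs and their capacity/fragmentation can be checked with the standard zpool tools (pool name taken from the output above):

Code:
# Show the vdev layout; the three SanDisk SSDs should appear under a single raidz1 vdev
zpool status Main_VM_Storage
# Per-vdev capacity and fragmentation
zpool list -v Main_VM_Storage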
Thank you very much for creating Proxmox and thank you very much for your help.
 
@ness1602 I understand that they are bad, but bad enough to drop to 1 MB/s or less?
 
I want to add that the test was done by copying from one virtual machine to another through a Samba (CIFS) server. Is it possible that copies from one virtual machine to another are simply very expensive?

Thank you so much
 
Any consumer SSD will be pretty bad with ZFS or Ceph, where fast synchronous writes are needed for the journal
(you really need datacenter SSDs for the ZFS journal).

A single copy from server to server will also be the slowest case, because it's a single IO depth with no parallelism.

Just check this old blog post, with benchmarks at 1 job: you'll get results between 1 and 7 MB/s for consumer drives.
https://www.sebastien-han.fr/blog/2...-if-your-ssd-is-suitable-as-a-journal-device/
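For anyone who wants to reproduce that kind of benchmark on the pool itself, below is a rough sketch of a single-job 4k synchronous write test with fio. The target path, file size, and runtime are assumptions, and the test writes real data, so point it at a scratch file and delete it afterwards.

Code:
# Single job, queue depth 1, 4k synchronous writes - similar in spirit to the journal test in the linked blog.
# /Main_VM_Storage/fio-testfile is an assumed scratch path on the pool; adjust as needed.
fio --name=sync-write-test --filename=/Main_VM_Storage/fio-testfile --size=1G \
    --rw=write --bs=4k --sync=1 --numjobs=1 --iodepth=1 \
    --runtime=60 --time_based --group_reporting

With --sync=1 every write is issued with O_SYNC, which forces the drive to actually commit the data instead of acknowledging it from its volatile cache; that is the case where consumer SSDs fall down.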
 
@spirit Thank you very much for your answer; the article is very interesting and useful.

So... my disks are not suitable for ZFS pools.

Wow, I really didn't know there was so much variation from one SSD to another. So, what do I have to look for when I want to buy large disks (6 TB or more) to build a large ZFS pool?
 
ZFS has terrible write amplification, especially for sync writes. I've seen write amplification here from factor 3 for big async writes up to factor 80 for 4k sync writes. That means for each 1 GB you write inside a VM, the host writes 3 to 80 GB to your SSD, so your SSD will die 3 to 80 times faster and you will only get 1/3 to 1/80 of the performance.
So you want an enterprise SSD designed for at least mixed workloads, so it has a decent TBW/DWPD rating and can handle all those writes without wearing out too fast.
The other point is that you want an enterprise SSD because these have power-loss protection, which consumer SSDs are missing. Power-loss protection not only prevents losing data on async writes in case of a kernel crash or power outage, it also allows the SSD to cache sync writes in its RAM, so your sync write performance should be much better.
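One rough way to see that amplification on a live system is to generate a known amount of sync writes inside a guest and watch what the pool writes on the host at the same time. A sketch, assuming the pool name from earlier in the thread:

Code:
# Inside the VM: write 1 GiB of synchronous data to a throwaway file
dd if=/dev/zero of=/root/amp-test.bin bs=1M count=1024 oflag=direct,sync

# On the Proxmox host, in parallel: watch per-vdev write bandwidth (5 second interval)
zpool iostat -v Main_VM_Storage 5

Comparing the total bytes written on the host during the test with the 1 GiB written inside the guest gives a rough feel for the amplification factor.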
 
Hi @h4
I have no knowledge of ZFS ... You are right.

The configuration I attached is the default one that Proxmox creates; I have not manually modified any value.

Do you have any recommendations on how to configure it with these drives?
 
First you would need to increase your pool's blocksize (volblocksize) from the default 8K to 16K, otherwise you will lose 50% of your raw capacity instead of just 33%, because you get an additional 17% of padding overhead. The volblocksize can't be changed after a virtual disk has been created, so you would need to destroy and recreate all virtual disks for the new blocksize to take effect. You can do that by editing the blocksize (Datacenter -> Storage -> YourZFSPool -> Edit -> Blocksize) and then backing up all the VMs and restoring them in place of the originals.
I would also disable atime, or maybe enable relatime, on your pool for better performance, so that not every single read operation causes an additional write.
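A hedged sketch of the equivalent CLI commands, assuming the Proxmox storage ID is the same as the pool name (it may differ on your system; the GUI path described above does the same thing):

Code:
# Change the blocksize used for newly created zvols on this storage
# (existing virtual disks keep their old volblocksize)
pvesm set Main_VM_Storage --blocksize 16k

# Turn off atime on the pool so reads no longer trigger extra metadata writes,
# or use relatime as a compromise
zfs set atime=off Main_VM_Storage
# alternative:
zfs set atime=on relatime=on Main_VM_Storage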
 
Do you have any suggestions to improve performance on similar hardware, without the need to purchase SAS enterprise drives? I have tried various recommendations with limited success. It's hard to believe I am the only one facing this issue. Are there any monitoring tools to identify the root cause and aid in solving the problem?
 
Do you have any suggestions to improve performance on similar hardware, without the need to purchase SAS enterprise drives?
SATA or NVMe enterprise drives work fine too (if they have PLP). Don't use a single RAIDz1/2/3; use a stripe of mirrors instead (or a stripe of RAIDz vdevs if you have dozens of drives). Make sure the volblocksize, ashift, and the VM filesystems align to reduce amplification. Maybe also experiment with large slow drives plus a special device.
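For illustration, a stripe of mirrors with an explicit ashift, plus an optional special vdev, would be created roughly like this; the pool name and device paths are placeholders:

Code:
# RAID10-style layout: two mirrored pairs striped together (placeholder device paths)
zpool create -o ashift=12 tank \
    mirror /dev/disk/by-id/ata-SSD_A /dev/disk/by-id/ata-SSD_B \
    mirror /dev/disk/by-id/ata-SSD_C /dev/disk/by-id/ata-SSD_D

# Optional: add a mirrored special vdev (metadata/small blocks) to a pool of large slow drives
zpool add tank special mirror /dev/disk/by-id/nvme-SSD_E /dev/disk/by-id/nvme-SSD_F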
I have tried various recommendations with limited success.
I don't know what you've tried, or whether you want to optimize for size, write speed, or read speed (and I'm not an expert). ZFS is very flexible, but it's impossible to optimize for everything.
It's hard to believe I am the only one facing this issue. Are there any monitoring tools to identify the root cause and aid in solving the problem?
There are lots of threads on this forum (and on the Proxmox Backup Server one) about this; there is no single simple fix for everything. There is also a lot of information outside this forum, as OpenZFS is not limited to Proxmox. Many people use LVM instead of ZFS because it's too complex or overkill for their use case.
 
