[PVE] High I/O delay when transferring data

zegorax

Dear Proxmox community,

Hope you're doing well :)

After being on ESXi for a few years, I'm glad I came back to Proxmox! Unfortunately, during the restoration of my backups (which involves moving a disk from an NFS datastore to local-zfs), I'm running into very high I/O delay. It stays at 70-80% for the entire duration of the data transfer.

I'm looking for advice on how to improve my ZFS performance. The pool consists of 4x4TB WD RED disks in a RAIDZ-1 configuration. Here is the output of some commands, but feel free to ask for more if I missed anything.

Do you think you could help me solve this? Thanks a lot in advance!

Code:
root@hprv:~# pveversion -v
proxmox-ve: 8.2.0 (running kernel: 6.8.4-2-pve)
pve-manager: 8.2.2 (running version: 8.2.2/9355359cd7afbae4)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.4-2
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.6
libpve-cluster-perl: 8.0.6
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.1
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.2.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.0-1
proxmox-backup-file-restore: 3.2.0-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.1
pve-cluster: 8.0.6
pve-container: 5.0.10
pve-docs: 8.2.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.0
pve-firewall: 5.0.5
pve-firmware: 3.11-1
pve-ha-manager: 4.0.4
pve-i18n: 3.2.2
pve-qemu-kvm: 8.1.5-5
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.3-pve2

Code:
root@hprv:~# pveperf
CPU BOGOMIPS:      114914.64
REGEX/SECOND:      1596317
HD SIZE:           6185.95 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     6.87
DNS EXT:           30.33 ms
DNS INT:           24.89 ms

Code:
root@hprv:~# zpool iostat
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool       6.07T  8.47T     20    327   122K  18.7M

 
Some WD Red drives use SMR, which is problematic with ZFS (slow, up to the point of producing errors). Also, RAIDz1 is not the same as hardware RAID5 (and it is terrible for running VMs on) and very slow on writes. There have been a few threads on this forum before about how disappointed people are when they move from ESXi with hardware RAID to PVE with RAIDz1.
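If you want to double-check whether your particular drives are SMR, the model string and the kernel's zoned attribute are quick indicators (the device name below is just an example for your first disk):

Code:
# CMR WD Red Plus drives are the EFRX series; the SMR WD Red drives are EFAX
smartctl -i /dev/sda | grep -i model
# "none" means not host-managed/host-aware SMR (drive-managed SMR may still
# report "none", so the model number is the more reliable check)
cat /sys/block/sda/queue/zoned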
 
Thanks for your answer. For this type of config, which ZFS layout would be the best? The drives are all the same model, WDC_WD40EFRX-68N (quite old, 4.5 years of runtime).
 
For this type of config, which ZFS layout would be the best? The drives are all the same model, WDC_WD40EFRX-68N (quite old, 4.5 years of runtime).
Those drives appear to be Red Plus, which at least do not use SMR. For best performance use (second-hand) enterprise SSDs with PLP.
How did you use those old drives with ESXi? If you were happy with the performance, maybe do that again.
A ZFS stripe of two mirrors (which is like RAID10) will give you better IOPS, and about the same amount of usable space as RAIDz1, whose usable space is only around 50% anyway due to padding overhead.
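For reference, a pool of two striped mirrors would be created roughly like this (the pool name and by-id paths are placeholders; for a boot pool like rpool you would recreate it via the installer rather than run this on a live system):

Code:
# Two mirrored vdevs striped together (RAID10-like layout)
zpool create -o ashift=12 tank \
    mirror /dev/disk/by-id/ata-WDC_WD40EFRX-DISK1 /dev/disk/by-id/ata-WDC_WD40EFRX-DISK2 \
    mirror /dev/disk/by-id/ata-WDC_WD40EFRX-DISK3 /dev/disk/by-id/ata-WDC_WD40EFRX-DISK4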
 
ESXi was running on top of the HW RAID controller with a RAID5 config (with Proxmox, the controller is now in HBA mode). I largely prefer Proxmox now; I only used ESXi in the past for training reasons.
Are there other paths to explore which would not involve reshaping the array?
 
Are there other paths to explore which would not involve reshaping the array?
Besides adding more hardware, especially Enterprise SSDs? Unfortunately not.

If you are open to adding more hardware, I can recommend my default setup for spinning-rust pools:
  • add two enterprise SSDs as special devices for all the metadata and "fast" datasets (online improvement)
  • add two 16 GB Optane NVMe drives as SLOG, which will significantly improve synchronous writes (online improvement; see the sketch after this list)
and of course:
  • reshaping to striped mirrors is always a good idea, but only possible offline ....
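Roughly, the two online additions look like this (device paths and the dataset name are placeholders; special and log vdevs should always be mirrored, and a special device only speeds up metadata written after it is added):

Code:
# Mirrored special vdev for metadata
zpool add rpool special mirror /dev/disk/by-id/ENTERPRISE-SSD-1 /dev/disk/by-id/ENTERPRISE-SSD-2
# Optionally send small blocks of a "fast" dataset to the special vdev as well
zfs set special_small_blocks=64K rpool/data
# Mirrored SLOG for synchronous writes
zpool add rpool log mirror /dev/disk/by-id/OPTANE-1 /dev/disk/by-id/OPTANE-2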
 
IMO, ZFS for a home lab is too costly if you need performance.
Keep ext4/LVM-thin on top of your HW RAID; even better for IOPS, switch to HW RAID10.
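For the record, registering an LVM-thin storage on top of a HW RAID volume only takes a few commands (assuming the RAID volume shows up as /dev/sdb; the storage, VG, and pool names below are placeholders):

Code:
# Volume group and thin pool on the hardware RAID volume
pvcreate /dev/sdb
vgcreate vmdata /dev/sdb
lvcreate -l 95%FREE --thinpool data vmdata
# Register it in PVE as storage for VM disks and container volumes
pvesm add lvmthin hwraid-thin --vgname vmdata --thinpool data --content images,rootdir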
 
Since ZFS gets so much praise all over the internet, I decided to give it a go and see what would come out of it. But apparently I should have stuck with my HW RAID.

I will report back after the backup restore with how the performance is behaving.
 
Just another detail: I have also reinstalled another server with almost exactly the same config at another location.
It consists of 9x3TB disks in a ZFS RAIDZ1 config. The disks are WDC_WD30EFRX-68(E-N-A). But with this server (also doing a restore from backup), there is almost no I/O wait.

I'm not quite sure what the difference is there. It's also running the same version as the first one, but pveperf reports higher FSYNCS:

Code:
root@pve:~# pveperf
CPU BOGOMIPS:      166004.48
REGEX/SECOND:      1502787
HD SIZE:           1972.72 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     47.93
DNS EXT:           1008.99 ms
DNS INT:           1002.46 ms (localdomain)
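
One way to see where the two machines differ during a restore is to watch per-disk latency; a single slow or ailing disk drags down the whole RAIDZ vdev and shows up clearly there (flags assume the OpenZFS version shipped with PVE 8, device name is an example):

Code:
# Per-vdev bandwidth plus latency columns, refreshed every 5 seconds
zpool iostat -vl rpool 5
# SMART health summary of one member disk (repeat for each device)
smartctl -H /dev/sda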

 
FSYNCS/SECOND: 47.93
This is better than the other machine, yet still very, very bad.

These are the numbers of an ODROID H3 (a single, very low-end Intel CPU) with two enterprise SSDs in a ZFS mirror:

Code:
$ pveperf
CPU BOGOMIPS:      15974.40
REGEX/SECOND:      2272416
HD SIZE:           6.00 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     4570.74
DNS EXT:           36.24 ms
DNS INT:           2.26 ms (fritz.box)
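
If you want a second opinion on the sync-write numbers besides pveperf, a small fio run that fsyncs every write exercises the same path (the target path and size are placeholders; run it on the pool you want to test and delete the file afterwards):

Code:
# 4k sequential writes with an fsync after every write, similar in spirit to pveperf's FSYNCS
fio --name=fsync-test --filename=/rpool/fio-testfile --ioengine=psync \
    --rw=write --bs=4k --size=256M --fsync=1
rm /rpool/fio-testfile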
 
I see, so the switch from HDD to SSD really does make a big difference. I will check if I can do that as well.
 
