I have resumed disk performance testing (see also the initial thread at https://forum.proxmox.com/threads/sync-writes-to-zfs-zvol-disk-are-not-sync-under-pve.163066/) and stumbled on some more non-intuitive results. I'll split them into separate threads, as the symptoms are different.
Overall, I am running a bunch of different fio tests and comparing on-host performance with the same tests inside a VM and an LXC container. The focus was on ZFS, but I also did some tests using LVM and thin LVM to try to pinpoint the issues. Most fio tests are done against block devices, except for the LXC ones, which write to a file on the root disk (backed by such a block device). I was unable to pass a block device directly to a container, though I did not try very hard (one possible approach is sketched below). Tests are done on an Intel P4510 NVMe SSD (4 TB).
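Side note on passing a block device into the container: the following is only a sketch I have not actually tried, not something I can vouch for. It uses either a devN passthrough entry or raw LXC options in the container config; the zvol path is from my setup, while the destination name and the major:minor numbers are placeholders.
Code:
# Variant A: PVE device passthrough entry in /etc/pve/lxc/105.conf
dev0: /dev/zvol/b-zmirror1-nvme01/vm-101-disk-0

# Variant B: raw LXC options in the same file
# (replace 230:0 with the real major:minor from `ls -lL /dev/zvol/b-zmirror1-nvme01/vm-101-disk-0`;
#  "dev/fio-disk" is just an arbitrary name inside the container)
lxc.cgroup2.devices.allow: b 230:0 rwm
lxc.mount.entry: /dev/zvol/b-zmirror1-nvme01/vm-101-disk-0 dev/fio-disk none bind,create=file 0 0
# note: with an unprivileged container, device node ownership/permissions may still get in the way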
In part 1 I'll focus on tests that seem to show zvols being unreasonably slow, even without any virtualization involved.
Test 1 is the same as in the last thread, 10G of 4k writes at iodepth=1 with sync=1:
IOPS:
raw partition, host   | 84,000
zvol, host            | 18,000
zvol, VM              |  6,800
zvol, LXC             | 18,000
LVM, host             | 84,000
zvol, host, sync=0    | 48,000
Code:
fio --filename=/dev/zvol/b-zmirror1-nvme01/vm-101-disk-0 --ioengine=libaio --loops=1 --size=10G --time_based --runtime=60 --group_reporting --stonewall --name=cc1 --description="CC1" --rw=write --bs=4k --direct=1 --iodepth=1 --numjobs=1 --sync=1
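The other rows use essentially the same command, with only the target --filename or a flag changed; for reference, the sync=0 and random-read variants mentioned below look roughly like this:
Code:
# sync=0 row of the table: same command with --sync=0
fio --filename=/dev/zvol/b-zmirror1-nvme01/vm-101-disk-0 --ioengine=libaio --loops=1 --size=10G --time_based --runtime=60 --group_reporting --stonewall --name=cc1 --description="CC1" --rw=write --bs=4k --direct=1 --iodepth=1 --numjobs=1 --sync=0

# random read variant: swap --rw=write for --rw=randread, everything else unchanged
fio --filename=/dev/zvol/b-zmirror1-nvme01/vm-101-disk-0 --ioengine=libaio --loops=1 --size=10G --time_based --runtime=60 --group_reporting --stonewall --name=cc1 --description="CC1" --rw=randread --bs=4k --direct=1 --iodepth=1 --numjobs=1 --sync=1

# raw partition / LVM rows: same command, only --filename points at the partition or LV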
Let's ignore the VM and LXC numbers for now and just look at performance on the host. What I see is that even in the absence of any virtualization, using zvols drops performance in this workload by roughly a factor of 5 (84,000 vs 18,000 IOPS)! I understand that ZFS is not exactly a speed demon, but that drop sounds excessive (and it gets worse under VMs). I don't see any unusually high CPU usage, so it doesn't seem to be CPU-bound.
Now, I am not sure if that's how ZFS results are supposed to look. Perhaps it is just what it is, and I simply need to adjust my expectations of ZFS performance.
There is some indication that it might be the case: https://forum.proxmox.com/threads/high-iops-in-host-low-iops-in-vm.145268/
The IOPS numbers reported there are almost the same as mine (though they didn't test a zvol on the host), so perhaps that's normal, or I should say expected? Just for reference, LVM runs at pretty much host speed under the same test conditions.
Even with sync=0, the test runs almost twice as slow on ZFS compared to a raw partition (or LVM) with sync=1! Random reads exhibit similar slowdowns, so this is not restricted to writes.
Bottom line: is a several-fold slowdown on zvols with certain workloads expected?
Config data:
VM uname -a (Debian 12 live):
Code:
Linux fiotest 6.1.0-29-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.123-1 (2025-01-02) x86_64 GNU/Linux
LXC uname -a (Debian 12):
Code:
Linux fiopct 6.8.12-10-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-10 (2025-04-18T07:39Z) x86_64 GNU/Linux
pveversion -v
Code:
proxmox-ve: 8.4.0 (running kernel: 6.8.12-10-pve)
pve-manager: 8.4.1 (running version: 8.4.1/2a5fa54a8503f96d)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.8.12-10-pve-signed: 6.8.12-10
proxmox-kernel-6.8: 6.8.12-10
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
proxmox-kernel-6.8.12-8-pve-signed: 6.8.12-8
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
ceph: 19.2.1-pve3
ceph-fuse: 19.2.1-pve3
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.2
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.0
libpve-cluster-perl: 8.1.0
libpve-common-perl: 8.3.1
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.2
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.6
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
proxmox-backup-client: 3.4.1-1
proxmox-backup-file-restore: 3.4.1-1
proxmox-firewall: 0.7.1
proxmox-kernel-helper: 8.1.1
proxmox-mail-forward: 0.3.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.10
pve-cluster: 8.1.0
pve-container: 5.2.6
pve-docs: 8.4.0
pve-edk2-firmware: 4.2025.02-3
pve-esxi-import-tools: 0.7.4
pve-firewall: 5.1.1
pve-firmware: 3.15-3
pve-ha-manager: 4.0.7
pve-i18n: 3.4.2
pve-qemu-kvm: 9.2.0-5
pve-xtermjs: 5.5.0-2
qemu-server: 8.3.12
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve2
ZFS properties:
Code:
Exceeded message limit, but nothing overly exciting here: sync=standard, compression=on
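In case anyone wants to compare with their own setup, the relevant properties can be dumped with something like this (the dataset name is from my setup, and the property selection is just my pick, not exhaustive):
Code:
zfs get sync,compression,volblocksize,logbias,primarycache b-zmirror1-nvme01/vm-101-disk-0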
cat /etc/pve/qemu-server/101.conf
Code:
agent: 1
boot: order=ide2;net0
cores: 4
cpu: x86-64-v2-AES
ide2: local:iso/debian-live-12.9.0-amd64-standard.iso,media=cdrom,size=1499968K
memory: 2048
meta: creation-qemu=9.0.2,ctime=1740711583
name: fiotest
net0: virtio=BC:24:11:0F:05:39,bridge=vmbr0,firewall=1,tag=11
numa: 0
ostype: l26
scsi0: b-lvm-thk-nvme01:vm-101-disk-0,aio=io_uring,backup=0,cache=none,discard=on,iothread=1,replicate=0,size=11G,ssd=1
scsi1: b-zmirror1-nvme01:vm-101-disk-0,aio=io_uring,backup=0,cache=none,discard=on,iothread=1,replicate=0,size=11G,ssd=1
scsi2: ztest:vm-101-disk-0,aio=io_uring,backup=0,cache=none,discard=on,iothread=1,replicate=0,size=111G,ssd=1
scsi3: b-lvm-thn-nvme01:vm-101-disk-0,aio=io_uring,backup=0,cache=none,discard=on,iothread=1,replicate=0,size=111G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=7e99887f-d252-43b2-9d91-7027cb2f84c8
sockets: 1
vmgenid: 7b4eb2f9-ab24-4264-ac43-d9602341249b
cat /etc/pve/lxc/105.conf
Code:
arch: amd64
cores: 4
features: nesting=1
hostname: fiopct
memory: 2048
net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=BC:24:11:7D:CF:EC,ip=dhcp,tag=11,type=veth
ostype: debian
rootfs: b-zmirror1-nvme01:subvol-105-disk-0,mountoptions=discard;lazytime,size=120G
swap: 0
unprivileged: 1