Continuing my disk performance testing (see the previous parts at https://forum.proxmox.com/threads/zfs-vm-lxc-disk-performance-benchmarking-part-1-zfs-slow.166701/ and https://forum.proxmox.com/threads/z...nchmarking-part-2-zfs-in-vm-very-slow.166705/), so I won't repeat the background here.
I saved the best for last: in part 3 I'll focus on tests in LXC that I simply cannot explain. So here we go.

Test 1 is the same as in the last thread, 10G of 4k writes at iodepth=1 with sync=1:
IOPS:
Setup | IOPS
raw partition, host | 84,000
zvol, host | 18,000
zvol, VM | 6,800
zvol, VM, sync=0 | 10,000
zvol, LXC | 18,000
zvol, LXC, sync=0 | 145,000
LVM, LXC | 55,000
Code:
fio --filename=/dev/zvol/b-zmirror1-nvme01/vm-101-disk-0 --ioengine=libaio --loops=1 --size=10G --time_based --runtime=60 --group_reporting --stonewall --name=cc1 --description="CC1" --rw=write --bs=4k --direct=1 --iodepth=1 --numjobs=1 --sync=1
OK, so sync=1 with a zvol in LXC is still very slow, as already mentioned. LVM in LXC does much better, with reasonable numbers. But LXC on a zvol with sync=0 shows almost double the raw performance of the host. WTH?! That's not exactly realistic, so some kind of caching seems to be in play (I assume), but it only manifests itself in LXC - not on the host and not under a VM.
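One way to sanity-check the caching theory (just a sketch, not something from the runs above; same zvol path as in the fio command, assuming the device is visible inside the container) would be to force periodic fsyncs and watch the pool from the host at the same time:
Code:
# Re-run the sync=0 job, but fsync every 32 writes. If IOPS collapse back toward
# the host numbers, the "extra" performance is write-back caching somewhere in
# the container I/O path rather than real device throughput.
fio --filename=/dev/zvol/b-zmirror1-nvme01/vm-101-disk-0 --ioengine=libaio \
    --size=10G --time_based --runtime=60 --group_reporting --name=cachecheck \
    --rw=write --bs=4k --direct=1 --iodepth=1 --numjobs=1 --sync=0 --fsync=32

# Meanwhile, on the host, watch whether writes actually reach the pool:
zpool iostat -v b-zmirror1-nvme01 1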
If it were just suspiciously good performance under unsafe conditions (no sync), I wouldn't mind. But other tests show massive _declines_ in performance under LXC only, and not under a VM or on the host directly.
Test 2 is similar to test 1, but with random 4k access and iodepth=64 instead of 1:
Code:
fio --filename=/dev/zvol/b-zmirror1-nvme01/vm-101-disk-0 --ioengine=libaio --loops=1 --size=10G --time_based --runtime=60 --group_reporting --stonewall --name=cc1 --description="CC1" --rw=randwrite --bs=4k --direct=1 --iodepth=64 --numjobs=1 --sync=1
This is where it gets really weird:
IOPS:
Setup | IOPS
raw partition, host | 255,000
zvol, host | 82,000
zvol, host, sync=0 | 82,000
LVM, host | 226,000
LVM, VM | 140,000
zvol, VM | 64,000
LVM, LXC | 130,000
zvol, LXC | 3,000 (?!)
So a higher iodepth generates more performance. LVM on the host is almost as good as raw, while the zvol on the host is still notably worse - though comparatively better than in test 1, with "only" about a 3x drop. Interestingly, sync on or off no longer affects zvol performance at all. LVM even under a VM is pretty decent, with less than a 2x drop. The zvol under a VM did worse than on the host, but not as dramatically as in part 2, so that is also more or less expected.
So far, so good.
LVM under LXC did a little worse than under a VM, which is a bit surprising, but it's close enough that I'd call it even. The zvol under LXC in this test, however, makes no sense to me. Barely 3k IOPS when the host can do 255k raw?? That's about 85x slower! This can't be right; something is off here. It's still more than 20x slower than the same zvol under a VM. I get similar results for random read workloads, so perhaps the issue is with deep iodepths? But the iodepth=1 results in test 1 show other weird patterns, as noted above.
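If deep iodepth really is the trigger, a sweep like this (a sketch; same zvol path as above, run inside the container) should show where the cliff starts:
Code:
# Hypothetical iodepth sweep: same random-write job at queue depths 1..64,
# printing only the IOPS summary line from each run.
for qd in 1 2 4 8 16 32 64; do
    fio --filename=/dev/zvol/b-zmirror1-nvme01/vm-101-disk-0 --ioengine=libaio \
        --size=10G --time_based --runtime=30 --group_reporting --name=qd$qd \
        --rw=randwrite --bs=4k --direct=1 --iodepth=$qd --numjobs=1 --sync=1 \
        | grep IOPS
done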
I am not quite sure what to make of these results. It's somewhat similar to the issue in my previous thread (https://forum.proxmox.com/threads/sync-writes-to-zfs-zvol-disk-are-not-sync-under-pve.163066/), which turned out to be an actual regression in the ZFS code that has since been fixed. Perhaps these results point to something similar? Or is there another explanation? I am quite perplexed.
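Since the container exercises the host's ZFS module directly (unlike the VM, which runs its own kernel on top of a virtual disk), one thing worth verifying first is which ZFS versions are actually in play - a quick sketch:
Code:
# On the host (the same module is what the LXC uses):
zfs version                   # userspace and kernel module versions
cat /sys/module/zfs/version   # kernel module version only

# Inside the container, the kernel (and thus the ZFS module) is the host's:
uname -r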
Config data:
VM uname -a (Debian 12 live):
Code:
Linux fiotest 6.1.0-29-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.123-1 (2025-01-02) x86_64 GNU/Linux
LXC uname -a (Debian 12):
Code:
Linux fiopct 6.8.12-10-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-10 (2025-04-18T07:39Z) x86_64 GNU/Linux
pveversion -v
Code:
proxmox-ve: 8.4.0 (running kernel: 6.8.12-10-pve)
pve-manager: 8.4.1 (running version: 8.4.1/2a5fa54a8503f96d)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.8.12-10-pve-signed: 6.8.12-10
proxmox-kernel-6.8: 6.8.12-10
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
proxmox-kernel-6.8.12-8-pve-signed: 6.8.12-8
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
ceph: 19.2.1-pve3
ceph-fuse: 19.2.1-pve3
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.2
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.0
libpve-cluster-perl: 8.1.0
libpve-common-perl: 8.3.1
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.2
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.6
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
proxmox-backup-client: 3.4.1-1
proxmox-backup-file-restore: 3.4.1-1
proxmox-firewall: 0.7.1
proxmox-kernel-helper: 8.1.1
proxmox-mail-forward: 0.3.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.10
pve-cluster: 8.1.0
pve-container: 5.2.6
pve-docs: 8.4.0
pve-edk2-firmware: 4.2025.02-3
pve-esxi-import-tools: 0.7.4
pve-firewall: 5.1.1
pve-firmware: 3.15-3
pve-ha-manager: 4.0.7
pve-i18n: 3.4.2
pve-qemu-kvm: 9.2.0-5
pve-xtermjs: 5.5.0-2
qemu-server: 8.3.12
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve2
ZFS properties:
Code:
Exceeded message limit, but nothing overly exciting here: sync=standard, compression=on
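(For reference, the relevant properties could be pulled with something like the following; dataset names are inferred from the storage/volume names in the configs in this post.)
Code:
# Hypothetical property dump for the zvol used in the fio runs and for the
# container rootfs subvol; adjust the names if the pool layout differs.
zfs get sync,compression,volblocksize,logbias b-zmirror1-nvme01/vm-101-disk-0
zfs get sync,compression,recordsize,atime b-zmirror1-nvme01/subvol-105-disk-0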
cat /etc/pve/qemu-server/101.conf
Code:
agent: 1
boot: order=ide2;net0
cores: 4
cpu: x86-64-v2-AES
ide2: local:iso/debian-live-12.9.0-amd64-standard.iso,media=cdrom,size=1499968K
memory: 2048
meta: creation-qemu=9.0.2,ctime=1740711583
name: fiotest
net0: virtio=BC:24:11:0F:05:39,bridge=vmbr0,firewall=1,tag=11
numa: 0
ostype: l26
scsi0: b-lvm-thk-nvme01:vm-101-disk-0,aio=io_uring,backup=0,cache=none,discard=on,iothread=1,replicate=0,size=11G,ssd=1
scsi1: b-zmirror1-nvme01:vm-101-disk-0,aio=io_uring,backup=0,cache=none,discard=on,iothread=1,replicate=0,size=11G,ssd=1
scsi2: ztest:vm-101-disk-0,aio=io_uring,backup=0,cache=none,discard=on,iothread=1,replicate=0,size=111G,ssd=1
scsi3: b-lvm-thn-nvme01:vm-101-disk-0,aio=io_uring,backup=0,cache=none,discard=on,iothread=1,replicate=0,size=111G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=7e99887f-d252-43b2-9d91-7027cb2f84c8
sockets: 1
vmgenid: 7b4eb2f9-ab24-4264-ac43-d9602341249b
cat /etc/pve/lxc/105.conf
Code:
arch: amd64
cores: 4
features: nesting=1
hostname: fiopct
memory: 2048
net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=BC:24:11:7D:CF:EC,ip=dhcp,tag=11,type=veth
ostype: debian
rootfs: b-zmirror1-nvme01:subvol-105-disk-0,mountoptions=discard;lazytime,size=120G
swap: 0
unprivileged: 1