LVM-thin vs ZFS for local VM storage on a single-node PVE — pros/cons in 2026?

bridman19

Jun 18, 2026
Hi everyone,

Setting up a single-node PVE 9.x box for a small office (no cluster, no HA, no
Ceph — just one server with 4x 2TB NVMe in software RAID-equivalent).

I keep going back and forth between two local storage options:

- LVM-thin on top of mdadm RAID10
- ZFS RAID10 (mirror of mirrors)

I've used LVM-thin for years and I'm comfortable with it. ZFS keeps being
recommended everywhere but I'm wary of the RAM appetite and the "don't run ZFS
on consumer NVMe" warnings.

For a workload of ~15 VMs (mostly Linux, a couple of Windows, no databases),
what would you pick today? Anything about ZFS on PVE 8.x that has changed
recently that I should factor in?

Thanks.
 
"don't run ZFS on consumer NVMe" warnings.
should be "don't run anything production critical on consumer drives"

LVM-thin on top of mdadm RAID10
This is not a supported configuration. If you care about supported configurations, ZFS is the only viable option besides running a hardware RAID controller.

Anything about ZFS on PVE 8.x that has changed recently that I should factor in?
Nothing for small environments. ZFS has been feature-complete for what you need for virtualization for many years.
 
I'm wary of the RAM appetite
That one was always overhyped. Today ZFS works with just a few GB of RAM, and on a fresh PVE installation the ARC is capped at 16 GiB at most: https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_limit_memory_usage
and the "don't run ZFS
on consumer NVMe" warnings.
This one is true. And @LnxBil hit the nail on the head: it
should be "don't run anything production critical on consumer drives".

Not using ZFS will remove some important features from your system: transparent compression, guaranteed integrity + self healing, technically cheap snapshots and some more.

Yes, I am a ZFS fan: https://forum.proxmox.com/threads/f...y-a-few-disks-should-i-use-zfs-at-all.160037/
 
I would not be comfortable at all with this single server and single array setup for people doing real work. Why not at least get a second system and do ZFS replication? What are you using for backups and how frequently?

Side comment:
Statistically, I believe a single four-drive RAID10 array and two RAID1 arrays have the same probability of data loss, but my opinion is that two RAID1 arrays is a far better choice in reality. (Main exceptions are if you simply cannot fit the data otherwise or absolutely must have the extra single-threaded throughput.)
If one of your two RAID1 arrays can't rebuild, you only have half of your data at risk or lost. You still have an array available for restoring backups and getting up and running quickly. And a drive from a RAID1 array can be read directly in another system or external dock, while RAID10 requires assembling all the available drives at once.
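For anyone who wants to check the "same probability" part: a minimal sketch that enumerates every possible two-drive failure on four drives and counts how many take out both halves of one mirror. The drive numbering (0 and 1 in one mirror, 2 and 3 in the other) is an assumption for illustration only, and it treats every pair of failures as equally likely.

```shell
#!/usr/bin/env bash
# Drives 0,1 form mirror A; drives 2,3 form mirror B (hypothetical numbering).
fatal_pairs() {
  local total=0 fatal=0 a b
  for a in 0 1 2; do
    for ((b = a + 1; b <= 3; b++)); do
      total=$((total + 1))
      # Integer division maps 0,1 -> 0 and 2,3 -> 1, so equal results
      # mean both failed drives sit in the same mirror.
      if (( a / 2 == b / 2 )); then
        fatal=$((fatal + 1))
      fi
    done
  done
  echo "$fatal of $total two-drive failures hit a single mirror"
}

fatal_pairs
```

The count is identical whether the two mirrors are striped together (RAID10) or kept as two separate RAID1 arrays. The difference is only in what is lost: the whole array in the first case, one of the two arrays in the second.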
 
Spent the weekend benchmarking both on the actual hardware before committing.
Sharing the results.

Setup: 4x 2TB Samsung PM9A3 (enterprise NVMe, PLP), AMD EPYC, 128GB RAM.

LVM-thin on mdadm RAID10:
- Sequential read: ~12 GB/s
- Sequential write: ~5.5 GB/s
- 4k random read QD32: ~850k IOPS
- 4k random write QD32: ~420k IOPS
- RAM used: ~negligible (page cache only)
- Snapshot: works, but the copy-on-write penalty is real
- Replication to another node: not native

ZFS RAID10 (mirror of mirrors), default ashift=12, recordsize=16k for VM zvols:
- Sequential read: ~8 GB/s
- Sequential write: ~4 GB/s
- 4k random read QD32: ~620k IOPS
- 4k random write QD32: ~280k IOPS
- RAM used: ~16GB ARC out of the box (tunable)
- Snapshot: instant, free
- Replication to another node: native (zfs send/recv, integrated in PVE)

So LVM-thin wins on raw performance, ZFS wins on features.

What I picked: ZFS.

Reasoning:
- The performance difference is real but on enterprise NVMe both are vastly
faster than the VMs will ever need.
- Snapshots are night-and-day. With LVM-thin, snapshots accumulate write
amplification fast and you can't keep them around. With ZFS you can keep
hourly snapshots for a week without thinking about it.
- Replication: even though this is single-node today, in 12-18 months I'll
probably add a second node and pve-zsync / native replication makes that
trivial. With LVM-thin I'd have to redesign.
- ARC is not a problem on a 128GB box. Cap it at 16GB if you want:
echo "options zfs zfs_arc_max=17179869184" > /etc/modprobe.d/zfs.conf
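In case the magic number looks opaque: zfs_arc_max takes bytes, so 16 GiB is 16 * 1024^3. A small sketch that builds the same line from a GiB value; the function name arc_option_line is made up for this example, and it only prints the line rather than writing any file.

```shell
#!/usr/bin/env bash
# Build the modprobe option line for an ARC cap given in GiB.
# arc_option_line is a hypothetical helper name, not part of any tool.
arc_option_line() {
  local gib="$1"
  # zfs_arc_max is expressed in bytes: GiB * 1024^3
  local bytes=$((gib * 1024 * 1024 * 1024))
  echo "options zfs zfs_arc_max=${bytes}"
}

arc_option_line 16
```

Redirect the output to /etc/modprobe.d/zfs.conf as in the line above. If the root file system is on ZFS, the Proxmox wiki also says to refresh the initramfs (update-initramfs -u -k all) and reboot for the setting to take effect.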

Things I tuned on ZFS for VM workload:
- ashift=12 (4k sectors — matches NVMe physical)
- compression=lz4 (cheap, helps with sparse VM disks)
- atime=off
- xattr=sa
- recordsize=16k on the zvol parent dataset (matches typical VM IO better
than the 128k default)
- sync=standard (don't set sync=disabled in production no matter how tempting)
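Roughly how the dataset-level settings from that list translate to commands, as a dry-run sketch that only prints them. The dataset name rpool/data is an assumption (substitute your own), and print_tuning_cmds is a made-up helper name. ashift is left out because it is a pool property chosen at creation time, and recordsize is left out because it applies to file system datasets, while zvols use the separate volblocksize property.

```shell
#!/usr/bin/env bash
# Print (not run) the zfs commands for the dataset-level properties.
# "rpool/data" is an assumed dataset name; pass your own as the first argument.
print_tuning_cmds() {
  local dataset="${1:-rpool/data}"
  local prop
  for prop in compression=lz4 atime=off xattr=sa sync=standard; do
    echo "zfs set ${prop} ${dataset}"
  done
}

print_tuning_cmds rpool/data
```

Review the printed lines, then run them by hand or pipe them to sh once they look right. Child datasets and zvols inherit these values unless they are overridden locally.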

About the "ZFS on consumer NVMe" warning: it's real, but it applies to consumer
drives without PLP (power-loss protection) running sync writes. On enterprise
NVMe with PLP you're fine. If you're on consumer drives, either accept the
risk or add a small Optane / enterprise SSD as SLOG.

LVM-thin is still a totally valid choice — especially if you're allergic to ZFS
or you have RAM constraints. But for a fresh build in 2026 with decent
hardware, I'd default to ZFS.
 
recordsize=16k on the zvol parent dataset (matches typical VM IO better
That's for file systems (OS/CT) and shouldn't affect ZVOLs. I guess you meant to set volblocksize=16k (which is default) instead?
That said I'd leave both properties alone/default and set the ZVOL part via Datacenter > Storage if needed.
Check this to see what I mean
Bash:
zfs list -ospace,volblocksize,recordsize,type
 
LVM-thin on mdadm RAID10:
[...]
- RAM used: ~negligible (page cache only)
[...]

ZFS RAID10 (mirror of mirrors), default ashift=12, recordsize=16k for VM zvols:
[...]
- RAM used: ~16GB ARC out of the box (tunable)
[...]
FYI: "negligible" only holds if you ignore that the buffer cache will cache EVERYTHING and fill up, so that no RAM is really free, only available.
Sure, the buffer cache is released much faster when memory is required, which is the actual technical difference between ARC and buffer cache.
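To see the free vs. available distinction on a running box, here is a minimal sketch that reads /proc/meminfo and, if the ZFS module is loaded, the ARC size from /proc/spl/kstat/zfs/arcstats. The function names are made up for this example. Note that the ARC is generally reported as used memory rather than as cache, which is part of why the two setups look so different in free or top.

```shell
#!/usr/bin/env bash
# Print MemFree and MemAvailable (in MiB) from a meminfo-style file.
mem_summary() {
  local file="${1:-/proc/meminfo}"
  awk '
    $1 == "MemFree:"      { free  = $2 }  # kB not used for anything at all
    $1 == "MemAvailable:" { avail = $2 }  # kB the kernel estimates it can hand out without swapping
    END { printf "free=%d MiB available=%d MiB\n", free / 1024, avail / 1024 }
  ' "$file"
}

# Print the current ARC size in MiB, if the ZFS kstat file is readable.
arc_size() {
  local stats=/proc/spl/kstat/zfs/arcstats
  if [ -r "$stats" ]; then
    # arcstats rows are "name type data"; the "size" row holds bytes
    awk '$1 == "size" { printf "arc=%d MiB\n", $3 / 1048576 }' "$stats"
  else
    echo "arc=n/a (ZFS module not loaded)"
  fi
}

if [ -r /proc/meminfo ]; then
  mem_summary
fi
arc_size
```

On the mdadm/LVM-thin setup you would expect "free" to shrink toward zero over time while "available" stays high. On ZFS, the arc line shows how much of the used memory is actually cache.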