zfs 2.3.0

The pve-devel mailing list is a great one to subscribe to if you want to observe the development process. Things are sometimes discussed there well before they reach the forum.

Latest news on ZFS 2.3 here, with some additional discussion: https://forum.proxmox.com/threads/z...w-long-until-its-available.160639/post-772358


So, we'll most likely be seeing ZFS 2.3 in PVE 9 (based on Debian 13 "Trixie"). From reading the developer mailing list, this was more than just a drop-in replacement of the ZFS module; other parts of PVE also had to be updated to work with it.

Also, from what I can see, the testing repos are still based on Debian 12, so it makes sense that ZFS 2.3 isn't there if it's going to land in a PVE release based on Debian 13.
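
For reference, if you want to check what a node currently runs, something like this should work (exact output formats vary by release; this is just a quick sanity check, not an official procedure):

# ZFS userland and kernel module versions currently installed
zfs version
# Debian base and the ZFS packages as PVE sees them
cat /etc/debian_version
pveversion -v | grep -i zfs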
Thank you very much for that great reply - this explains a lot! If you had to guess, when would you roughly expect ZFS 2.3 to arrive in the regular repo and in the test repo?
 
Debian 13 ("Trixie") is now frozen for release preparation and is expected to launch sometime this summer (June/July). At some point after that, Proxmox 9 will be released to the regular repos, based on Debian Trixie and shipping ZFS 2.3.

I can't remember the PVE 7 to 8 transition well enough to know if PVE 9 will hit the test repo before being released, or just be released.

So, if I had to guess, late summer?
 
Thanks a lot! :) Will I be able to do selective pinning of just the ZFS packages, or do I need to migrate to the whole test branch? How stable have such test branches been in the past with regard to ZFS? I just have a homelab, so my concern is more about data loss than about speed.
 
Think of the PVE8 to PVE9 transition like a major OS upgrade--e.g., from Windows 10 to Windows 11. You won't be able to do a partial upgrade to keep PVE 8 and get ZFS 2.3.

I don't really have any additional info on the stability of ZFS in the testing branches.

I assume you're on the no-subscription branch when you're not using the testing branch? If so, I'd just wait for PVE 9 to hit, make sure you've backed up all your data, and then do the full system update. If you want to be a bit more cautious, wait for PVE 9.0.1 to get the initial bugfix release.
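
If the PVE 8 to 9 upgrade follows the same pattern as 7 to 8 - an assumption on my part until the official upgrade guide is out, and the pve8to9 name is my guess based on the pve7to8 tool from the last cycle - the rough flow would be:

# confirm backups, then run the upgrade checklist tool
pve8to9 --full
# switch the Debian/Proxmox repo entries from bookworm to trixie
# in /etc/apt/sources.list and /etc/apt/sources.list.d/, then:
apt update
apt dist-upgrade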
 
ZFS already has effective ARC-bypass configuration, so I feel the benefit of that O_DIRECT stuff is limited on a practical level. I guess it's useful if one is lazy with dataset creation, i.e., lays out datasets like on a legacy file system, with something like MySQL sharing a dataset with a standard /home.
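
For anyone wondering what I mean by ARC-bypass configuration, the usual per-dataset knobs look like this (the pool/dataset names here are just placeholders):

# cache only metadata in ARC for a database dataset and let the DB cache its own data
zfs set primarycache=metadata tank/mysql
# or skip ARC data caching entirely for a scratch dataset
zfs set primarycache=none tank/scratch
# check what is currently in effect
zfs get primarycache,secondarycache tank/mysql tank/scratch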
 
Think of the PVE8 to PVE9 transition like a major OS upgrade--e.g., from Windows 10 to Windows 11. You won't be able to do a partial upgrade to keep PVE 8 and get ZFS 2.3.

I don't really have any additional info on the stability of ZFS in the testing branches.

I assume you're on the no-subscription branch when you're not using the testing branch? If so, I'd just wait for PVE 9 to hit, make sure you've backed up all your data, and then do the full system update. If you want to be a bit more cautious, wait for PVE 9.0.1 to get the initial bugfix release.
Thank you for your help! :) So, in general, selective pinning is possible, just not across a major OS upgrade as in this case?

Yes, I am on the no-subscription branch - I didn't quite get whether you meant I should update to the test branch or stay on the no-subscription branch... How long would you guess it additionally takes until it goes from testing to no-subscription?
 
ZFS already has effective ARC-bypass configuration, so I feel the benefit of that O_DIRECT stuff is limited on a practical level. I guess it's useful if one is lazy with dataset creation, i.e., lays out datasets like on a legacy file system, with something like MySQL sharing a dataset with a standard /home.
Thanks for the input, but my concern is actually about something other than the feature you mentioned.

At the moment, I'm running TrueNAS virtualized under Proxmox without PCIe passthrough, as my older consumer-grade motherboard doesn't support it. Unfortunately, the VM crashes intermittently without a clear pattern, which has made the setup unreliable.

To complicate matters, I've already upgraded my ZFS pool to version 2.3 because I needed the pool expansion feature to add another HDD - a feature TrueNAS has supported since September 2024. Because of this, I can't migrate the pool back to Proxmox until ZFS 2.3 is officially supported there. Once that's the case, my plan is to set up a lightweight VM just for SMB sharing, since that's really the only functionality I currently rely on from TrueNAS.
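
For context, the expansion I used is the RAIDZ expansion that arrived with 2.3; if I have the syntax right it is just an attach against the RAIDZ vdev (pool, vdev and disk names below are placeholders):

# grow an existing raidz vdev by one disk (needs ZFS 2.3+)
zpool attach tank raidz1-0 /dev/disk/by-id/ata-NEWDISK
# list feature flags; a 2.3-only feature showing as "active" is presumably
# what keeps the pool from importing on an older ZFS
zpool get all tank | grep feature@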
 
Yes, I am on the no-subscription branch - I didn't quite get whether you meant I should update to the test branch or stay on the no-subscription branch... How long would you guess it additionally takes until it goes from testing to no-subscription?

Stay on no-subscription. That's the branch you should be on if you don't have a license.

It's a bit confusing because the no-subscription branch is used to test updates before they go into the Enterprise repo (those of us running the no-subscription repos are the testers), but it's not the Testing repo--the Testing repo is the most bleeding-edge stuff, meant for software developers and other people debugging the newest thing.

tl;dr Stay on no-subscription, you'll get access to PVE 9 probably before the people in the Enterprise repo do.
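
For reference, the branches are just different components of the same apt source. These are the documented entries for PVE 8 on bookworm (pick exactly one); I'd expect the trixie-based entries for PVE 9 to follow the same pattern:

# in /etc/apt/sources.list or a file under /etc/apt/sources.list.d/
deb https://enterprise.proxmox.com/debian/pve bookworm pve-enterprise
deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription
deb http://download.proxmox.com/debian/pve bookworm pvetest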
 
Okay, thanks! :) How long does it usually take to go from testing → no-subscription?
 
Cross-posting: I'm trying to figure out how to enable/disable this, and I'm not seeing a difference across 3 different ways of testing disk IO.

I saw this thread months ago and tested out ZFS Direct IO on Debian 12 with the latest packages, and I saw large differences with fio - sometimes 4x faster, sometimes only 30% as fast, depending on things like block size - so I figured I'd wait for the PVE maintainers' settings. But now I'm setting it and I see no change.
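
For what it's worth, the switch I was toggling on Debian is the per-dataset direct property that came with 2.3 (the dataset name below is a placeholder) - if PVE exposes a different or additional knob, someone please correct me:

# confirm the module/userland are 2.3 and check the current value
zfs version
zfs get direct tank/vmstore
# standard = honor O_DIRECT when an application asks for it (the default)
# always   = treat all I/O on this dataset as direct
# disabled = ignore O_DIRECT requests entirely
zfs set direct=always tank/vmstore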
 
Just to update anyone else: I suspect the settings are baked into PVE and it uses the default mixed mode, because on 2 different servers with the same specs I see a 1.5x speedup with RAID10 ZFS across 8 drives (8 TB Samsung PM1743s) with:
rm /dev/zvol/rpool/data/test.file
fio --filename=/dev/zvol/rpool/data/test.file --name=sync_randrw --rw=randrw --bs=4M --direct=0 --sync=1 --numjobs=1 --ioengine=psync --iodepth=1 --refill_buffers --size=8G --loops=2 --group_reporting

About 6000 MiB/s read and write on PVE9, and 3800 MiB/s read/write on PVE8.

Or maybe some other optimization is creating this illusion; either way, I'm happy disk IO is a bit faster.
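
One caveat about my own command above (happy to be corrected): with --direct=0 fio does buffered I/O and never actually requests O_DIRECT, and the test.file path lives under /dev, so it may not even land on the pool. A variant against a throwaway zvol that does request O_DIRECT would look roughly like this - the fio-test zvol is made up for the example:

# create a scratch zvol, benchmark the block device itself, then clean up
zfs create -V 16G rpool/data/fio-test
fio --filename=/dev/zvol/rpool/data/fio-test --name=direct_randrw --rw=randrw --bs=4M --direct=1 --sync=1 --numjobs=1 --ioengine=psync --iodepth=1 --refill_buffers --size=8G --loops=2 --group_reporting
zfs destroy rpool/data/fio-test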
 
Just to clarify: You saw that speed increase just from moving from PVE8 to PVE9? You didn't have to manually configure DirectIO?

(This wouldn't surprise me; my understanding is that by default it is supposed to kick in automatically when a workload asks for it, much like the default sync setting that adjusts behavior based on the workload. NVMe drives in particular are supposed to really benefit from DirectIO. Nice to see it (apparently?) working as intended.)
 
Yes. Both hosts were nearly idle disk-IO-wise; I shut down all the Windows VMs. Our Linux VMs are just lightly using CPU for some FPGA compiles. I noticed that just having 40 Windows VMs idle reduces the benchmarked rate, so I tried to keep this controlled.

I don't have my notes, but I seem to recall testing ZFS Direct IO with these benchmarks on Debian 12 a few months ago with the latest packages; I could change the setting and see massive differences - sometimes 30% as fast, sometimes 4x faster, depending on the block size. In PVE, it's within 0.2%, just noise.

Either some other change is creating an illusory correlation independent of Direct IO, or PVE is setting Direct IO to mixed usage and ignoring the direct setting on the ZFS pool.
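
If someone wants to check rather than speculate like I am, the property is visible directly; the SOURCE column shows default vs. local/inherited, which would reveal whether anything on the PVE side changed it (rpool is assumed to be the pool name here):

# what is direct set to across the pool, and where does the value come from?
zfs get -r -t filesystem,volume direct rpool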
 
Interesting.

I think it might be worthwhile to make a separate thread to gather data about the performance differences between PVE 8 and 9/ZFS 2.2 and 2.3.

I'm really interested in this, but it's pretty off topic for an "is ZFS 2.3 part of PVE yet?" thread. That will make this discussion harder for people to find if it's not broken out into its own thread.

EDIT:
rm /dev/zvol/rpool/data/test.file
fio --filename=/dev/zvol/rpool/data/test.file --name=sync_randrw --rw=randrw --bs=4M --direct=0 --sync=1 --numjobs=1 --ioengine=psync --iodepth=1 --refill_buffers --size=8G --loops=2 --group_reporting

About 6000 MiB/s read and write on PVE9, and 3800 MiB/s read/write on PVE8.

Or maybe some other optimization is creating this illusion; either way, I'm happy disk IO is a bit faster.

I'd really like to know if that's DirectIO or the kernel or something else. A roughly 58 percent speedup from PVE 8 to PVE 9 just using the default configuration is wild.
 
It could be due to the kernel; hard to say. I don't have any PVE 8 hosts online with those specs.

Of note: our host platform is AMD EPYC 9005 - dual-socket 9575F specifically - which is probably the best released chip for the majority of workloads today, as it runs 4.5-5.0 GHz with 128 cores across 2 sockets, unless you're doing all-out AI compilations or something that triggers high thermals like modern AVX instructions.

PVE 8 used kernel 6.8. I opted into the 6.14 kernel on the 2 hosts I've since upgraded to PVE 9 because I'm trying to root-cause why Windows VMs freeze about twice per year, but only on AMD platforms - and currently it looks like Windows itself is at fault. I was hoping the kernel update would help. I did notice our metrics that gather CPU frequency finally started working for AMD, so I can confirm CPU behavior, and indeed these chips are fast.

So it is entirely plausible the kernel update alone gave us the bump, because kernel 6.14 should be fully aware of this CPU and its NUMA topology and optimized for it.

Someone ping me if my assumptions are off and ZFS on PVE has an entirely different way of enabling this. I'm not here to be right; I'm here to find useful and true information, and being wrong is part of that.
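
One way to isolate the kernel variable on an already-upgraded host, assuming the old 6.8 kernel packages are still installed and that I remember the proxmox-boot-tool syntax correctly (the exact version string is whatever kernel list reports):

# list installed kernels, boot the old one once, re-run the fio test, then unpin
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 6.8.12-1-pve --next-boot
reboot
# ...after testing:
proxmox-boot-tool kernel unpin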
 
Take 2 hosts with the same hardware, install Trixie on both (e.g., with ext4) and check that both hosts perform roughly the same in CPU and I/O.
Then install ZFS 2.2.7/2.2.8 on one and 2.3.3 on the other, and do some ZFS I/O testing to compare.
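
A rough sketch of that comparison on each box, assuming fio is the benchmark and tank/bench is a test dataset created just for this (disk IDs are placeholders):

# after confirming the baseline CPU/ext4 numbers match on both hosts
zfs version   # expect 2.2.x on host A, 2.3.x on host B
zpool create tank mirror /dev/disk/by-id/DISK1 /dev/disk/by-id/DISK2
zfs create tank/bench
# --direct=1 is honored as real O_DIRECT on 2.3 and, as I understand it, accepted but buffered on 2.2
fio --directory=/tank/bench --name=randrw --rw=randrw --bs=128k --size=8G --numjobs=4 --ioengine=psync --direct=1 --group_reporting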
 