zfs or pveperf changes or bug? FSYNC dropped by 50%

Mrt12

good day,

unfortunately, I cannot yet say with 100% confidence when this issue arose, but it certainly appeared within the last week (since ~5th May). There have been a couple of updates since then, so obviously *something* changed with ZFS, with pveperf, or with PVE itself.

What did I observe?

So far I have tested my ZFS pool performance from time to time using

pveperf /tank
pveperf /ssdpool

as I have 2 pools: tank, which has HDDs and a mirrored SLOG, and the ssdpool, which has only 2 mirrored SSDs.

For all of my SSDs I use the HGST HUSMM1640, which is a 400 GB SAS 12 Gb/s SSD.

Until now, I always got around 12000 FSYNCS/sec reliably on both pools, as the SSDs are very fast. This is with no VMs/CTs running, no CPU load, no scrub or trim running and no SMART check running.

Today I checked pveperf /tank and pveperf /ssdpool again, and I now get no more than ~6600 FSYNCS/s. The thing is, my hardware absolutely didn't change, but I did check for and install all available updates daily, and rebooted. I also run the following once per week:

fstrim -a

and

zpool trim tank
zpool trim ssdpool

but so far, this has never affected the performance. Until today, I reliably got my 12K FSYNCs, and I have not even switched off my server in the meantime. Now I can only get at most ~6K FSYNCs.
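As a side note, whether those trims actually ran to completion can be checked via the per-vdev TRIM status (just standard zpool usage, nothing special):

zpool status -t tank
zpool status -t ssdpool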

The only thing I can say is that one of the updates since 5th May seems to have changed something so that FSYNCs got halved. I wonder what the reason could be and how to fix it. (I know I will probably not notice the difference in practice, but nevertheless I think something is going on when my FSYNCs suddenly drop by 50%.)

I have zfs-2.2.3-pve2.
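One way to narrow it down might be the apt history, to see which ZFS/kernel/PVE packages changed around 5th May (just a sketch; older logs are rotated and compressed, which zgrep handles):

zgrep -hE "Start-Date|zfs|pve-kernel|proxmox-kernel" /var/log/apt/history.log*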
 
You might be on to something. I remember my (new) drive doing 17500 fsync/s, but currently it's between 8000 and 10000. I did notice that the CPU speed (PBO on or off) does influence the number. And I now use amd_pstate+schedutil, whereas in the past I used ACPI+performance. Maybe the test is rather sensitive to the CPU (once the number has more than 3 digits)?
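If CPU frequency scaling plays a role, the active driver and governor can be checked via the standard cpufreq sysfs files, and the governor can be pinned to performance for a comparison run (generic sysfs paths, nothing Proxmox-specific):

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# temporarily force the performance governor on all cores (as root)
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor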
 
Hmm, I notice that I still have

CPU BOGOMIPS 108000

like before, so that didn't change. I have a 12-core / 24-thread Xeon 4310. What CPU do you have, and what BOGOMIPS?

At least I am happy to read that I am not the only one observing such a thing. However, I really cannot say for sure when this behaviour changed :-(
 
Ryzen 5950X with 217594 (and the drive is a Micron 7450 PRO M.2 960). I don't remember previous values for this bogus meaningless indicator of processor speed.
Maybe it's Spectre mitigations, maybe it's a side-effect of some fixes in OpenZFS, maybe the test is not perfect. I don't notice it at all in normal/home/personal use.
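For what it's worth, the active mitigations can be listed from sysfs, and booting once with mitigations=off would show whether they account for the difference (only as a test, since it disables the protections):

grep . /sys/devices/system/cpu/vulnerabilities/*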

EDIT: Looking at /usr/bin/pveperf, BOGOMIPS is just the sum of the Linux kernel bogomips from /proc/cpuinfo.
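That sum is easy to reproduce by hand; rounding aside, this should match what pveperf prints:

awk -F: '/bogomips/ { sum += $2 } END { print sum }' /proc/cpuinfo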
 
Yeah, I also don't care too much about the BOGOMIPS. And of course I don't notice the difference between 12K FSYNCs and 6K FSYNCs in practice. I am just wondering why this is happening; at the very least it is an indicator that *something* changed, and I want to know what. Is it the calculation of the FSYNCs, which would mean it is not actually slower at all? Or did the ZFS module get some changes that make it slower? Or what?
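One way to tell those two apart would be to cross-check with a different fsync benchmark, e.g. fio (needs apt install fio; the parameters below are just an example: 4K sync writes with an fsync after every write):

# 4K sequential writes, fsync after each write, 30 s run
fio --name=fsynctest --directory=/tank --ioengine=psync --rw=write --bs=4k --size=256M --fsync=1 --runtime=30 --time_based

If fio's sync-write IOPS dropped by a similar factor, the slowdown is real and not just pveperf's arithmetic.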

My pveperf dropped about 20% as well. My ZFS mirror of SATA enterprise SSDs used to manage around 6000-6600 fsyncs and is now just slightly above 5K.

Good to read this from somebody else too, so I feel a bit more confirmed that I am not seeing ghosts :)
 
This can have multiple causes.
For me, for example, it may be that my host is under higher IO load than when I last tested (there is one more Windows VM on it, after all).
I can only test under load, as I only have one host, and taking the family offline for such a test would not be met with cheers ;)
So take my results with a pinch of salt.
My old results were under load as well.
Also, my results differ only by about 20%, unlike yours, which differ by 50%.

Also, filesystem usage may affect speeds, so maybe test on an empty dataset?
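For example (the dataset name is just a placeholder):

zfs create tank/pveperf-test
pveperf /tank/pveperf-test
zfs destroy tank/pveperf-test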

Edit: I just tested on an empty dataset. Still 5200 fsyncs, so I guess that didn't make any difference :)
 
My pool currently has ~2.5 TB used out of 12 TB total, so I would say it is almost empty, no?
Also, previously, when I had 12K FSYNCs, the pool was at the same usage.
Another indicator that this is not related to pool usage is that on my SSD-only pool I also had 12K FSYNCs, with that pool being ~50% full.
The SSDs I use for the SLOG are of the same kind as the ones in the SSD-only pool.

Still, the SSD pool is very fast; a Windows 10 VM boots in under 3 seconds.
 
