NVMe 4k IOPS performance - same disk with different hosts = different results.

dominiaz

Renowned Member
Sep 16, 2016
Host1: Epyc 7702P with 512GB RAM, Micron 7400 Pro 1,92 TB
Host2: Core Ultra 7 265k with 128GB RAM, Micron 7400 Pro 1,92 TB

The Micron 7400 Pro 1,92TB is mounted as a directory (DIR) storage with an XFS filesystem on the host. I ran fstrim -av before making the tests.

VM config:
Code:
virtio0: local-u2-micron-1:2105/vm-2105-disk-0.raw,aio=native,cache=directsync,discard=on,iothread=1,size=32G
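For reference, an alternative disk line I may also test (my own variation, not part of the setup above), in case the aio/cache combination is part of the difference - both aio=io_uring and cache=none are standard Proxmox disk options:
Code:
virtio0: local-u2-micron-1:2105/vm-2105-disk-0.raw,aio=io_uring,cache=none,discard=on,iothread=1,size=32G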

Host1 (Epyc 7702P):
Code:
fio --ioengine=libaio --direct=1 --rw=randread --bs=4k --numjobs=32 --iodepth=32 --runtime=60 --time_based --name=rand_read --filename=/mnt/pve/local-u2-micron-1/fio.4k --size=1G
735k IOPS

VM Guest on Host1:
Code:
fio --ioengine=libaio --direct=1 --rw=randread --bs=4k --numjobs=32 --iodepth=32 --runtime=60 --time_based --name=rand_read --filename=/tmp/fio.4k --size=1G
192k IOPS

Host2 (Core Ultra 7 265k):
Code:
fio --ioengine=libaio --direct=1 --rw=randread --bs=4k --numjobs=32 --iodepth=32 --runtime=60 --time_based --name=rand_read --filename=/mnt/pve/local-u2-micron-1/fio.4k --size=1G
745k IOPS

VM Guest on Host2:
Code:
fio --ioengine=libaio --direct=1 --rw=randread --bs=4k --numjobs=32 --iodepth=32 --runtime=60 --time_based --name=rand_read --filename=/tmp/fio.4k --size=1G
495k IOPS

VM guest on Epyc 7702P: 192k IOPS
VM guest on Core Ultra 7 265k: 495k IOPS


Why does the same NVMe Micron 7400 Pro 1,92 give different results inside the VM on different hosts?
 
VM guest on Epyc 7702P: 192k IOPS
VM guest on Core Ultra 7 265k: 495k IOPS


Why does the same NVMe Micron 7400 Pro 1,92 give different results inside the VM on different hosts?

This comes down to how the hypervisor and the guest negotiate paravirtualization features (KVM/Hyper-V enlightenments).

Unfortunately, QEMU/KVM, and even Proxmox as a UI (though it is getting better; there are simply too many configurations to stay on top of, so it's an understandable dilemma), won't configure everything optimally for every system. The optimum depends on the CPU and the guest, on the devices themselves, on the countless combinations of firmware across devices and boards, right down to CPU microcode revisions. Then there are the guest distros, which are all over the place and often need kernel command-line overrides that vary from distro to distro, host to host, CPU to CPU, QEMU version to kernel version... Honestly, it is why Microsoft's Hyper-V/Azure and the other big cloud providers are so attractive these days: the optimizations have already been done and tuned per workload, and you just pay a fee, sit back and relax.
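As a quick sanity check from inside each guest (just a sketch; which enlightenments actually matter depends on the CPU model you expose to the VM), you can at least confirm the guest detected the KVM paravirtualized clock and features:
Code:
# the cpuinfo flags should contain "hypervisor" inside a VM
grep -m1 -o hypervisor /proc/cpuinfo
# kvm-clock is normally the active clocksource in a KVM guest
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
# paravirt features the guest kernel detected at boot
dmesg | grep -i 'kvm\|paravirt'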

Now, you did not share whether you are doing NVMe passthrough, or whether you are using virtio-fs, so I won't go down a path that is not relevant.
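For what it's worth, if you wanted to take the host's storage stack out of the equation entirely, passing the NVMe through to the guest as a PCI device is one way to test that. The PCI address below is only a placeholder, and the disk obviously can no longer serve as directory storage on the host once it is passed through:
Code:
# hypothetical example - replace 0000:01:00.0 with the drive's real PCI address (see lspci -nn)
qm set 2105 -hostpci0 0000:01:00.0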

However, I can assure you that 9 times out of 10 it's not the technology, it's the user and the lack of automation: you have to spend hours to days scouring GitHub and god knows what dark corners of the internet's blogs just to find some little 'flag' that makes all the difference in your particular configuration.

Bottom line, the answer to your "why" is: trial, error, and researching it yourself. When you use QEMU/KVM/Linux in general, you take on the responsibility of spending countless hours on something you will probably throw away and never need again, and that time is gone for good.

There is no RTFM moment here, because there is no manual that fits every configuration. It is a mess on a massive scale; it is what makes Linux as powerful as it is, and the biggest headache it can be, too... best of luck ;)
 
VM guest on Epyc 7702P: 192k IOPS
VM guest on Core Ultra 7 265k: 495k IOPS


Why does the same NVMe Micron 7400 Pro 1,92 give different results inside the VM on different hosts?
The 265K is Intel's latest desktop CPU generation (late 2024), with single-core raw compute roughly 2.5 times that of the EPYC 7702 (late 2019).
The EPYC is slower with only 1 VM, but try the same test with 32 concurrent VMs.
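One way to check how much single-core speed dominates the single-VM numbers (my suggestion, same file and options as your test): run fio inside each guest with a single job at queue depth 1 and compare. If the ratio between the two hosts roughly tracks their single-core performance gap, the vCPU/iothread is the bottleneck rather than the drive:
Code:
fio --ioengine=libaio --direct=1 --rw=randread --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=lat_read --filename=/tmp/fio.4k --size=1G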
 
The 265K is Intel's latest desktop CPU generation (late 2024), with single-core raw compute roughly 2.5 times that of the EPYC 7702 (late 2019).
The EPYC is slower with only 1 VM, but try the same test with 32 concurrent VMs.

Yes, obviously correct, however...

You didn't read his post correctly. There is no issue bare-metal; IOPS are fine on both CPUs... the issue he is noticing is the IOPS performance inside the VMs.

Many of us have spent crazy hours, days, even weeks tweaking things across numerous builds. The causes ended up ranging from crazy-high VM exits, to making sure the correct virtual APIC paravirtualisation was active, to latencies introduced because things weren't running on the primary cores, to countless threading issues, to stupid bugs in specific kernel versions with missing patches, etc.
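Two of the quickest checks from that list, as a sketch (the VMID and the core range are only examples; on the EPYC you would want to pick cores in the NUMA/CCD domain closest to the NVMe):
Code:
# sample KVM exit reasons on the host for ~10 s while fio runs in the guest
perf kvm stat record -a sleep 10
perf kvm stat report
# pin the guest's vCPUs to specific host cores (Proxmox VE 7.3+)
qm set 2105 --affinity 0-15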
 
AFAIK, a vDisk is currently limited to one QEMU I/O thread, which is limited to one CPU thread.
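If that single-iothread ceiling is the limiter, one workaround to test (just a sketch; the second volume name is assumed) is to spread the benchmark across several virtio disks, since with iothread=1 each disk gets its own QEMU I/O thread:
Code:
virtio0: local-u2-micron-1:2105/vm-2105-disk-0.raw,aio=native,cache=directsync,discard=on,iothread=1,size=32G
virtio1: local-u2-micron-1:2105/vm-2105-disk-1.raw,aio=native,cache=directsync,discard=on,iothread=1,size=32G
Then run one fio job set per disk inside the guest and add up the IOPS.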
 