[Feature Request] Proxmox 9.0 - iothread-vq-mapping

dominiaz

Renowned Member
Sep 16, 2016
https://blogs.oracle.com/linux/post/virtioblk-using-iothread-vq-mapping

@bund69 proxmox tests:

Code:
args: -object iothread,id=iothread0 -object iothread,id=iothread1 -object iothread,id=iothread2 -object iothread,id=iothread3 -object iothread,id=iothread4 -object iothread,id=iothread5 -object iothread,id=iothread6 -object iothread,id=iothread7 -object iothread,id=iothread8 -object iothread,id=iothread9 -object iothread,id=iothread10 -object iothread,id=iothread11 -object iothread,id=iothread12 -object iothread,id=iothread13 -object iothread,id=iothread14 -object iothread,id=iothread15
-drive file=/mnt/pmem0fs/images/103/vm-103-disk-1.raw,if=none,id=drive-virtio1,aio=io_uring,format=raw,cache=none
--device '{"driver":"virtio-blk-pci","iothread-vq-mapping":[{"iothread":"iothread0"},{"iothread":"iothread1"},{"iothread":"iothread2"},{"iothread":"iothread3"},{"iothread":"iothread4"},{"iothread":"iothread5"},{"iothread":"iothread6"},{"iothread":"iothread7"},{"iothread":"iothread8"},{"iothread":"iothread9"},{"iothread":"iothread10"},{"iothread":"iothread11"},{"iothread":"iothread12"},{"iothread":"iothread13"},{"iothread":"iothread14"},{"iothread":"iothread15"}],"drive":"drive-virtio1","queue-size":1024,"config-wce":false}'

Code:
~150K IOPS, 600 MB/s with 1 iothread, Fedora VM
~1000K IOPS, 4200 MB/s with 16 iothreads and iothread-vq-mapping, Fedora VM
~3300K IOPS, 12,800 MB/s on the Proxmox host

fio: bs=4k, iodepth=128, numjobs=30, 40 vCPUs
DL380 Gen10 with 2x 6230 and 2x 256GB Optane DIMMs
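
For reference, a fio invocation along these lines should roughly reproduce the workload above (a sketch: only bs, iodepth and numjobs are taken from the post; the target device, read/write mix and runtime are assumptions):

Code:
# Random-read benchmark with the quoted parameters; /dev/vdb is a placeholder.
fio --name=randread --filename=/dev/vdb --direct=1 --ioengine=libaio \
    --rw=randread --bs=4k --iodepth=128 --numjobs=30 \
    --runtime=60 --time_based --group_reporting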

My own iothread-vq-mapping tests with Proxmox show very similar results.

A 5x IOPS improvement on NVMe! Please add iothread-vq-mapping to the new Proxmox 9.0.
 
I would raise a point: has there been any investigation into the impact such a configuration would have on other guests running on the same host? One of the biggest problems in virtualisation is managing threads to mitigate latency issues; the more IO threads, the more interrupt requests, and the cost of such a configuration could be fairly substantial on hosts running multiple VMs.

Indeed, with only a couple of VMs this should be just fine; for a typical home setting where someone has virtualised their gaming rig with pinned cores and reserved hugepages, happy days... but for those running substantial numbers of guests (arguably the 'paying' customer base of Proxmox), the overall latency impact per VM may not be worth the IOPS gain in a single VM.
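
For illustration, a rough sketch of how a VM's iothreads could be confined to dedicated host cores so they don't compete with other guests (the VMID, core range and thread-name matching are assumptions, not a tested recipe):

Code:
# Pin the iothread worker threads of VM 103 to host cores 32-47 (hypothetical values).
# Assumes the QEMU worker threads carry their iothread object id in the thread name.
VMID=103
PID=$(cat /var/run/qemu-server/${VMID}.pid)
for task in /proc/${PID}/task/*; do
    if grep -q iothread "${task}/comm"; then
        taskset -pc 32-47 "$(basename "$task")"
    fi
done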

Definitely needs investigating...

There are indeed quite a few optimizations that can enhance guest performance, but many such tweaks come at the cost of impacting the 'fair-usage policy' of a large VM host. Unfortunately, such optimizations are rarely entertained in enterprise workloads, so official benchmarks/whitepapers are virtually non-existent.

I've noted across countless articles/blog posts/example configs, from AMD to Oracle, that the point of the exercise is simply to achieve peak performance for the given task, with no reference to the impact on other guests and services running on the same host. As such, many of these tweaks are reserved for strictly isolated cases where the performance numbers are critical, mostly demonstration cases. Let us not forget: "if you desire absolute performance, you purchase another server; bare metal isn't going anywhere, and sharing is caring in the world of enterprise virtualisation".


Putting the above concerns aside, bringing this feature to Proxmox would be greatly appreciated by those seeking such IOPS acceleration; the more options, the merrier.
 
That's the wrong point of view. I have fast storage with 8 million IOPS, so the usual 150-200K IOPS per VM (with the current single iothread) is very, very bad. If you want to sell 200K IOPS or 1 million IOPS, then sell it at a fair price. You could always allow setting 1-64 iothreads per VM, but we need to have the choice.