[SOLVED] Expose L3 cache to VM guest

May 2, 2018
Hi, I'm approaching this from a Windows gaming VM point of view, and while everything is working well, I'm attempting some further tuning to bring latency down as much as possible.

I'm a little more familiar with libvirt in this regard, so how would I implement the equivalent of the libvirt CPU configuration below in Proxmox's VM arguments, to allow passthrough of the real CPU cache?

Code:
<cpu mode="host-passthrough" check="none">
  <topology sockets="1" cores="8" threads="2"/>
  <cache mode="passthrough"/>
  <feature policy="require" name="topoext"/>
</cpu>

From what I've read, passing the L3 cache through results in measurable latency improvements. I figured out that I could use 'l3-cache=on', but I'm unsure whether this exposes an emulated cache or the real one.

My current VM config

Code:
##These comments are for the hookscript
##CPU pinning
#cpu_taskset 4,16,5,17,6,18,7,19,8,20,9,21,10,22,11,23
##Assigning vfio interrupts to VM cores
#assign_interrupts 4,16,5,17,6,18,7,19,8,20,9,21,10,22,11,23 vfio
##Set halt_poll_ns
#set_halt_poll 0

agent: 1
args: -smp '16,sockets=1,cores=8,threads=2,maxcpus=16' -cpu 'host,topoext=on,l3-cache=on,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vpindex,+kvm_pv_eoi,+kvm_pv_unhalt,+pdpe1gb'
balloon: 0
bios: ovmf
boot: order=hostpci0
cores: 16
cpu: host,hidden=1
efidisk0: local-lvm:vm-101-disk-1,size=4M
hookscript: local:snippets/exec-cmds
hostpci0: 04:00.0,pcie=1
hostpci1: 0c:00.0;0c:00.1;0c:00.2;0c:00.3,pcie=1,x-vga=1,romfile=gigabyte-rtx2080-super-gaming-oc-modded.rom
hostpci2: 09:00.1;09:00.3,pcie=1
hostpci3: 08:00.0,pcie=1
hostpci4: 06:00.0,pcie=1
hugepages: 1024
machine: q35
memory: 16384
name: gaming
numa: 1
onboot: 1
ostype: win10
parent: working
scsihw: virtio-scsi-single
smbios1: uuid=b2f76d09-acba-4389-abdf-a5f40c39d445
sockets: 1
startup: order=3
tablet: 0
vga: none
vmgenid: b395acc4-fgf3-4f67-b24a-a51bcf6209c3


Host specs:
  • Ryzen 5900X (8c/16t pinned to guest — 4c/8t left for the host and containers)
  • Gigabyte X570 Aorus Master
  • GeForce 2080 Super (passed to guest)
  • 64GB Samsung DDR4 UDIMM ECC RAM (16GB consumed by guest)
  • 1TB Samsung 970 EVO Plus (passed to guest)
  • 480GB Samsung OEM SSD (host)
 

Stefan_R

Proxmox Staff Member
The QEMU parameter to pass through the exact information would be 'host-cache-info=on'. This passes the cache information from the host. Keep in mind, however, that this only makes sense when your guest topology (as mapped to vCPUs) matches your host's - i.e. on Ryzen, that you always pin full CCXs, with the cores of one CCX kept adjacent (i.e. don't interleave vCPUs between CCXs).
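To see which host threads actually share an L3 slice (i.e. belong to the same CCX), you can read the kernel's sysfs cache topology before picking a pinning set (index3 being the L3 is what I'd expect on Ryzen, but verify with the 'level' file on your system):

Code:
# Which threads share cpu4's L3 (its CCX, including SMT siblings)
cat /sys/devices/system/cpu/cpu4/cache/index3/shared_cpu_list
# Confirm index3 really is the L3
cat /sys/devices/system/cpu/cpu4/cache/index3/level
# Or dump the core/cache topology for every CPU at once
lscpu --extended=CPU,CORE,SOCKET,CACHE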

For any other layout/topology, 'l3-cache=on' is probably what you want - this doesn't "emulate" the L3 cache (that would be *super* slow), but, as far as I understand it, rewrites the L3 cache info given to the guest to just say "everything is shared". Not optimal, but probably still faster than leaving it at default.
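As a sketch against the config you posted (not a drop-in line - keep your existing topoext/hv_* flags, only the cache flag changes between the two):

Code:
## CCX-aligned pinning: expose the host's real cache layout
args: -cpu 'host,topoext=on,host-cache-info=on,...'
## Any other pinning: present one big shared L3 instead
args: -cpu 'host,topoext=on,l3-cache=on,...'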

On a Windows guest, I like to use the Coreinfo tool to verify that the layout the guest sees matches what I have on the host.

As always with this stuff, the actual performance depends heavily on your setup - try both options (as well as none at all) and benchmark, test, see what feels right ;)
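If I remember the Sysinternals switches correctly (check 'coreinfo /?'), the Coreinfo output can be narrowed down to just cores and caches:

Code:
coreinfo -c -l

Each cache line shows a map of which logical processors share it, which makes a mismatched L3 grouping easy to spot.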
 
Thanks Stefan, that's really helpful.

So given that the 5900X's CCX layout for its 12 cores is 6+6, I'd be better off using host-cache-info=on if I were using a 6c/12t topology in the VM.

If using 8c/16t topology in the VM, then use l3-cache=on - is that about right?

Based on the architectural info about Ryzen 5000, each CCX gets its own shared L3 cache pool, so theoretically it's probably better not to cross between CCXs, but like you said, I'll compare with some benchmarks to see how it goes in reality :)
 

Stefan_R

Proxmox Staff Member
That would match my understanding of it, yes. Although you might get away with host-cache-info as well, if you make sure the vCPUs on the full CCX come before the ones on the shared one - that way, while the guest may see too much L3 for the latter, it will at least correctly see the separation, i.e. it will correctly know which cores share an L3, which is what's mostly important (for scheduling).
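Applied to the pinning list from the first post, and assuming Linux enumerates the 5900X's CCXs as cores 0-5 and 6-11 (with SMT siblings at +12), that would mean putting the full CCX (cores 6-11) ahead of the two host-shared cores (4-5):

Code:
##CPU pinning, full CCX first, then the partial one
#cpu_taskset 6,18,7,19,8,20,9,21,10,22,11,23,4,16,5,17
##Keep the interrupt assignment list in the same order
#assign_interrupts 6,18,7,19,8,20,9,21,10,22,11,23,4,16,5,17 vfio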
 
