PVE 8.3.4 EPYC 7713 (Milan) SMT Passthrough Fail (cpu: host -> 1 thread/core) & Guest numactl Fail

vladdc

New Member
Mar 29, 2025
Hello Proxmox Community,

I'm encountering issues getting full performance from my dual EPYC 7713 VM on PVE 8.3.4. Specifically, SMT passthrough seems to be failing, and numactl binding within the guest also fails.

Host Hardware:

  • CPU: 2 x AMD EPYC 7713 (64-Core Processor)
  • RAM: 1 TB DDR4
  • Server Model: Gooxi SR201-D12R-NV (Motherboard reported as Gooxi G1DLRO-B, but this seems incorrect as it's single-socket; the system is definitely dual-socket).
  • Host SMT: Confirmed ON (Host OS sees 256 threads, 2 threads/core - see lscpu below)
  • Host Microcode: 0x0a001119 (Possibly outdated, SRSO warning in dmesg, waiting for manufacturer BIOS update)
Host Software:

  • Proxmox VE Version: (Output of pveversion)
    pve-manager/8.3.4/65224a0f9cd294a3 (running kernel: 6.8.12-8-pve)
  • Host lscpu confirms 256 threads / SMT ON: (Key lines)
    Architecture:        x86_64
    CPU(s):              256
    Thread(s) per core:  2
    Core(s) per socket:  64
    Socket(s):           2
    NUMA node(s):        2
    NUMA node0 CPU(s):   0-63,128-191
    NUMA node1 CPU(s):   64-127,192-255
    Model name:          AMD EPYC 7713 64-Core Processor
    Virtualization:      AMD-V
  • Host numactl --hardware confirms 2 nodes, 256 threads correctly mapped.
Guest VM (ID 101) Configuration: (/etc/pve/qemu-server/101.conf)

Code snippet

bios: ovmf
boot: order=scsi0;ide2;net0
cores: 64
cpu: host
# efidisk0: ...
# ide2: ...
machine: q35
memory: 946045
# meta: ...
name: Aibase
net0: e1000=BC:24:11:D9:5B:0F,bridge=vmbr0,tag=20 # Using e1000 temporarily, aware virtio-net is better
numa: 1
ostype: l26 # Linux 6.x Kernel
scsi0: Datapool:vm-101-disk-1,backup=0,iothread=1,size=6900G
scsihw: virtio-scsi-single
# smbios1: ...
sockets: 2
# vmgenid: ...
Guest VM OS:

  • OS: Ubuntu 24.04.2 LTS (Fresh install)
  • Kernel: 6.8.0-55-generic (uname -r)
  • Python/apt/netplan issues previously encountered were fixed.
Problem 1: SMT Passthrough Failure

Despite cpu: host and host SMT being ON, the guest VM only sees 1 thread per core:

  • Guest lscpu:
    Thread(s) per core:  1
    CPU(s):              128
    Socket(s):           2
    Core(s) per socket:  64
    NUMA node(s):        2
    NUMA node0 CPU(s):   0-63
    NUMA node1 CPU(s):   64-127
  • Guest numactl --hardware: Shows 2 nodes, CPUs 0-63 on node 0, 64-127 on node 1.
Problem 2: numactl Binding Failure in Guest

Attempts to bind processes to a NUMA node using numactl inside the guest fail. The binding options are ignored:

  • Command run in guest: sudo numactl --cpunodebind=0 --membind=0 sleep 600 &
  • Result from numactl -s -p <pid>:
    policy: default
    cpubind: 0 1   # <-- should be just 0
    nodebind: 0 1  # <-- should be just 0
    membind: 0 1   # <-- should be just 0
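
As a cross-check, the effective binding can also be read straight from the kernel. A minimal sketch against the same sleep test process (only standard tools: /proc plus numastat, which ships with the numactl package):

    # Start a test process bound to NUMA node 0 (no root needed for self-binding)
    numactl --cpunodebind=0 --membind=0 sleep 600 &
    PID=$!

    # Kernel's view of the CPUs and memory nodes this PID is allowed to use
    grep -E 'Cpus_allowed_list|Mems_allowed_list' /proc/$PID/status

    # Per-node memory allocation of the process
    numastat -p $PID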
Troubleshooting Steps Taken:

  • Confirmed SMT ON in Host OS (lscpu). BIOS setting assumed ON but pending double-check/update.
  • Using standard recommended VM settings (q35, cpu: host, numa: 1, sockets: 2, cores: 64).
  • Performed full VM Shutdown/Start after config changes.
  • Tried explicit cpu: EPYC-Milan -> Same result (128 threads, SMT OFF).
  • Tried forcing topology via args: -smp 256,... -> Failed VM start due to config conflicts. Reverted.
  • Updated Proxmox Host (apt update && apt dist-upgrade on PVE 8.3.4) and rebooted -> No change in guest SMT status.
  • Checked guest kernel parameters (/proc/cmdline) -> No nosmt.
  • Fixed unrelated Python/apt/netplan issues within the guest.
Question:

Is this a known issue or bug with Proxmox VE 8.3.4 (Kernel 6.8.x) and AMD EPYC 7003 (Milan) CPUs regarding SMT passthrough when using cpu: host? Why might SMT fail to pass through, and why might numactl binding fail within the guest despite the guest seeing the NUMA structure (for the limited threads it detects)?

Are there any specific host kernel parameters, KVM options (args:?), or other settings known to resolve this? I am currently waiting for a potential BIOS/Firmware update from the manufacturer (Gooxi) for the potentially outdated microcode (0x0a001119).

Any insights or suggestions would be greatly appreciated!
 
Problem 1: SMT Passthrough Failure

Despite cpu: host and host SMT being ON, the guest VM only sees 1 thread per core:

That's normal: a virtual CPU core maps to one host thread, and there is no SMT / Hyperthreading passthrough. Just add additional virtual cores to your VM instead.

For your second problem, read this:
https://pve.proxmox.com/wiki/NUMA

I don't know much about NUMA, but you can't define this in the guest VM; PVE decides what hardware the guest gets and how it is allowed to use it. You should look for a setting in the VM hardware or VM options.
 
Hi MarkusKo,

Thanks for your input on my issues. I wanted to clarify a couple of points based on my understanding and further testing, hoping to get more insight from the community:

1. Regarding SMT Passthrough with cpu: host:

You mentioned: "thats normal, a virtual cpu core maps to one thread, there is no SMT / Hyperthreading passthrough, just add an additional virtual core to your vm."
My understanding, and what seems common practice, is that cpu: host is specifically intended to pass through the host CPU's features as closely as possible, including SMT. When SMT is enabled on the host (which I've confirmed via lscpu on my Proxmox host showing 2 threads/core), the guest VM should typically also see 2 threads per core with cpu: host. For example, other users with similar EPYC Milan CPUs seem to have SMT working in guests with cpu: host.

My specific problem is that, despite using cpu: host, machine: q35, sockets: 2, cores: 64, and numa: 1 on PVE 8.3.4 with an EPYC 7713 host (where SMT is ON), my Ubuntu 24.04 guest consistently only sees 1 thread per core (128 threads total).

Regarding adding cores (e.g., setting cores: 128): I worry this would misrepresent the physical topology and break the guest's NUMA awareness, potentially harming performance more than it helps. My goal is to understand why the expected SMT passthrough with cpu: host is failing in my specific environment.

2. Regarding numactl in the Guest:

You mentioned: "you can't define this in the guest vm, pve decides what hardware and how the guest is allowed to use it..."
While Proxmox certainly defines the virtual topology presented (numa: 1), my understanding is that numactl is the standard tool used within a Linux guest to control how the guest's OS scheduler assigns processes/memory relative to the virtual NUMA nodes it sees. The Proxmox wiki on NUMA also mentions guest awareness.

My issue is that numactl --cpunodebind=X --membind=X commands fail to apply any binding inside my guest (tested with sleep; numactl -s -p <pid> still shows binding to all nodes 0 1), even though numactl --hardware inside the guest does correctly report the 2 virtual NUMA nodes (for the 128 threads it sees). This suggests the virtual NUMA topology presented might not be fully functional for guest control, likely linked to the same root cause as the SMT failure.

Summary of Problem & Setup:

  • Host: 2x EPYC 7713 (128c/256t total), SMT ON, PVE 8.3.4 (Kernel 6.8.12-8-pve), Microcode 0x0a001119 (BIOS update pending from manufacturer).
  • VM: Ubuntu 24.04 (Kernel 6.8.0-55), cpu: host, machine: q35, sockets: 2, cores: 64, numa: 1.
  • Issue 1: Guest only sees 128 threads (SMT OFF).
  • Issue 2: numactl binding commands have no effect inside the guest.
  • Tried: cpu: EPYC-Milan, host updates, checked guest cmdline, fixed unrelated guest OS issues.
Question for the Community:

Has anyone else experienced SMT passthrough failing with cpu: host on EPYC 7003 (Milan) CPUs specifically with PVE 8.3.x? Is the failure of numactl binding inside the guest a known related symptom? Are there specific host kernel parameters, QEMU args (args: line in .conf), or other settings known to address this beyond the standard configuration?
 
When SMT is enabled on the host (which I've confirmed via lscpu on my Proxmox host showing 2 threads/core), the guest VM should typically also see 2 threads per core with cpu: host. For example, other users with similar EPYC Milan CPUs seem to have SMT working in guests with cpu: host.

My specific problem is that, despite using cpu: host, machine: q35, sockets: 2, cores: 64, and numa: 1 on PVE 8.3.4 with an EPYC 7713 host (where SMT is ON), my Ubuntu 24.04 guest consistently only sees 1 thread per core (128 threads total).
That is indeed normal and the way Proxmox works (and uses QEMU/KVM). Use 2 sockets with 128 cores if you want 256 virtual cores (with 1 thread per core in the VM). Proxmox does not give you a way to specify two (or more) virtual threads per virtual core.
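
A minimal sketch of that suggestion, assuming the VM ID and memory from the config posted above (the guest will then report 256 CPUs with 1 thread per core):

    # /etc/pve/qemu-server/101.conf (relevant lines only)
    cpu: host
    machine: q35
    numa: 1
    sockets: 2
    cores: 128      # 2 x 128 = 256 vCPUs, each presented as its own core
    memory: 946045

    # or, equivalently, from the host shell:
    qm set 101 --sockets 2 --cores 128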

My understanding, and what seems common practice, is that cpu: host is specifically intended to pass through the host CPU's features as closely as possible, including SMT.
This is not my (long-term) understanding or common practice on this forum or with Proxmox. Using host as the CPU type gives you a host-like CPU type, not "the host's core/thread configuration". Feel free to object to this, but it's how Proxmox works at the moment (and in the past). You could try to configure what you want with QEMU/KVM command-line parameters in the VM's args: configuration line.

EDIT: See -smp in https://www.qemu.org/docs/master/system/invocation.html for more information.
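
Purely as an illustration (untested here; Proxmox generates its own -smp line from sockets/cores, so an extra one on args: may conflict exactly as the earlier attempt in this thread did), a 2-threads-per-core topology in the raw QEMU syntax from that page would look like:

    args: -smp 256,sockets=2,cores=64,threads=2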

Regarding adding cores (e.g., setting cores: 128): I worry this would misrepresent the physical topology and break the guest's NUMA awareness, potentially harming performance more than it helps.
Half of the memory comes from one NUMA domain and the other half comes from the other (with two virtual and physical sockets). You can set specific affinity for each virtual core (with 1 thread per core) if you want. See affinity in the manual https://pve.proxmox.com/pve-docs/pve-admin-guide.html#qm_cpu_resource_limits . Proxmox should let the VM know what memory is "local" to which (virtual) core and what memory is "remote", even though the actual NUMA distance may be different from the real CPU.
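
For illustration, a hedged sketch of the affinity setting referenced in that manual section; the numbers are host CPU IDs, so with the lscpu output from the first post (node 0 = 0-63,128-191) this restricts all of the VM's vCPU threads to host NUMA node 0:

    # /etc/pve/qemu-server/101.conf
    affinity: 0-63,128-191

    # or from the host shell:
    qm set 101 --affinity 0-63,128-191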

My goal is to understand why the expected SMT passthrough with cpu: host is failing in my specific environment.
Because the host CPU type on Proxmox does not do that. They implemented it differently than you expect, sorry.

Has anyone else experienced SMT passthrough failing with cpu: host on EPYC 7003 (Milan) CPUs specifically with PVE 8.3.x?
There is no SMT passthrough in Proxmox and there never has been.

EDIT: VMs abstract away the host hardware; if you want your guests to know the real CPU and SMT topology, you might be better off using containers (which share the Linux kernel and hardware).
 
So would we gain or lose performance if we assigned SMT threads to a VM?

Alternative 1:
- Core 0, thread 0
- Core 1, thread 0
The VM sees two cores, and both map to actual physical cores. Performance is what you would expect from two cores.

Alternative 2:
- Core 0, thread 0
- Core 0, thread 1
- Core 1, thread 0
- Core 1, thread 1
The VM sees four cores, but some of them map to threads on the same physical core. The VM is not aware of this, so running two threads might put those on the same physical core, giving you less performance than if you had chosen alternative 1 instead.
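
Purely as an illustration (assuming this host's sibling layout from the lscpu output earlier in the thread, where CPU n and n+128 are SMT siblings of the same physical core), the two alternatives could be expressed with the affinity setting mentioned above, as two mutually exclusive config variants:

    # Alternative 1: 2 vCPUs restricted to host CPUs 0-1 (two different physical cores)
    cores: 2
    affinity: 0-1

    # Alternative 2: 4 vCPUs restricted to host CPUs 0,1,128,129
    # (only two physical cores, each with both SMT siblings)
    cores: 4
    affinity: 0-1,128-129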

So giving the VM *more* hardware might result in less performance in lightly threaded workloads. Fully loading every core should result in a performance gain however.

Am I correct in this train of thought, or have I misunderstood something?
 
Hi @leesteken (and community),

Thanks for the follow-up explanation regarding SMT and NUMA within Proxmox. I'd like to provide a full picture of my setup and the persistent issues I'm facing, as my observations seem inconsistent and I'm hoping someone might recognize this specific behaviour.

My Goal: Utilize a dual-socket AMD EPYC 7713 server efficiently for a single, large VM running CPU-intensive workloads (Ollama LLMs), leveraging both SMT and NUMA awareness.

Host System:

  • CPU: 2 x AMD EPYC 7713 (64-Core Processor) - Total 128 cores / 256 threads
  • RAM: 1 TB DDR4
  • Server: Gooxi SR201-D12R-NV (Board reported as G1DLRO-B in IPMI, but system is definitely dual socket)
  • Proxmox VE Version: pve-manager/8.3.4/65224a0f9cd294a3 (running kernel: 6.8.12-8-pve) (Fully updated via apt dist-upgrade recently)
  • Host lscpu Output Confirms: 2 Sockets, 64 Cores/Socket, 2 Threads/Core, 256 Total CPUs, 2 NUMA Nodes (0-63,128-191 and 64-127,192-255) - Host OS sees SMT correctly.
  • Host Microcode: 0x0a001119 (dmesg shows SRSO warnings suggesting it's outdated; currently waiting for manufacturer response regarding a BIOS/firmware update).
Guest VM (ID 101) Configuration: (/etc/pve/qemu-server/101.conf)

Code snippet

# Key Settings:
bios: ovmf
cores: 64
cpu: host
machine: q35
memory: 946045   # approx. 924 GiB
numa: 1
sockets: 2
# Other settings: virtio-scsi-single, e1000 network (will change to virtio)

Guest VM OS:

  • OS: Ubuntu 24.04.2 LTS (Server, fresh install)
  • Kernel: 6.8.0-55-generic (uname -r)
  • Guest cmdline: No nosmt or unusual NUMA parameters present.
  • Guest OS Stability: Previous Python/apt/netplan issues have been resolved.
The Persistent Problems:

1. SMT Passthrough Failure:
Despite using the standard recommended VM config (cpu: host, q35, sockets: 2, cores: 64, numa: 1) and the host having SMT ON, the guest VM consistently reports:

  • lscpu: Thread(s) per core: 1, CPU(s): 128
  • numactl --hardware: Shows 2 nodes, but only maps CPUs 0-63 (Node 0) and 64-127 (Node 1).
  • SMT is effectively OFF inside the guest.
2. Guest numactl Binding Failure: Even though the guest sees 2 NUMA nodes (for the 128 threads it has), numactl fails to apply bindings:

  • Command: sudo numactl --cpunodebind=0 --membind=0 sleep 600 &
  • Result (numactl -s -p <pid>): Always shows cpubind: 0 1 and membind: 0 1. The binding is ignored.
Troubleshooting Already Attempted:

  • Confirmed SMT is ON on Host OS (lscpu).
  • Using standard recommended VM config (listed above).
  • Performed full VM Shutdown/Start after config changes.
  • Tried cpu: EPYC-Milan -> Same result (128 threads, SMT OFF).
  • Tried forcing topology via args: -smp 256,sockets=2,... -> Failed VM start due to config conflicts.
  • Tried args: -cpu host,+smt -> Failed VM start (invalid QEMU property).
  • Tried adding kvm_amd.vmsmt=1 to host kernel parameters -> No change in guest SMT status (still 128 threads).
  • Updated Proxmox Host and rebooted -> No change.
  • Checked Guest kernel parameters -> OK.
  • Fixed all unrelated Guest OS issues -> OK.
Discussion Points & Questions:

  • @leesteken stated Proxmox intentionally doesn't pass SMT via cpu: host. Is this the official stance or known behavior, specifically for PVE 8.3 / EPYC Milan? If so, why does cpu: host seem to work for others (e.g., PVE 8.1)? Is this a regression?
  • Regarding the suggested workaround sockets=2, cores=128: Wouldn't this present an incorrect NUMA topology map (vCPUs 0-127 = Node 0, 128-255 = Node 1) to the guest, potentially harming performance despite showing 256 vCPUs?
  • Why would numactl binding fail inside the guest when the guest does perceive a 2-node structure via numactl --hardware? Is the NUMA information passed by PVE 8.3 somehow incomplete or incompatible with guest control mechanisms?
Is this combination (PVE 8.3.4, Kernel 6.8.x host/guest, EPYC 7003 Milan) known to have issues with SMT passthrough or guest NUMA control? Are there any other specific args:, kernel parameters, or settings known to work around this?

I am still waiting for a potential BIOS/firmware update from Gooxi, but seeking any insights from the community in the meantime would be extremely helpful.

Thanks!
 
So giving the VM *more* hardware might result in less performance in lightly threaded workloads. Fully loading every core should result in a performance gain however.
Giving any VM 100% (or most) of a single resource will increase latency but might improve throughput. Proxmox also has some overhead.
Latency matters a lot for people who run a few VMs for "gaming" but also want to give the VM "everything for best performance", which is indeed usually counterproductive.
Am I correct in this train of thought, or have I misunderstood something?
I think that you're right. But all this matters less when you run many more VMs than you have physical cores, which is what Proxmox (I believe) is designed for.
 
There's also a possibility that Proxmox is able to shuffle threads around?

Say I assign threads 0-3 to a VM in Proxmox, where 0,1 and 2,3 are the same physical cores, respectively. The VM sees four cores. If it puts a load on cores 0 and 1, are those fixed to the assigned threads 0 and 1, or does Proxmox just see "this process has started two heavy threads, let's assign these to threads 0 and 2 because that gives the most performance"?

Proxmox is obviously aware of SMT because it runs on the hardware. So the question is if the VM threads are all individually pinned to their own hardware threads, or can Proxmox assign them to any thread within the pool of threads assigned to the VM?
 
There's also a possibility that Proxmox is able to shuffle threads around?
Yes but there is also the affinity setting.
Say I assign threads 0-3 to a VM in Proxmox, where 0,1 and 2,3 are the same physical cores, respectively. The VM sees four cores. If it puts a load on cores 0 and 1, are those fixed to the assigned threads 0 and 1, or does proxmox just see "this process has started two heavy threads, let's assign these to threads 0 and 2 because that gives the most performance"?
The virtual VM cores are scheduled by the Linux kernel scheduler. I cannot tell you how that works in any detail, but you can investigate the Linux scheduler implementation on your own.
Proxmox is obviously aware of SMT because it runs on the hardware. So the question is if the VM threads are all individually pinned to their own hardware threads, or can Proxmox assign them to any thread within the pool of threads assigned to the VM?
Proxmox leaves this to the Linux kernel and QEMU/KVM: https://www.qemu.org/documentation/
 
Yes but there is also the affinity setting
But is that just a pool of cores where the VM's threads can live, or are they strictly ordered and pinned to the physical threads?

Another example of this would be to have two VMs and overprovision all the host's cores to both of them. If both VMs put a load on their core 0, will they fight for that same core, or is the host able to allocate these loads to different physical cores?

edit:
Well, I tested this myself: running a stress test inside a VM and changing the core affinity while monitoring per-core CPU usage on the host. Sometimes it does put the load on the "correct" physical core, but sometimes it doesn't. So I guess the Proxmox CPU affinity is just a pool of cores given to the VM, and Proxmox can still decide which of those actually gets used.

Example:
VM is given cores 16-23, so 8 cores in total. Inside the VM, start a stress test on cores 0,1,3,6 (picked at random). Sometimes the host sees CPU usage spike on cores 16,17,19,22, suggesting some bias toward putting the load on the corresponding cores, but sometimes the usage lands on any four cores within the 16-23 range, indicating that it's just a bias and not a rule.
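
A hedged sketch of how such a test could be reproduced (stress-ng and mpstat are an assumption; the post does not name the tools used):

    # Inside the guest: load 4 specific virtual cores for two minutes
    taskset -c 0,1,3,6 stress-ng --cpu 4 --timeout 120s

    # On the Proxmox host: watch per-CPU utilisation and note which of the
    # host CPUs in the VM's affinity pool (16-23 in this example) light up
    mpstat -P ALL 1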
 