Ubuntu VM Offline Cores

ITxD

Hi,

We have just installed Proxmox on an HPE ProLiant server with 2 CPUs (2 × 128 cores = 256 cores / 512 threads). We've created an Ubuntu 24.04 VM and allocated 254 cores (508 vCPUs) to it.

We've noticed that the VM can see all the cores, but only half of them are online; see the output below for more information:

Is there anything we're missing here? How can the rest of the cores be enabled?

Thanks


Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 508
On-line CPU(s) list: 0-253
Off-line CPU(s) list: 254-507
Vendor ID: AuthenticAMD
Model name: AMD EPYC 9754 128-Core Processor
CPU family: 25
Model: 160
Thread(s) per core: 1
Core(s) per socket: 254
Socket(s): 1
Stepping: 2
BogoMIPS: 4493.24
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw perfctr_core ssbd ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 clzero xsaveerptr wbnoinvd arat npt lbrv nrip_save tsc_scale vmcb_clean flushbyasid pausefilter pfthreshold v_vmsave_vmload vgif vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid fsrm flush_l1d arch_capabilities
Virtualization features:
Virtualization: AMD-V
Hypervisor vendor: KVM
Virtualization type: full
Caches (sum of all):
L1d: 15.9 MiB (254 instances)
L1i: 15.9 MiB (254 instances)
L2: 127 MiB (254 instances)
L3: 4 GiB (254 instances)
NUMA:
NUMA node(s): 2
NUMA node0 CPU(s): 0-253
NUMA node1 CPU(s):
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Reg file data sampling: Not affected
Retbleed: Not affected
Spec rstack overflow: Vulnerable: Safe RET, no microcode
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Srbds: Not affected
Tsx async abort: Not affected
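For anyone reproducing this, the offline vCPUs can be listed, and a bringup attempted, from inside the guest via the standard sysfs CPU hotplug interface (a minimal sketch; CPU 254 is just an example number):

Code:
# which vCPUs does the guest kernel consider offline?
cat /sys/devices/system/cpu/offline
# try to bring one of them up and see why it fails
echo 1 > /sys/devices/system/cpu/cpu254/online
dmesg | tail -n 5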
 
I also noticed lscpu was showing only 1 socket. I tried lowering the core count to 128 per socket (256 vCPUs) with 2 sockets, and now all the cores are online and lscpu shows 2 sockets instead of 1.
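For reference, the equivalent topology change from the Proxmox CLI would be something like this (a sketch; VMID 100 is a placeholder):

Code:
# 2 sockets x 128 cores = 256 vCPUs; takes effect after a full VM restart
qm set 100 --sockets 2 --cores 128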

---------------------------------------------------------------------------------
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 256
On-line CPU(s) list: 0-255
Vendor ID: AuthenticAMD
Model name: AMD EPYC 9754 128-Core Processor
CPU family: 25
Model: 160
Thread(s) per core: 1
Core(s) per socket: 128
Socket(s): 2
Stepping: 2
BogoMIPS: 4493.24
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw perfctr_core ssbd ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 clzero xsaveerptr wbnoinvd arat npt lbrv nrip_save tsc_scale vmcb_clean flushbyasid pausefilter pfthreshold v_vmsave_vmload vgif vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid fsrm flush_l1d arch_capabilities
Virtualization features:
Virtualization: AMD-V
Hypervisor vendor: KVM
Virtualization type: full
Caches (sum of all):
L1d: 16 MiB (256 instances)
L1i: 16 MiB (256 instances)
L2: 128 MiB (256 instances)
L3: 4 GiB (256 instances)
NUMA:
NUMA node(s): 2
NUMA node0 CPU(s): 0-127
NUMA node1 CPU(s): 128-255
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Reg file data sampling: Not affected
Retbleed: Not affected
Spec rstack overflow: Vulnerable: Safe RET, no microcode
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Srbds: Not affected
Tsx async abort: Not affected
 
I don't get it: if you're dedicating the whole CPU landscape to one VM, why not run bare metal? What do you need Proxmox for in this case?
 
If you start the VM from the CLI (qm start VMID), do you get any message or warning? Are there any logs in the system related to this?
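For example (a sketch; VMID 100 as a placeholder):

Code:
qm start 100
# then look for QEMU/KVM-related messages around the start
journalctl -b --since "5 minutes ago" | grep -iE 'qemu|kvm|vcpu'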

I remember a bug report in Ubuntu where the max CPU count for a VM was 288 [1]. It seems to be fixed there, but I don't know if it is fixed upstream; I've been unable to find a bug report on QEMU. Looking at the sources, it seems that PVE uses the same 288 CPU max [2] (I might not be looking at the right place, I'm not a developer). I filed a bug report asking about this [3].

@Kingneutron Sometimes you may need to use "the whole machine" for the VM and still be able to take advantage of virtualization: live migrations, backups, life cycle management... There may be other approaches, like running more instances of the app in smaller VMs, but sometimes it's just too hard to get that right. @ITxD I'm curious too about this exact use case!

[1] https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/2012763
[2] https://git.proxmox.com/?p=mirror_q...c18d7de54c112281f74c70c2a699e150;hb=HEAD#l424
[3] https://bugzilla.proxmox.com/show_bug.cgi?id=5671
 
Hi,
do you have x2APIC enabled in your BIOS and not turned off via the kernel command line? If I remember correctly, this is required on AMD hosts when a guest is assigned more than 255 vCPUs (thanks @mira for the info).
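A quick way to check both (a sketch; run on the Proxmox host):

Code:
# did the kernel actually enable x2APIC at boot?
dmesg | grep -i x2apic
# make sure it is not disabled on the kernel command line
grep -o nox2apic /proc/cmdline || echo "nox2apic not set"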
 
I have a similar issue with a server that has dual AMD EPYC 9654 CPUs. An Ubuntu 24.04.2 VM can only bring up 256 cores successfully (I also tried 22.04 with the 5.15, 6.5 and 6.8 kernels). There seems to be a limit of 256 vCPUs per socket: if I split the vCPUs into 2 sockets of 128 cores each, it boots fine, but if the total number of vCPUs is greater than 256, all vCPUs on the second node are offline. In the VM, dmesg shows "CPU has invalid APIC ID" for every vCPU on node 1 (node 0 is fine, though). I checked that in this case all vCPUs on node 1 are numbered 0x0100, 0x0101, etc., and that seems to be what the guest kernel rejects, presumably because those APIC IDs don't fit into the legacy 8-bit xAPIC ID:
Code:
[    2.360444] smpboot: CPU 192 has invalid APIC ID 100. Aborting bringup
[    2.360643] smpboot: CPU 193 has invalid APIC ID 101. Aborting bringup
[    2.360920] smpboot: CPU 194 has invalid APIC ID 102. Aborting bringup
...
On kernel 5.15 it actually shows the raw APIC ID:
Code:
[ 0.005258] SRAT: PXM 1 -> APIC 0x0100 -> Node 1

The Proxmox server has x2APIC enabled; in dmesg on the hypervisor host I see:
Code:
[    3.994780] AMD-Vi: X2APIC enabled
[    3.996948] AMD-Vi: Virtual APIC enabled
Proxmox itself works fine with all 384 cores.
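For completeness, here is one way to see the APIC IDs from inside the guest (a sketch; on newer kernels the SRAT lines may be formatted differently, and /proc/cpuinfo only lists the vCPUs that actually came online):

Code:
# APIC IDs the firmware handed out per NUMA node
dmesg | grep 'SRAT: PXM'
# pair each online logical CPU with its (decimal) APIC ID
grep -E '^(processor|apicid)' /proc/cpuinfo | paste - - | tail -n 4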

The target VM's kernel should be able to handle this many CPUs; I tried the latest HWE kernels for Ubuntu Server 22.04 and 24.04:
Code:
# grep CONFIG_NR_CPUS /boot/config-$(uname -r)
CONFIG_NR_CPUS_RANGE_BEGIN=8192
CONFIG_NR_CPUS_RANGE_END=8192
CONFIG_NR_CPUS_DEFAULT=8192
CONFIG_NR_CPUS=8192

The target VM shows it has x2APIC support:
Code:
# dmesg | grep x2apic
[    0.000202] x2apic: enabled by BIOS, switching to x2apic ops
[    0.005437] APIC: Switched APIC routing to: cluster x2apic
[    8.096206] APIC: Switched APIC routing to: physical x2apic

I would be very thankful for any advice, as not being able to utilize all the cores is painful.
 
I encountered the same issue. A VM with 2 sockets and 180 cores assigned in Proxmox comes up with half of the CPUs disabled (the whole second socket). This is on dual EPYCs with x2APIC enabled. Does anyone have any update?
 
I managed to solve this by adding this line to the VM's config file (/etc/pve/qemu-server/<vmid>.conf):

Code:
args: -cpu host -machine kernel-irqchip=split
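In case it's useful, the same thing can be set from the CLI instead of editing the file by hand (a sketch; VMID 100 is a placeholder):

Code:
# pass extra arguments straight through to QEMU for this VM
qm set 100 --args "-cpu host -machine kernel-irqchip=split"
# a cold restart is needed for the change to take effect
qm shutdown 100 && qm start 100

My understanding is that kernel-irqchip=split moves part of the interrupt controller emulation out of KVM into QEMU, which apparently is what lets the guest bring up vCPUs with APIC IDs above 0xFF. Note that -cpu host also overrides whatever CPU type is set in the GUI.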