With every virtual disk I attach, I have to reduce the cores of my VM

GeneralProbe

Member
Oct 31, 2022
256 x AMD EPYC 7742 64-Core Processor (2 Sockets)
Linux 5.15.64-1-pve #1 SMP PVE 5.15.64-1 (Thu, 13 Oct 2022 10:30:34 +0200)
pve-manager/7.2-11/b76d3178

I am running an evaluation of Proxmox VE on this system.
Installation was uneventful.
I created two storage directories, one on a RAID 1 and one on a RAID 10.

My first test was creating a VM with 240 cores (numa=1) and two virtual hard disks: 100 GB for the base system and 64 GB for a swap drive.
The VM has 800 GB of RAM + Q35 + UEFI + VirtIO SCSI single + SSD emulation + IO thread + Discard + VirtIO RNG + VirtIO networking (bridged).
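For reference, a VM set up like this should end up with a config roughly along these lines (a sketch only; the VM ID 100, the storage names, and the MAC address are placeholders):

Code:
# /etc/pve/qemu-server/100.conf (hypothetical example)
bios: ovmf
machine: q35
sockets: 2
cores: 120
# 2 sockets x 120 cores = 240 vCPUs in total
numa: 1
memory: 819200
scsihw: virtio-scsi-single
scsi0: raid1-dir:100/vm-100-disk-0.qcow2,discard=on,iothread=1,ssd=1,size=100G
scsi1: raid1-dir:100/vm-100-disk-1.qcow2,discard=on,iothread=1,ssd=1,size=64G
rng0: source=/dev/urandom
net0: virtio=DE:AD:BE:EF:00:01,bridge=vmbr0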
When I booted the Debian 10 netinstall image, the system hung on drive detection. When I removed the second drive, I could install Debian without any issues.
After the installation I added the swap drive back, but then the system would not boot anymore, failing with the same error.
After I lowered the cores to around 192 it started to work again.
Today I added a third drive (7 TB) via hotplugging; then I experienced a kernel panic in the guest:

Code:
[26554.137516] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[26554.139530] rcu:     161-...0: (1 GPs behind) idle=e5e/1/0x4000000000000000 softirq=422/423 fqs=2626
[26554.142360] rcu:     (detected by 95, t=5253 jiffies, g=487989, q=3442)
[26554.144400] Sending NMI from CPU 95 to CPUs 161:
[26557.625399] watchdog: BUG: soft lockup - CPU#57 stuck for 22s! [sshd:22223]
[26557.627616] Modules linked in: fuse btrfs zstd_compress zstd_decompress xxhash ufs qnx4 hfsplus hfs minix msdos jfs xfs dm_mod rfkill snd_hda_intel snd_hda_codec nls_ascii nls_cp437 snd_hda_core kvm_amd vfat snd_hwdep ccp bochs_drm snd_pcm fat kvm ttm snd_timer irqbypass crct10dif_pclmul drm_kms_helper crc32_pclmul snd efi_pstore joydev virtio_rng iTCO_wdt sg ghash_clmulni_intel rng_core virtio_console serio_raw virtio_balloon pcspkr drm efivars iTCO_vendor_support evdev soundcore qemu_fw_cfg button efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod usbhid hid sr_mod virtio_net cdrom sd_mod net_failover crc32c_intel virtio_scsi
[26557.627664]  failover ahci aesni_intel ehci_pci libahci uhci_hcd aes_x86_64 ehci_hcd crypto_simd virtio_pci libata cryptd virtio_ring usbcore lpc_ich glue_helper scsi_mod psmouse i2c_i801 virtio mfd_core usb_common
[26557.627678] CPU: 57 PID: 22223 Comm: sshd Not tainted 4.19.0-22-amd64 #1 Debian 4.19.260-1
[26557.627678] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[26557.627687] RIP: 0010:smp_call_function_many+0x1f8/0x250
[26557.627689] Code: c7 e8 fc 1e 5e 00 3b 05 ba 07 02 01 0f 83 8c fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 20 07 4f 9d 8b 51 18 83 e2 01 74 0a f3 90 <8b> 51 18 83 e2 01 75 f6 eb c8 48 c7 c2 20 c4 72 9d 4c 89 fe 89 df
[26557.627690] RSP: 0018:ffffbfca1956bc18 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[26557.627691] RAX: 000000000000003e RBX: ffff9c296f4681c0 RCX: ffff9c296f5acb60
[26557.627692] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9c296f4681c8
[26557.627692] RBP: ffff9c296f4681c8 R08: 0000000000000200 R09: ffffffffffffffff
[26557.627693] R10: 00000000007fffff R11: 0000000000ffffff R12: ffffffff9c667ff0
[26557.627693] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000200
[26557.627697] FS:  00007fa545957e40(0000) GS:ffff9c296f440000(0000) knlGS:0000000000000000
[26557.627697] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[26557.627698] CR2: 00007ffd979aeef2 CR3: 0000006287a02000 CR4: 0000000000340ee0
[26557.627700] Call Trace:
[26557.628216]  ? load_new_mm_cr3+0xc0/0xc0
[26557.628218]  on_each_cpu+0x28/0x60
[26557.628219]  flush_tlb_kernel_range+0x48/0x90
[26557.628222]  __purge_vmap_area_lazy+0x4d/0xc0
[26557.628223]  vm_unmap_aliases+0xe9/0x120
[26557.628225]  change_page_attr_set_clr+0xc7/0x420
[26557.628227]  set_memory_ro+0x26/0x30
[26557.628229]  bpf_prog_select_runtime+0x28/0x110
[26557.628232]  bpf_prepare_filter+0x523/0x590
[26557.628233]  bpf_prog_create_from_user+0xbb/0x110
[26557.628235]  ? hardlockup_detector_perf_cleanup+0x80/0x80
[26557.628236]  do_seccomp+0x25d/0x6c0
[26557.628238]  __x64_sys_prctl+0x4e6/0x590
[26557.628241]  do_syscall_64+0x53/0x110
[26557.628244]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[26557.628245] RIP: 0033:0x7fa545d09c4a
[26557.628247] Code: 48 8b 0d 49 02 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 9d 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 16 02 0c 00 f7 d8 64 89 01 48
[26557.628247] RSP: 002b:00007ffd979ade08 EFLAGS: 00000246 ORIG_RAX: 000000000000009d
[26557.628248] RAX: ffffffffffffffda RBX: 00007ffd979ade10 RCX: 00007fa545d09c4a
[26557.628248] RDX: 000056055a65d040 RSI: 0000000000000002 RDI: 0000000000000016
[26557.628249] RBP: 000056055b1347b0 R08: 0000000000000000 R09: 00007fa545d89e80
[26557.628249] R10: 00007fa545d09c4a R11: 0000000000000246 R12: 00007ffd979adeb0
[26557.628249] R13: 000056055b133b30 R14: 0000000000000000 R15: 0000000000000013
[26564.065302] rcu: rcu_sched kthread starved for 2480 jiffies! g487989 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=17
[26564.068578] rcu: RCU grace-period kthread stack dump:
[26564.070188] rcu_sched       I    0    12      2 0x80000000
[26564.070190] Call Trace:
[26564.070197]  __schedule+0x29f/0x840
[26564.070200]  ? __switch_to_asm+0x35/0x70
[26564.070202]  schedule+0x28/0x80
[26564.070203]  schedule_timeout+0x16b/0x3b0
[26564.070206]  ? __next_timer_interrupt+0xc0/0xc0
[26564.070208]  rcu_gp_kthread+0x40d/0x850
[26564.070210]  ? call_rcu_sched+0x20/0x20
[26564.070212]  kthread+0x112/0x130
[26564.070214]  ? kthread_bind+0x30/0x30
[26564.070215]  ret_from_fork+0x35/0x40
[26585.624632] watchdog: BUG: soft lockup - CPU#57 stuck for 22s! [sshd:22223]
[26585.626860] Modules linked in: fuse btrfs zstd_compress zstd_decompress xxhash ufs qnx4 hfsplus hfs minix msdos jfs xfs dm_mod rfkill snd_hda_intel snd_hda_codec nls_ascii nls_cp437 snd_hda_core kvm_amd vfat snd_hwdep ccp bochs_drm snd_pcm fat kvm ttm snd_timer irqbypass crct10dif_pclmul drm_kms_helper crc32_pclmul snd efi_pstore joydev virtio_rng iTCO_wdt sg ghash_clmulni_intel rng_core virtio_console serio_raw virtio_balloon pcspkr drm efivars iTCO_vendor_support evdev soundcore qemu_fw_cfg button efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod usbhid hid sr_mod virtio_net cdrom sd_mod net_failover crc32c_intel virtio_scsi
[26585.626890]  failover ahci aesni_intel ehci_pci libahci uhci_hcd aes_x86_64 ehci_hcd crypto_simd virtio_pci libata cryptd virtio_ring usbcore lpc_ich glue_helper scsi_mod psmouse i2c_i801 virtio mfd_core usb_common
[26585.626898] CPU: 57 PID: 22223 Comm: sshd Tainted: G             L    4.19.0-22-amd64 #1 Debian 4.19.260-1
[26585.626898] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[26585.626906] RIP: 0010:smp_call_function_many+0x1f8/0x250
[26585.626908] Code: c7 e8 fc 1e 5e 00 3b 05 ba 07 02 01 0f 83 8c fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 20 07 4f 9d 8b 51 18 83 e2 01 74 0a f3 90 <8b> 51 18 83 e2 01 75 f6 eb c8 48 c7 c2 20 c4 72 9d 4c 89 fe 89 df
[26585.626909] RSP: 0018:ffffbfca1956bc18 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[26585.626910] RAX: 000000000000003e RBX: ffff9c296f4681c0 RCX: ffff9c296f5acb60
[26585.626910] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9c296f4681c8
[26585.626911] RBP: ffff9c296f4681c8 R08: 0000000000000200 R09: ffffffffffffffff
[26585.626911] R10: 00000000007fffff R11: 0000000000ffffff R12: ffffffff9c667ff0
[26585.626912] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000200
[26585.626914] FS:  00007fa545957e40(0000) GS:ffff9c296f440000(0000) knlGS:0000000000000000
[26585.626915] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[26585.626915] CR2: 00007ffd979aeef2 CR3: 0000006287a02000 CR4: 0000000000340ee0
[26585.626917] Call Trace:
[26585.626923]  ? load_new_mm_cr3+0xc0/0xc0
[26585.626924]  on_each_cpu+0x28/0x60
[26585.626926]  flush_tlb_kernel_range+0x48/0x90
[26585.626928]  __purge_vmap_area_lazy+0x4d/0xc0
[26585.626930]  vm_unmap_aliases+0xe9/0x120
[26585.626931]  change_page_attr_set_clr+0xc7/0x420
[26585.626933]  set_memory_ro+0x26/0x30
[26585.626937]  bpf_prog_select_runtime+0x28/0x110
[26585.626939]  bpf_prepare_filter+0x523/0x590
[26585.626940]  bpf_prog_create_from_user+0xbb/0x110
[26585.626943]  ? hardlockup_detector_perf_cleanup+0x80/0x80
[26585.626944]  do_seccomp+0x25d/0x6c0
[26585.626946]  __x64_sys_prctl+0x4e6/0x590
[26585.626949]  do_syscall_64+0x53/0x110
[26585.626952]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[26585.626953] RIP: 0033:0x7fa545d09c4a
[26585.626954] Code: 48 8b 0d 49 02 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 9d 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 16 02 0c 00 f7 d8 64 89 01 48
[26585.626954] RSP: 002b:00007ffd979ade08 EFLAGS: 00000246 ORIG_RAX: 000000000000009d
[26585.626955] RAX: ffffffffffffffda RBX: 00007ffd979ade10 RCX: 00007fa545d09c4a
[26585.626956] RDX: 000056055a65d040 RSI: 0000000000000002 RDI: 0000000000000016
[26585.626956] RBP: 000056055b1347b0 R08: 0000000000000000 R09: 00007fa545d89e80
[26585.626957] R10: 00007fa545d09c4a R11: 0000000000000246 R12: 00007ffd979adeb0
[26585.626957] R13: 000056055b133b30 R14: 0000000000000000 R15: 0000000000000013
root@www:~#
Message from syslogd@www at Oct 30 22:02:37 ...
 kernel:[26613.623866] watchdog: BUG: soft lockup - CPU#57 stuck for 22s! [sshd:22223]

It took ages to shut the system down, and afterwards the system hung at drive detection again.
So I lowered the core count again, this time to 136 (I did not check whether that was the maximum that still works correctly); then the system was able to boot again.
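For anyone trying to reproduce this, both steps can also be done from the CLI. A sketch, assuming a hypothetical VM ID 100 and a storage named raid10-dir:

Code:
# Hotplug a third SCSI disk by allocating a new ~7 TB volume
qm set 100 --scsi2 raid10-dir:7000

# Lower the vCPU count; applied at the next VM start
qm set 100 --cores 136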

I found this kernel.org bug report: https://bugzilla.kernel.org/show_bug.cgi?id=199727
But switching to "threads" did not fix my issue.
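For reference: assuming "threads" refers to the disk's asynchronous I/O mode (aio), such a change would look roughly like this, re-specifying the existing volume with the extra option (VM ID and volume name are placeholders):

Code:
qm set 100 --scsi0 raid1-dir:100/vm-100-disk-0.qcow2,discard=on,iothread=1,ssd=1,aio=threads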
Maybe someone has an idea how to fix this strange issue ...
 
Your host has 64 cores, so the maximum number of cores for a VM should be 64.

Why do you want to assign 240 cores (makes no sense to me)?
 
Right, I mixed up cores and threads in my description, but the two EPYC CPUs do offer 256 threads. Am I not allowed to utilize the majority of them in one VM?
 
I wouldn't do it. Threads are not cores.
And in the end the hypervisor needs cycles as well.
Ethernet, disk I/O, all of that needs CPU.
In the end you are better off / faster with fewer cores most of the time, especially when you have multiple VMs.
Scheduling overhead is not zero.
If you want all CPUs/threads in one machine, install without a hypervisor. Go bare metal.
 
How the (high) number of threads/vCPUs assigned to a VM correlates with the number of virtual disks, leading to misbehavior inside the VM (even without any noteworthy load), would be interesting to know anyway, at least for my curiosity.
 
Your host has 64 cores, so the maximum number of cores for a VM should be 64.

Why do you want to assign 240 cores (makes no sense to me)?
From my understanding, Proxmox only allows assigning virtual CPU sockets and virtual CPU cores to a virtual machine, but not virtual CPU threads, as e.g. libvirt allows. Given that virtualization is in practice often used for higher density via overcommitting resources, ending up with 240 vCPUs (vCPUs, explicitly not differentiating between cores and threads here) on physically 2x 64 cores with 128 threads each doesn't seem that unreasonable to me, especially since at low actual compute usage, such as running a Debian installer, no physical resource limits should be reached (I doubt the Debian installer is heavily optimized for multi-core usage).

Yes, assigning 240 vCPUs to a single virtual machine might be an extreme example, but running something like 100-120 virtual machines with 2 vCPUs each (again, not differentiating between cores and threads) on such hardware is IMHO not that uncommon for inexpensive virtual machine hosting providers. And this should in the end be quite similar to the extreme case here, right? Or is the general Proxmox recommendation not to overcommit resources at all? Furthermore, is there any chance of virtual CPU thread support, as e.g. in libvirt? That would allow, say, 100-120 virtual machines with 1 virtual CPU socket, 1 virtual CPU core, and 2 virtual CPU threads each (which would be closer to the physical hardware, and is often also called 2 vCPUs in common speech).
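For comparison, a minimal sketch of how libvirt expresses a thread-aware topology in its domain XML (values purely illustrative):

Code:
<!-- 1 socket x 1 core x 2 threads = 2 vCPUs -->
<vcpu placement='static'>2</vcpu>
<cpu>
  <topology sockets='1' cores='1' threads='2'/>
</cpu>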
 
And this should in the end be quite similar to the extreme case here, right?
No. This is a completely different situation/beast.

Make yourself familiar with how hypervisors work internally and how they do their scheduling.
Also make sure you understand the difference between threads and real cores; this differs from CPU architecture to CPU architecture as well. So do proper research.
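As a starting point, the physical topology of the host can be checked with lscpu:

Code:
# shows threads per core, cores per socket, and socket count
lscpu | grep -E 'Thread|Core|Socket'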
 
