VM hangs occasionally with "Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior."

marvinmz1

New Member
Oct 21, 2025
I've been noticing these issues popping up in a couple of my VMs, while others appear to be functioning fine.

For the life of me I cannot figure it out, so I'd appreciate it if someone could point me in the right direction.

All I could find from searching online about this is that it's supposed to be a kernel issue, but all these threads are at least a year old so I'd assume things should have been patched by now?

Code:
Oct 21 14:57:13 music-assistant kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
Oct 21 14:57:13 music-assistant kernel: rcu:         0-...!: (137 ticks this GP) idle=9c34/1/0x4000000000000000 softirq=1421109/1421111 fqs=1200
Oct 21 14:57:13 music-assistant kernel: rcu:         (detected by 1, t=71394 jiffies, g=2877637, q=681 ncpus=2)
Oct 21 14:57:13 music-assistant kernel: Sending NMI from CPU 1 to CPUs 0:
Oct 21 14:57:13 music-assistant kernel: NMI backtrace for cpu 0
Oct 21 14:57:13 music-assistant kernel: CPU: 0 PID: 9636 Comm: fwupd Not tainted 6.8.0-85-generic #85-Ubuntu
Oct 21 14:57:13 music-assistant kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Oct 21 14:57:13 music-assistant kernel: RIP: 0010:asm_sysvec_reschedule_ipi+0x0/0x20
Oct 21 14:57:13 music-assistant kernel: Code: e8 65 75 e4 ff e9 b0 07 00 00 0f 01 ca fc 6a ff e8 55 06 00 00 48 89 c4 48 8d 6c 24 01 48 89 e7 e8 45 44 e4 ff e9 90 07 00 00>
Oct 21 14:57:13 music-assistant kernel: RSP: 0000:ffffbe3602bd7b38 EFLAGS: 00000046
Oct 21 14:57:13 music-assistant kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000200
Oct 21 14:57:13 music-assistant kernel: RDX: ffffdf13c20c0040 RSI: ffffdf13c20c0000 RDI: ffff9fdec3000000
Oct 21 14:57:13 music-assistant kernel: RBP: ffffbe3602bd7b90 R08: 0000000000000000 R09: 0000000000000000
Oct 21 14:57:13 music-assistant kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000040
Oct 21 14:57:13 music-assistant kernel: R13: 0000000000000000 R14: ffffdf13c20c0000 R15: 0000000000000001
Oct 21 14:57:13 music-assistant kernel: FS:  000074767adb0b80(0000) GS:ffff9fdf7bc00000(0000) knlGS:0000000000000000
Oct 21 14:57:13 music-assistant kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 21 14:57:13 music-assistant kernel: CR2: 000062a604d2b028 CR3: 000000007467e000 CR4: 0000000000350ef0
Oct 21 14:57:13 music-assistant kernel: Call Trace:
Oct 21 14:57:13 music-assistant kernel:  <NMI>
Oct 21 14:57:13 music-assistant kernel:  ? show_regs+0x6d/0x80
Oct 21 14:57:13 music-assistant kernel:  ? nmi_cpu_backtrace+0xb5/0x120
Oct 21 14:57:13 music-assistant kernel:  ? nmi_cpu_backtrace_handler+0x11/0x20
Oct 21 14:57:13 music-assistant kernel:  ? nmi_handle+0x67/0x180
Oct 21 14:57:13 music-assistant kernel:  ? default_do_nmi+0x47/0x140
Oct 21 14:57:13 music-assistant kernel:  ? exc_nmi+0x1c2/0x290
Oct 21 14:57:13 music-assistant kernel:  ? end_repeat_nmi+0xf/0x60
Oct 21 14:57:13 music-assistant kernel:  ? asm_sysvec_x86_platform_ipi+0x20/0x20
Oct 21 14:57:13 music-assistant kernel:  ? asm_sysvec_x86_platform_ipi+0x20/0x20
Oct 21 14:57:13 music-assistant kernel:  ? asm_sysvec_x86_platform_ipi+0x20/0x20
Oct 21 14:57:13 music-assistant kernel:  </NMI>
Oct 21 14:57:13 music-assistant kernel:  <TASK>
Oct 21 14:57:13 music-assistant kernel:  ? clear_page_rep+0x7/0x10
Oct 21 14:57:13 music-assistant kernel:  ? post_alloc_hook+0xcd/0x120
Oct 21 14:57:13 music-assistant kernel:  get_page_from_freelist+0x1d1/0x610
Oct 21 14:57:13 music-assistant kernel:  __alloc_pages+0x1e9/0x350
Oct 21 14:57:13 music-assistant kernel:  alloc_pages_mpol+0x91/0x210
Oct 21 14:57:13 music-assistant kernel:  vma_alloc_folio+0x64/0xd0
Oct 21 14:57:13 music-assistant kernel:  alloc_anon_folio+0x1cc/0x340
Oct 21 14:57:13 music-assistant kernel:  ? folio_add_lru+0x71/0xf0
Oct 21 14:57:13 music-assistant kernel:  do_anonymous_page+0x6c/0x430
Oct 21 14:57:13 music-assistant kernel:  handle_pte_fault+0x1cb/0x1d0
Oct 21 14:57:13 music-assistant kernel:  __handle_mm_fault+0x654/0x800
Oct 21 14:57:13 music-assistant kernel:  handle_mm_fault+0x18a/0x380
Oct 21 14:57:13 music-assistant kernel:  do_user_addr_fault+0x169/0x670
Oct 21 14:57:13 music-assistant kernel:  exc_page_fault+0x83/0x1b0
Oct 21 14:57:13 music-assistant kernel:  asm_exc_page_fault+0x27/0x30
Oct 21 14:57:13 music-assistant kernel: RIP: 0033:0x74767c8ac51a
Oct 21 14:57:13 music-assistant kernel: Code: 8d 15 ca 75 15 00 49 39 d4 49 89 4c 24 60 0f 95 c2 48 29 d8 0f b6 d2 48 83 c8 01 48 c1 e2 02 48 09 da 48 83 ca 01 49 89 56 08>
Oct 21 14:57:13 music-assistant kernel: RSP: 002b:00007ffd7f06c090 EFLAGS: 00010202
Oct 21 14:57:13 music-assistant kernel: RAX: 000000000000efe1 RBX: 0000000000000030 RCX: 000062a604d2b020
Oct 21 14:57:13 music-assistant kernel: RDX: 0000000000000031 RSI: fffffffffffffef0 RDI: 0000000000000000
Oct 21 14:57:13 music-assistant kernel: RBP: 00007ffd7f06c110 R08: 000074767ca03b20 R09: 0000000000000030
Oct 21 14:57:13 music-assistant kernel: R10: 0000000000000001 R11: 0000000000000000 R12: 000074767ca03ac0
Oct 21 14:57:13 music-assistant kernel: R13: 0000000000000020 R14: 000062a604d2aff0 R15: 0000000000000030
Oct 21 14:57:13 music-assistant kernel:  </TASK>
Oct 21 14:57:13 music-assistant kernel: rcu: rcu_preempt kthread timer wakeup didn't happen for 65393 jiffies! g2877637 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
Oct 21 14:57:13 music-assistant kernel: rcu:         Possible timer handling issue on cpu=1 timer-softirq=1301070
Oct 21 14:57:13 music-assistant kernel: rcu: rcu_preempt kthread starved for 65396 jiffies! g2877637 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
Oct 21 14:57:13 music-assistant kernel: rcu:         Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
Oct 21 14:57:13 music-assistant kernel: rcu: RCU grace-period kthread stack dump:
Oct 21 14:57:13 music-assistant kernel: task:rcu_preempt     state:I stack:0     pid:17    tgid:17    ppid:2      flags:0x00004000
Oct 21 14:57:13 music-assistant kernel: Call Trace:
Oct 21 14:57:13 music-assistant kernel:  <TASK>
Oct 21 14:57:13 music-assistant kernel:  __schedule+0x27c/0x6b0
Oct 21 14:57:13 music-assistant kernel:  ? srso_return_thunk+0x5/0x5f
Oct 21 14:57:13 music-assistant kernel:  ? __pfx_rcu_gp_kthread+0x10/0x10
Oct 21 14:57:13 music-assistant kernel:  schedule+0x33/0x110
Oct 21 14:57:13 music-assistant kernel:  schedule_timeout+0x95/0x170
Oct 21 14:57:13 music-assistant kernel:  ? __pfx_process_timeout+0x10/0x10
Oct 21 14:57:13 music-assistant kernel:  rcu_gp_fqs_loop+0x105/0x580
Oct 21 14:57:13 music-assistant kernel:  ? _raw_spin_unlock_irq+0xe/0x50
Oct 21 14:57:13 music-assistant kernel:  rcu_gp_kthread+0xee/0x180
Oct 21 14:57:13 music-assistant kernel:  kthread+0xf2/0x120
Oct 21 14:57:13 music-assistant kernel:  ? __pfx_kthread+0x10/0x10
Oct 21 14:57:13 music-assistant kernel:  ret_from_fork+0x47/0x70
Oct 21 14:57:13 music-assistant kernel:  ? __pfx_kthread+0x10/0x10
Oct 21 14:57:13 music-assistant kernel:  ret_from_fork_asm+0x1b/0x30
Oct 21 14:57:13 music-assistant kernel:  </TASK>
Oct 21 14:57:13 music-assistant kernel: rcu: Stack dump where RCU GP kthread last ran:
Oct 21 14:57:13 music-assistant kernel: CPU: 1 PID: 9621 Comm: dmx0:nut Not tainted 6.8.0-85-generic #85-Ubuntu
Oct 21 14:57:13 music-assistant kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Oct 21 14:57:13 music-assistant kernel: RIP: 0010:clear_page_rep+0x7/0x10
Oct 21 14:57:13 music-assistant kernel: Code: 5b 41 5c 5d 31 ff e9 18 14 09 00 cc cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 b9 00 02 00 00 31 c0>
Oct 21 14:57:13 music-assistant kernel: RSP: 0000:ffffbe3602b97b60 EFLAGS: 00000246
Oct 21 14:57:13 music-assistant kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000200
Oct 21 14:57:13 music-assistant kernel: RDX: ffffdf13c20caa00 RSI: ffffdf13c20ca9c0 RDI: ffff9fdec32a7000
Oct 21 14:57:13 music-assistant kernel: RBP: ffffbe3602b97b90 R08: 0000000000000000 R09: 0000000000000000
Oct 21 14:57:13 music-assistant kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000040
Oct 21 14:57:13 music-assistant kernel: R13: 0000000000000000 R14: ffffdf13c20ca9c0 R15: 0000000000000001
Oct 21 14:57:13 music-assistant kernel: FS:  000078fcb66b7b30(0000) GS:ffff9fdf7bd00000(0000) knlGS:0000000000000000
Oct 21 14:57:13 music-assistant kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 21 14:57:13 music-assistant kernel: CR2: 000078fcb5d64000 CR3: 0000000067494000 CR4: 0000000000350ef0
Oct 21 14:57:13 music-assistant kernel: Call Trace:
Oct 21 14:57:13 music-assistant kernel:  <IRQ>
Oct 21 14:57:13 music-assistant kernel:  ? show_regs+0x6d/0x80
Oct 21 14:57:13 music-assistant kernel:  ? dump_cpu_task+0x77/0x90
Oct 21 14:57:13 music-assistant kernel:  ? rcu_check_gp_kthread_starvation+0x1ce/0x280
Oct 21 14:57:13 music-assistant kernel:  ? srso_return_thunk+0x5/0x5f
Oct 21 14:57:13 music-assistant kernel:  ? rcu_check_gp_kthread_expired_fqs_timer+0xf8/0x110
Oct 21 14:57:13 music-assistant kernel:  ? print_other_cpu_stall+0x2a3/0x580
Oct 21 14:57:13 music-assistant kernel:  ? check_cpu_stall+0x1ca/0x230
Oct 21 14:57:13 music-assistant kernel:  ? rcu_pending+0x32/0x1f0
Oct 21 14:57:13 music-assistant kernel:  ? rcu_sched_clock_irq+0xd5/0x3c0
Oct 21 14:57:13 music-assistant kernel:  ? srso_return_thunk+0x5/0x5f
Oct 21 14:57:13 music-assistant kernel:  ? update_process_times+0x76/0xb0
Oct 21 14:57:13 music-assistant kernel:  ? tick_sched_handle+0x28/0x70
Oct 21 14:57:13 music-assistant kernel:  ? tick_nohz_highres_handler+0x78/0xa0
Oct 21 14:57:13 music-assistant kernel:  ? __pfx_tick_nohz_highres_handler+0x10/0x10
Oct 21 14:57:13 music-assistant kernel:  ? __hrtimer_run_queues+0x112/0x2a0
Oct 21 14:57:13 music-assistant kernel:  ? kvm_clock_get_cycles+0x18/0x40
Oct 21 14:57:13 music-assistant kernel:  ? hrtimer_interrupt+0xf6/0x250
Oct 21 14:57:13 music-assistant kernel:  ? __sysvec_apic_timer_interrupt+0x51/0x120
Oct 21 14:57:13 music-assistant kernel:  ? sysvec_apic_timer_interrupt+0x8d/0xd0
Oct 21 14:57:13 music-assistant kernel:  </IRQ>
Oct 21 14:57:13 music-assistant kernel:  <TASK>
Oct 21 14:57:13 music-assistant kernel:  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
Oct 21 14:57:13 music-assistant kernel:  ? clear_page_rep+0x7/0x10
Oct 21 14:57:13 music-assistant kernel:  ? post_alloc_hook+0xcd/0x120
Oct 21 14:57:13 music-assistant kernel:  get_page_from_freelist+0x1d1/0x610
Oct 21 14:57:13 music-assistant kernel:  ? srso_return_thunk+0x5/0x5f
Oct 21 14:57:13 music-assistant kernel:  ? switch_fpu_return+0x55/0xf0
Oct 21 14:57:13 music-assistant kernel:  __alloc_pages+0x1e9/0x350
Oct 21 14:57:13 music-assistant kernel:  alloc_pages_mpol+0x91/0x210
Oct 21 14:57:13 music-assistant kernel:  ? srso_return_thunk+0x5/0x5f
Oct 21 14:57:13 music-assistant kernel:  ? __pfx_futex_wake_mark+0x10/0x10
Oct 21 14:57:13 music-assistant kernel:  vma_alloc_folio+0x64/0xd0
Oct 21 14:57:13 music-assistant kernel:  alloc_anon_folio+0x1cc/0x340
Oct 21 14:57:13 music-assistant kernel:  do_anonymous_page+0x6c/0x430
Oct 21 14:57:13 music-assistant kernel:  handle_pte_fault+0x1cb/0x1d0
Oct 21 14:57:13 music-assistant kernel:  __handle_mm_fault+0x654/0x800
Oct 21 14:57:13 music-assistant kernel:  handle_mm_fault+0x18a/0x380
Oct 21 14:57:13 music-assistant kernel:  do_user_addr_fault+0x169/0x670
Oct 21 14:57:13 music-assistant kernel:  exc_page_fault+0x83/0x1b0
Oct 21 14:57:13 music-assistant kernel:  asm_exc_page_fault+0x27/0x30
Oct 21 14:57:13 music-assistant kernel: RIP: 0033:0x78fcba401d6d
Oct 21 14:57:13 music-assistant kernel: Code: 48 39 ca 75 f0 c3 48 89 f8 48 83 fa 08 72 14 f7 c7 07 00 00 00 74 0c a4 48 ff ca f7 c7 07 00 00 00 75 f4 48 89 d1 48 c1 e9 03>
Oct 21 14:57:13 music-assistant kernel: RSP: 002b:000078fcb66b7138 EFLAGS: 00010216
Oct 21 14:57:13 music-assistant kernel: RAX: 000078fcb5d639e0 RBX: 000078fcb7337b40 RCX: 000000000000034b
Oct 21 14:57:13 music-assistant kernel: RDX: 000000000000207a RSI: 000078fcb730c3cd RDI: 000078fcb5d64000
Oct 21 14:57:13 music-assistant kernel: RBP: 000078fcb5d639e0 R08: 0000000000000000 R09: 0000000000000000
Oct 21 14:57:13 music-assistant kernel: R10: 0000000000000092 R11: 628138208382b3c2 R12: 000000000000207a
Oct 21 14:57:13 music-assistant kernel: R13: 000000000000207a R14: 000000000000207a R15: 0000000000000000
Oct 21 14:57:13 music-assistant kernel:  </TASK>
Oct 21 14:57:13 music-assistant fwupdmgr[9631]: Successfully downloaded new metadata: 0 local devices supported
Oct 21 14:57:13 music-assistant systemd[1]: fwupd-refresh.service: Deactivated successfully.
Oct 21 14:57:13 music-assistant systemd[1]: Finished fwupd-refresh.service - Refresh fwupd metadata and update motd.
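
To put a number on the stall in the report above, here is a minimal sketch that pulls the t=... jiffies value out of the "detected by" line and converts it to seconds. The helper names are hypothetical, and the tick rate of 1000 Hz is an assumption about the guest kernel; the real value is whatever CONFIG_HZ is set to in the guest's kernel config.

```python
import re

# Matches the "(detected by <cpu>, t=<n> jiffies, ...)" part of an RCU stall report.
STALL_RE = re.compile(r"\(detected by (?P<cpu>\d+), t=(?P<jiffies>\d+) jiffies")


def stall_jiffies(line):
    """Return the stall length in jiffies from a log line, or None if absent."""
    match = STALL_RE.search(line)
    return int(match.group("jiffies")) if match else None


def jiffies_to_seconds(jiffies, hz):
    """Convert jiffies to seconds for a given tick rate (hz is CONFIG_HZ)."""
    return jiffies / hz


if __name__ == "__main__":
    line = "rcu:         (detected by 1, t=71394 jiffies, g=2877637, q=681 ncpus=2)"
    ASSUMED_HZ = 1000  # assumption, not taken from the log; verify in the guest
    jiffies = stall_jiffies(line)
    if jiffies is not None:
        seconds = jiffies_to_seconds(jiffies, ASSUMED_HZ)
        print(f"stalled for {jiffies} jiffies, about {seconds:.1f} s at HZ={ASSUMED_HZ}")
```

If the guest turns out to use a different tick rate, only ASSUMED_HZ needs to change.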

pveversion -v
Code:
proxmox-ve: 8.4.0 (running kernel: 6.8.12-15-pve)
pve-manager: 8.4.14 (running version: 8.4.14/b502d23c55afcba1)
proxmox-kernel-helper: 8.1.4
proxmox-kernel-6.8: 6.8.12-15
proxmox-kernel-6.8.12-15-pve-signed: 6.8.12-15
proxmox-kernel-6.8.12-14-pve-signed: 6.8.12-14
proxmox-kernel-6.8.12-13-pve-signed: 6.8.12-13
proxmox-kernel-6.8.12-11-pve-signed: 6.8.12-11
proxmox-kernel-6.8.12-10-pve-signed: 6.8.12-10
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
ceph-fuse: 18.2.7-pve1
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.2
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.2
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.2
libpve-cluster-perl: 8.1.2
libpve-common-perl: 8.3.4
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.2
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.7
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
proxmox-backup-client: 3.4.7-1
proxmox-backup-file-restore: 3.4.7-1
proxmox-backup-restore-image: 0.7.0
proxmox-firewall: 0.7.1
proxmox-kernel-helper: 8.1.4
proxmox-mail-forward: 0.3.3
proxmox-mini-journalreader: 1.5
proxmox-offline-mirror-helper: 0.6.8
proxmox-widget-toolkit: 4.3.13
pve-cluster: 8.1.2
pve-container: 5.3.3
pve-docs: 8.4.1
pve-edk2-firmware: 4.2025.02-4~bpo12+1
pve-esxi-import-tools: 0.7.4
pve-firewall: 5.1.2
pve-firmware: 3.16-3
pve-ha-manager: 4.0.7
pve-i18n: 3.4.5
pve-qemu-kvm: 9.2.0-7
pve-xtermjs: 5.5.0-2
qemu-server: 8.4.4
smartmontools: 7.3-pve1
spiceterm: 3.3.1
swtpm: 0.8.0+pve1
vncterm: 1.8.1
zfsutils-linux: 2.2.8-pve1

qm config
Code:
boot: order=scsi0;net0
cores: 2
cpu: EPYC-Rome-v4
memory: 4096
meta: creation-qemu=9.2.0,ctime=1760605621
name: music-assistant
net0: virtio=BC:24:11:FC:1E:AD,bridge=vmbr0,firewall=1,tag=11
numa: 0
onboot: 1
ostype: l26
scsi0: nvme-data:vm-104-disk-0,iothread=1,size=32G
scsihw: virtio-scsi-single
smbios1: uuid=24ac9650-f4d3-449a-b7ab-fe0d213c634e
sockets: 1
vmgenid: 40c4cdc6-3031-4d96-bfa5-321d0c0a68c8

Edit: when I try to run apt upgrade on the VM, I get a bunch of "watchdog: BUG: soft lockup - CPU #" errors on both CPU #1 and CPU #2, and the whole VM hangs completely.
 
Does the issue persist when you change the VM configuration to use 4 cores or another CPU type, such as x86-64-v2-AES or host?
By the way, what hardware is PVE running on?
 

I had it on x86-64-v2-AES first before switching to the current profile, and I switched only because the one VM that works reliably is currently using this same config.

As for the system, it's running on a Supermicro H13SSW with an AMD EPYC 9135, 160 GB RAM, and 8x Kioxia CD8P-R. One of the VMs (the one that works) has an Intel Arc A310 passed through.
 
The system hangs while zeroing memory pages (clear_page_rep shows up in both traces), and because you mentioned that this happens on multiple VMs, not just a single one, I would:
  • check memory and disks for failures
  • check the hypervisor for resource consumption, pressure, or stalling
  • check whether a microcode update for the CPU is available (manual)
  • try an older or newer kernel
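
For the pressure check on the hypervisor, a minimal sketch that reads the kernel's pressure stall information (PSI) files could look like this. It assumes the host kernel exposes /proc/pressure/cpu, /proc/pressure/memory and /proc/pressure/io; the function names are hypothetical and not part of any Proxmox tooling.

```python
from pathlib import Path


def parse_psi(text):
    """Parse the contents of a /proc/pressure/<resource> file.

    Each line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0".
    Returns e.g. {"some": {"avg10": 0.0, ...}, "full": {...}}.
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {key: float(value)
                        for key, value in (field.split("=", 1) for field in fields)}
    return result


def read_pressure(resource):
    """Read and parse PSI data for 'cpu', 'memory' or 'io'."""
    return parse_psi(Path(f"/proc/pressure/{resource}").read_text())


if __name__ == "__main__":
    for resource in ("cpu", "memory", "io"):
        try:
            psi = read_pressure(resource)
        except OSError as exc:
            # PSI may be disabled or unavailable on this kernel.
            print(f"{resource}: unavailable ({exc})")
            continue
        for kind, values in psi.items():
            print(f"{resource} {kind}: avg10={values['avg10']} "
                  f"avg60={values['avg60']} avg300={values['avg300']}")
```

Run on the PVE host while a VM is stalling: avg10 or avg60 values that stay well above zero for memory or io would suggest the host itself is under pressure, while values near zero point elsewhere.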
 
I already checked memory and disks and they all look good, so I'll try the microcode update next. I'll report back. Thanks!