I keep getting this hard lockup on CPU 17. What should I do to debug it?
The kernel log is below, followed by the output of pveversion -v and lscpu.
Code:
Mar 11 21:23:01 proxmox kernel: [74994.546705] NMI watchdog: Watchdog detected hard LOCKUP on cpu 17
Mar 11 21:23:01 proxmox kernel: [74994.546708] Modules linked in: veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables bonding tls softdog nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common zfs(PO) zunicode(PO) zzstd(O) iwlmvm zlua(O) zavl(PO) mac80211 icp(PO) edac_mce_amd libarc4 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi kvm_amd snd_hda_intel zcommon(PO) znvpair(PO) kvm btusb snd_intel_dspcfg spl(O) snd_intel_sdw_acpi btrtl btbcm crct10dif_pclmul vhost_net iwlwifi ghash_clmulni_intel snd_hda_codec vhost btintel vhost_iotlb aesni_intel tap joydev snd_hda_core bluetooth crypto_simd ib_iser input_leds cfg80211 snd_hwdep cryptd ecdh_generic rdma_cm snd_pcm ecc iw_cm rapl snd_timer snd mxm_wmi efi_pstore soundcore wmi_bmof pcspkr ccp ib_cm k10temp ib_core mac_hid iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nct6775 hwmon_vid vfio_pci vfio_pci_core vfio_virqfd irqbypass vfio_iommu_type1 vfio drm sunrpc
Mar 11 21:23:01 proxmox kernel: [74994.546745] ip_tables x_tables autofs4 hid_logitech_hidpp btrfs blake2b_generic xor zstd_compress hid_logitech_dj hid_generic usbkbd usbmouse usbhid hid raid6_pq simplefb dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c xhci_pci ahci xhci_pci_renesas crc32_pclmul i2c_piix4 igb libahci xhci_hcd i2c_algo_bit dca nvme nvme_core wmi
Mar 11 21:23:01 proxmox kernel: [74994.546758] CPU: 17 PID: 0 Comm: swapper/17 Tainted: P O 5.15.85-1-pve #1
Mar 11 21:23:01 proxmox kernel: [74994.546760] Hardware name: To Be Filled By O.E.M. X570 Taichi/X570 Taichi, BIOS P5.01 01/18/2023
Mar 11 21:23:01 proxmox kernel: [74994.546761] RIP: 0010:native_queued_spin_lock_slowpath+0x79/0x240
Mar 11 21:23:01 proxmox kernel: [74994.546766] Code: 2b 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 03 30 e4 09 d0 a9 00 01 ff ff 0f 85 13 01 00 00 85 c0 74 0e 8b 03 84 c0 74 08 f3 90 <8b> 03 84 c0 75 f8 b8 01 00 00 00 66 89 03 5b 41 5c 41 5d 41 5e 41
Mar 11 21:23:01 proxmox kernel: [74994.546767] RSP: 0018:ffffb346805d0e98 EFLAGS: 00000082
Mar 11 21:23:01 proxmox kernel: [74994.546768] RAX: 0000000000000180 RBX: ffff8fb01ee61a40 RCX: 0000000000000020
Mar 11 21:23:01 proxmox kernel: [74994.546769] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8fb01ee61a40
Mar 11 21:23:01 proxmox kernel: [74994.546770] RBP: ffffb346805d0ec0 R08: 00004431998c84b0 R09: 000044319967efcb
Mar 11 21:23:01 proxmox kernel: [74994.546771] R10: ffffffffa92060c0 R11: 000000000000036f R12: 0000000000000082
Mar 11 21:23:01 proxmox kernel: [74994.546771] R13: dead000000000122 R14: 0000000000000001 R15: ffff8fb01ee61a40
Mar 11 21:23:01 proxmox kernel: [74994.546772] FS: 0000000000000000(0000) GS:ffff8fb01ee40000(0000) knlGS:0000000000000000
Mar 11 21:23:01 proxmox kernel: [74994.546773] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 11 21:23:01 proxmox kernel: [74994.546774] CR2: ffffd58784c55408 CR3: 000000011232c000 CR4: 0000000000350ee0
Mar 11 21:23:01 proxmox kernel: [74994.546775] Call Trace:
Mar 11 21:23:01 proxmox kernel: [74994.546776] <IRQ>
Mar 11 21:23:01 proxmox kernel: [74994.546778] _raw_spin_lock_irq+0x2a/0x40
Mar 11 21:23:01 proxmox kernel: [74994.546781] __run_timers.part.0+0x32/0x270
Mar 11 21:23:01 proxmox kernel: [74994.546783] ? recalibrate_cpu_khz+0x10/0x10
Mar 11 21:23:01 proxmox kernel: [74994.546785] ? ktime_get+0x46/0xc0
Mar 11 21:23:01 proxmox kernel: [74994.546787] ? native_x2apic_icr_read+0x20/0x20
Mar 11 21:23:01 proxmox kernel: [74994.546788] ? lapic_next_event+0x21/0x30
Mar 11 21:23:01 proxmox kernel: [74994.546790] ? clockevents_program_event+0xab/0x130
Mar 11 21:23:01 proxmox kernel: [74994.546793] run_timer_softirq+0x4b/0x60
Mar 11 21:23:01 proxmox kernel: [74994.546793] __do_softirq+0xd9/0x2ea
Mar 11 21:23:01 proxmox kernel: [74994.546795] irq_exit_rcu+0x94/0xc0
Mar 11 21:23:01 proxmox kernel: [74994.546797] sysvec_apic_timer_interrupt+0x80/0x90
Mar 11 21:23:01 proxmox kernel: [74994.546799] </IRQ>
Mar 11 21:23:01 proxmox kernel: [74994.546799] <TASK>
Mar 11 21:23:01 proxmox kernel: [74994.546800] asm_sysvec_apic_timer_interrupt+0x1b/0x20
Mar 11 21:23:01 proxmox kernel: [74994.546801] RIP: 0010:native_safe_halt+0xb/0x10
Mar 11 21:23:01 proxmox kernel: [74994.546803] Code: ff ff 4c 89 ee 48 c7 c7 e0 45 25 a9 e8 be 52 8f ff e9 46 ff ff ff cc cc cc cc cc cc cc cc cc eb 07 0f 00 2d 69 d8 47 00 fb f4 <e9> 00 1f 27 00 eb 07 0f 00 2d 59 d8 47 00 f4 e9 f1 1e 27 00 cc 0f
Mar 11 21:23:01 proxmox kernel: [74994.546804] RSP: 0018:ffffb346801e7de0 EFLAGS: 00000246
Mar 11 21:23:01 proxmox kernel: [74994.546805] RAX: 0000000000004000 RBX: 000000000002dec8 RCX: 0000000000000000
Mar 11 21:23:01 proxmox kernel: [74994.546805] RDX: ffff8fb01ee40000 RSI: ffff8fa9019bd400 RDI: ffff8fa9019bd464
Mar 11 21:23:01 proxmox kernel: [74994.546806] RBP: ffffb346801e7de8 R08: 00004431a90fa112 R09: 0000000000000000
Mar 11 21:23:01 proxmox kernel: [74994.546806] R10: 0000000000000002 R11: 071c71c71c71c71c R12: 0000000000000001
Mar 11 21:23:01 proxmox kernel: [74994.546807] R13: 0000000000000011 R14: ffff8fa9019bd464 R15: ffffffffa94e6ec0
Mar 11 21:23:01 proxmox kernel: [74994.546809] ? acpi_idle_do_entry+0x53/0x70
Mar 11 21:23:01 proxmox kernel: [74994.546811] acpi_idle_enter+0xc0/0x160
Mar 11 21:23:01 proxmox kernel: [74994.546812] cpuidle_enter_state+0x9a/0x620
Mar 11 21:23:01 proxmox kernel: [74994.546816] cpuidle_enter+0x2e/0x50
Mar 11 21:23:01 proxmox kernel: [74994.546817] do_idle+0x20d/0x2b0
Mar 11 21:23:01 proxmox kernel: [74994.546819] cpu_startup_entry+0x20/0x30
Mar 11 21:23:01 proxmox kernel: [74994.546821] start_secondary+0x12a/0x180
Mar 11 21:23:01 proxmox kernel: [74994.546822] secondary_startup_64_no_verify+0xc2/0xcb
Mar 11 21:23:01 proxmox kernel: [74994.546825] </TASK>
Mar 11 21:23:01 proxmox kernel: [75040.129389] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
Mar 11 21:23:01 proxmox kernel: [75040.129401] rcu: 17-...0: (6 ticks this GP) idle=607/0/0x1 softirq=932436/932438 fqs=6026
Mar 11 21:23:01 proxmox kernel: [75040.129406] (detected by 8, t=15002 jiffies, g=1693097, q=4367)
Mar 11 21:23:01 proxmox kernel: [75040.129409] Sending NMI from CPU 8 to CPUs 17:
Mar 11 21:23:01 proxmox kernel: [75040.129413] NMI backtrace for cpu 17
Mar 11 21:23:01 proxmox kernel: [75040.129416] CPU: 17 PID: 0 Comm: swapper/17 Tainted: P O 5.15.85-1-pve #1
Mar 11 21:23:01 proxmox kernel: [75040.129418] Hardware name: To Be Filled By O.E.M. X570 Taichi/X570 Taichi, BIOS P5.01 01/18/2023
Mar 11 21:23:01 proxmox kernel: [75040.129420] RIP: 0010:native_queued_spin_lock_slowpath+0x79/0x240
Mar 11 21:23:01 proxmox kernel: [75040.129426] Code: 2b 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 03 30 e4 09 d0 a9 00 01 ff ff 0f 85 13 01 00 00 85 c0 74 0e 8b 03 84 c0 74 08 f3 90 <8b> 03 84 c0 75 f8 b8 01 00 00 00 66 89 03 5b 41 5c 41 5d 41 5e 41
Mar 11 21:23:01 proxmox kernel: [75040.129427] RSP: 0018:ffffb346805d0e98 EFLAGS: 00000082
Mar 11 21:23:01 proxmox kernel: [75040.129429] RAX: 0000000000000180 RBX: ffff8fb01ee61a40 RCX: 0000000000000020
Mar 11 21:23:01 proxmox kernel: [75040.129431] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8fb01ee61a40
Mar 11 21:23:01 proxmox kernel: [75040.129431] RBP: ffffb346805d0ec0 R08: 00004431998c84b0 R09: 000044319967efcb
Mar 11 21:23:01 proxmox kernel: [75040.129433] R10: ffffffffa92060c0 R11: 000000000000036f R12: 0000000000000082
Mar 11 21:23:01 proxmox kernel: [75040.129434] R13: dead000000000122 R14: 0000000000000001 R15: ffff8fb01ee61a40
Mar 11 21:23:01 proxmox kernel: [75040.129435] FS: 0000000000000000(0000) GS:ffff8fb01ee40000(0000) knlGS:0000000000000000
Mar 11 21:23:01 proxmox kernel: [75040.129436] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 11 21:23:01 proxmox kernel: [75040.129437] CR2: ffffd58784c55408 CR3: 000000011232c000 CR4: 0000000000350ee0
Mar 11 21:23:01 proxmox kernel: [75040.129439] Call Trace:
Mar 11 21:23:01 proxmox kernel: [75040.129440] <IRQ>
Mar 11 21:23:01 proxmox kernel: [75040.129442] _raw_spin_lock_irq+0x2a/0x40
Mar 11 21:23:01 proxmox kernel: [75040.129445] __run_timers.part.0+0x32/0x270
Mar 11 21:23:01 proxmox kernel: [75040.129447] ? recalibrate_cpu_khz+0x10/0x10
Mar 11 21:23:01 proxmox kernel: [75040.129450] ? ktime_get+0x46/0xc0
Mar 11 21:23:01 proxmox kernel: [75040.129451] ? native_x2apic_icr_read+0x20/0x20
Mar 11 21:23:01 proxmox kernel: [75040.129453] ? lapic_next_event+0x21/0x30
Mar 11 21:23:01 proxmox kernel: [75040.129456] ? clockevents_program_event+0xab/0x130
Mar 11 21:23:01 proxmox kernel: [75040.129458] run_timer_softirq+0x4b/0x60
Mar 11 21:23:01 proxmox kernel: [75040.129459] __do_softirq+0xd9/0x2ea
Mar 11 21:23:01 proxmox kernel: [75040.129461] irq_exit_rcu+0x94/0xc0
Mar 11 21:23:01 proxmox kernel: [75040.129463] sysvec_apic_timer_interrupt+0x80/0x90
Mar 11 21:23:01 proxmox kernel: [75040.129466] </IRQ>
Mar 11 21:23:01 proxmox kernel: [75040.129466] <TASK>
Mar 11 21:23:01 proxmox kernel: [75040.129467] asm_sysvec_apic_timer_interrupt+0x1b/0x20
Mar 11 21:23:01 proxmox kernel: [75040.129468] RIP: 0010:native_safe_halt+0xb/0x10
Mar 11 21:23:01 proxmox kernel: [75040.129470] Code: ff ff 4c 89 ee 48 c7 c7 e0 45 25 a9 e8 be 52 8f ff e9 46 ff ff ff cc cc cc cc cc cc cc cc cc eb 07 0f 00 2d 69 d8 47 00 fb f4 <e9> 00 1f 27 00 eb 07 0f 00 2d 59 d8 47 00 f4 e9 f1 1e 27 00 cc 0f
Mar 11 21:23:01 proxmox kernel: [75040.129471] RSP: 0018:ffffb346801e7de0 EFLAGS: 00000246
Mar 11 21:23:01 proxmox kernel: [75040.129472] RAX: 0000000000004000 RBX: 000000000002dec8 RCX: 0000000000000000
Mar 11 21:23:01 proxmox kernel: [75040.129473] RDX: ffff8fb01ee40000 RSI: ffff8fa9019bd400 RDI: ffff8fa9019bd464
Mar 11 21:23:01 proxmox kernel: [75040.129474] RBP: ffffb346801e7de8 R08: 00004431a90fa112 R09: 0000000000000000
Mar 11 21:23:01 proxmox kernel: [75040.129475] R10: 0000000000000002 R11: 071c71c71c71c71c R12: 0000000000000001
Mar 11 21:23:01 proxmox kernel: [75040.129475] R13: 0000000000000011 R14: ffff8fa9019bd464 R15: ffffffffa94e6ec0
Mar 11 21:23:01 proxmox kernel: [75040.129478] ? acpi_idle_do_entry+0x53/0x70
Mar 11 21:23:01 proxmox kernel: [75040.129480] acpi_idle_enter+0xc0/0x160
Mar 11 21:23:01 proxmox kernel: [75040.129482] cpuidle_enter_state+0x9a/0x620
Mar 11 21:23:01 proxmox kernel: [75040.129485] cpuidle_enter+0x2e/0x50
Mar 11 21:23:01 proxmox kernel: [75040.129487] do_idle+0x20d/0x2b0
Mar 11 21:23:01 proxmox kernel: [75040.129489] cpu_startup_entry+0x20/0x30
Mar 11 21:23:01 proxmox kernel: [75040.129490] start_secondary+0x12a/0x180
Mar 11 21:23:01 proxmox kernel: [75040.129492] secondary_startup_64_no_verify+0xc2/0xcb
Mar 11 21:23:01 proxmox kernel: [75040.129496] </TASK>
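To make traces like the one above easier to compare across occurrences, here is a minimal sketch that pulls out the locked-up CPU, the RIP symbols, and the reliable call-trace frames (those without a leading "?"). It assumes the syslog-style format shown above; the function name summarize and the three regular expressions are made up for illustration and are not part of any Proxmox or kernel tool.

```python
import re

# "... NMI watchdog: Watchdog detected hard LOCKUP on cpu 17"
LOCKUP_RE = re.compile(r"hard LOCKUP on cpu (\d+)")
# "... RIP: 0010:native_queued_spin_lock_slowpath+0x79/0x240"
RIP_RE = re.compile(r"RIP: [0-9a-f]{4}:(\w+)\+0x")
# A call-trace frame right after the "[timestamp]" prefix.
# A leading "? " marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(r"\]\s+(\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$")


def summarize(log_text):
    """Return CPUs, RIP symbols and reliable frames found in log_text.

    Hypothetical helper: duplicates are dropped and first-seen order is kept.
    """
    cpus = set()
    rips = []
    frames = []
    for line in log_text.splitlines():
        m = LOCKUP_RE.search(line)
        if m:
            cpus.add(int(m.group(1)))
            continue
        m = RIP_RE.search(line)
        if m:
            if m.group(1) not in rips:
                rips.append(m.group(1))
            continue
        m = FRAME_RE.search(line)
        # Keep only frames without the "? " marker.
        if m and m.group(1) is None and m.group(2) not in frames:
            frames.append(m.group(2))
    return {"cpus": sorted(cpus), "rips": rips, "frames": frames}


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
Mar 11 21:23:01 proxmox kernel: [74994.546705] NMI watchdog: Watchdog detected hard LOCKUP on cpu 17
Mar 11 21:23:01 proxmox kernel: [74994.546761] RIP: 0010:native_queued_spin_lock_slowpath+0x79/0x240
Mar 11 21:23:01 proxmox kernel: [74994.546778] _raw_spin_lock_irq+0x2a/0x40
Mar 11 21:23:01 proxmox kernel: [74994.546781] __run_timers.part.0+0x32/0x270
Mar 11 21:23:01 proxmox kernel: [74994.546785] ? ktime_get+0x46/0xc0
"""

print(summarize(SAMPLE))
```

Running it over each saved occurrence shows whether the lockup always lands on the same CPU and in the same spinlock path, which is useful detail to include in a report.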
pveversion -v
Code:
proxmox-ve: 7.3-1 (running kernel: 5.15.85-1-pve)
pve-manager: 7.3-6 (running version: 7.3-6/723bb6ec)
pve-kernel-helper: 7.3-6
pve-kernel-5.15: 7.3-2
pve-kernel-5.15.85-1-pve: 5.15.85-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-2
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-6
libpve-storage-perl: 7.3-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.5.5
pve-cluster: 7.3-2
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20221111-1
pve-firewall: 4.2-7
pve-firmware: 3.6-3
pve-ha-manager: 3.5.1
pve-i18n: 2.8-3
pve-qemu-kvm: 7.2.0-5
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
lscpu
Code:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 113
Model name: AMD Ryzen 9 3950X 16-Core Processor
Stepping: 0
Frequency boost: enabled
CPU MHz: 3500.000
CPU max MHz: 4761.2300
CPU min MHz: 2200.0000
BogoMIPS: 6999.55
Virtualization: AMD-V
L1d cache: 512 KiB
L1i cache: 512 KiB
L2 cache: 8 MiB
L3 cache: 64 MiB
NUMA node0 CPU(s): 0-31
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es