NMI watchdog: Watchdog detected hard LOCKUP

John_Dong

New Member
Mar 12, 2023
I keep getting this hard lockup on CPU 17. What should I do to debug this issue?
Code:
 kernel:[74994.546705] NMI watchdog: Watchdog detected hard LOCKUP on cpu 17
Mar 11 21:23:01 proxmox kernel: [74994.546705] NMI watchdog: Watchdog detected hard LOCKUP on cpu 17
Mar 11 21:23:01 proxmox kernel: [74994.546708] Modules linked in: veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables bonding tls softdog nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common zfs(PO) zunicode(PO) zzstd(O) iwlmvm zlua(O) zavl(PO) mac80211 icp(PO) edac_mce_amd libarc4 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi kvm_amd snd_hda_intel zcommon(PO) znvpair(PO) kvm btusb snd_intel_dspcfg spl(O) snd_intel_sdw_acpi btrtl btbcm crct10dif_pclmul vhost_net iwlwifi ghash_clmulni_intel snd_hda_codec vhost btintel vhost_iotlb aesni_intel tap joydev snd_hda_core bluetooth crypto_simd ib_iser input_leds cfg80211 snd_hwdep cryptd ecdh_generic rdma_cm snd_pcm ecc iw_cm rapl snd_timer snd mxm_wmi efi_pstore soundcore wmi_bmof pcspkr ccp ib_cm k10temp ib_core mac_hid iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nct6775 hwmon_vid vfio_pci vfio_pci_core vfio_virqfd irqbypass vfio_iommu_type1 vfio drm sunrpc
Mar 11 21:23:01 proxmox kernel: [74994.546745]  ip_tables x_tables autofs4 hid_logitech_hidpp btrfs blake2b_generic xor zstd_compress hid_logitech_dj hid_generic usbkbd usbmouse usbhid hid raid6_pq simplefb dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c xhci_pci ahci xhci_pci_renesas crc32_pclmul i2c_piix4 igb libahci xhci_hcd i2c_algo_bit dca nvme nvme_core wmi
Mar 11 21:23:01 proxmox kernel: [74994.546758] CPU: 17 PID: 0 Comm: swapper/17 Tainted: P           O      5.15.85-1-pve #1
Mar 11 21:23:01 proxmox kernel: [74994.546760] Hardware name: To Be Filled By O.E.M. X570 Taichi/X570 Taichi, BIOS P5.01 01/18/2023
Mar 11 21:23:01 proxmox kernel: [74994.546761] RIP: 0010:native_queued_spin_lock_slowpath+0x79/0x240
Mar 11 21:23:01 proxmox kernel: [74994.546766] Code: 2b 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 03 30 e4 09 d0 a9 00 01 ff ff 0f 85 13 01 00 00 85 c0 74 0e 8b 03 84 c0 74 08 f3 90 <8b> 03 84 c0 75 f8 b8 01 00 00 00 66 89 03 5b 41 5c 41 5d 41 5e 41
Mar 11 21:23:01 proxmox kernel: [74994.546767] RSP: 0018:ffffb346805d0e98 EFLAGS: 00000082
Mar 11 21:23:01 proxmox kernel: [74994.546768] RAX: 0000000000000180 RBX: ffff8fb01ee61a40 RCX: 0000000000000020
Mar 11 21:23:01 proxmox kernel: [74994.546769] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8fb01ee61a40
Mar 11 21:23:01 proxmox kernel: [74994.546770] RBP: ffffb346805d0ec0 R08: 00004431998c84b0 R09: 000044319967efcb
Mar 11 21:23:01 proxmox kernel: [74994.546771] R10: ffffffffa92060c0 R11: 000000000000036f R12: 0000000000000082
Mar 11 21:23:01 proxmox kernel: [74994.546771] R13: dead000000000122 R14: 0000000000000001 R15: ffff8fb01ee61a40
Mar 11 21:23:01 proxmox kernel: [74994.546772] FS:  0000000000000000(0000) GS:ffff8fb01ee40000(0000) knlGS:0000000000000000
Mar 11 21:23:01 proxmox kernel: [74994.546773] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 11 21:23:01 proxmox kernel: [74994.546774] CR2: ffffd58784c55408 CR3: 000000011232c000 CR4: 0000000000350ee0
Mar 11 21:23:01 proxmox kernel: [74994.546775] Call Trace:
Mar 11 21:23:01 proxmox kernel: [74994.546776]  <IRQ>
Mar 11 21:23:01 proxmox kernel: [74994.546778]  _raw_spin_lock_irq+0x2a/0x40
Mar 11 21:23:01 proxmox kernel: [74994.546781]  __run_timers.part.0+0x32/0x270
Mar 11 21:23:01 proxmox kernel: [74994.546783]  ? recalibrate_cpu_khz+0x10/0x10
Mar 11 21:23:01 proxmox kernel: [74994.546785]  ? ktime_get+0x46/0xc0
Mar 11 21:23:01 proxmox kernel: [74994.546787]  ? native_x2apic_icr_read+0x20/0x20
Mar 11 21:23:01 proxmox kernel: [74994.546788]  ? lapic_next_event+0x21/0x30
Mar 11 21:23:01 proxmox kernel: [74994.546790]  ? clockevents_program_event+0xab/0x130
Mar 11 21:23:01 proxmox kernel: [74994.546793]  run_timer_softirq+0x4b/0x60
Mar 11 21:23:01 proxmox kernel: [74994.546793]  __do_softirq+0xd9/0x2ea
Mar 11 21:23:01 proxmox kernel: [74994.546795]  irq_exit_rcu+0x94/0xc0
Mar 11 21:23:01 proxmox kernel: [74994.546797]  sysvec_apic_timer_interrupt+0x80/0x90
Mar 11 21:23:01 proxmox kernel: [74994.546799]  </IRQ>
Mar 11 21:23:01 proxmox kernel: [74994.546799]  <TASK>
Mar 11 21:23:01 proxmox kernel: [74994.546800]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
Mar 11 21:23:01 proxmox kernel: [74994.546801] RIP: 0010:native_safe_halt+0xb/0x10
Mar 11 21:23:01 proxmox kernel: [74994.546803] Code: ff ff 4c 89 ee 48 c7 c7 e0 45 25 a9 e8 be 52 8f ff e9 46 ff ff ff cc cc cc cc cc cc cc cc cc eb 07 0f 00 2d 69 d8 47 00 fb f4 <e9> 00 1f 27 00 eb 07 0f 00 2d 59 d8 47 00 f4 e9 f1 1e 27 00 cc 0f
Mar 11 21:23:01 proxmox kernel: [74994.546804] RSP: 0018:ffffb346801e7de0 EFLAGS: 00000246
Mar 11 21:23:01 proxmox kernel: [74994.546805] RAX: 0000000000004000 RBX: 000000000002dec8 RCX: 0000000000000000
Mar 11 21:23:01 proxmox kernel: [74994.546805] RDX: ffff8fb01ee40000 RSI: ffff8fa9019bd400 RDI: ffff8fa9019bd464
Mar 11 21:23:01 proxmox kernel: [74994.546806] RBP: ffffb346801e7de8 R08: 00004431a90fa112 R09: 0000000000000000
Mar 11 21:23:01 proxmox kernel: [74994.546806] R10: 0000000000000002 R11: 071c71c71c71c71c R12: 0000000000000001
Mar 11 21:23:01 proxmox kernel: [74994.546807] R13: 0000000000000011 R14: ffff8fa9019bd464 R15: ffffffffa94e6ec0
Mar 11 21:23:01 proxmox kernel: [74994.546809]  ? acpi_idle_do_entry+0x53/0x70
Mar 11 21:23:01 proxmox kernel: [74994.546811]  acpi_idle_enter+0xc0/0x160
Mar 11 21:23:01 proxmox kernel: [74994.546812]  cpuidle_enter_state+0x9a/0x620
Mar 11 21:23:01 proxmox kernel: [74994.546816]  cpuidle_enter+0x2e/0x50
Mar 11 21:23:01 proxmox kernel: [74994.546817]  do_idle+0x20d/0x2b0
Mar 11 21:23:01 proxmox kernel: [74994.546819]  cpu_startup_entry+0x20/0x30
Mar 11 21:23:01 proxmox kernel: [74994.546821]  start_secondary+0x12a/0x180
Mar 11 21:23:01 proxmox kernel: [74994.546822]  secondary_startup_64_no_verify+0xc2/0xcb
Mar 11 21:23:01 proxmox kernel: [74994.546825]  </TASK>
Mar 11 21:23:01 proxmox kernel: [75040.129389] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
Mar 11 21:23:01 proxmox kernel: [75040.129401] rcu:     17-...0: (6 ticks this GP) idle=607/0/0x1 softirq=932436/932438 fqs=6026
Mar 11 21:23:01 proxmox kernel: [75040.129406]  (detected by 8, t=15002 jiffies, g=1693097, q=4367)
Mar 11 21:23:01 proxmox kernel: [75040.129409] Sending NMI from CPU 8 to CPUs 17:
Mar 11 21:23:01 proxmox kernel: [75040.129413] NMI backtrace for cpu 17
Mar 11 21:23:01 proxmox kernel: [75040.129416] CPU: 17 PID: 0 Comm: swapper/17 Tainted: P           O      5.15.85-1-pve #1
Mar 11 21:23:01 proxmox kernel: [75040.129418] Hardware name: To Be Filled By O.E.M. X570 Taichi/X570 Taichi, BIOS P5.01 01/18/2023
Mar 11 21:23:01 proxmox kernel: [75040.129420] RIP: 0010:native_queued_spin_lock_slowpath+0x79/0x240
Mar 11 21:23:01 proxmox kernel: [75040.129426] Code: 2b 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 03 30 e4 09 d0 a9 00 01 ff ff 0f 85 13 01 00 00 85 c0 74 0e 8b 03 84 c0 74 08 f3 90 <8b> 03 84 c0 75 f8 b8 01 00 00 00 66 89 03 5b 41 5c 41 5d 41 5e 41
Mar 11 21:23:01 proxmox kernel: [75040.129427] RSP: 0018:ffffb346805d0e98 EFLAGS: 00000082
Mar 11 21:23:01 proxmox kernel: [75040.129429] RAX: 0000000000000180 RBX: ffff8fb01ee61a40 RCX: 0000000000000020
Mar 11 21:23:01 proxmox kernel: [75040.129431] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8fb01ee61a40
Mar 11 21:23:01 proxmox kernel: [75040.129431] RBP: ffffb346805d0ec0 R08: 00004431998c84b0 R09: 000044319967efcb
Mar 11 21:23:01 proxmox kernel: [75040.129433] R10: ffffffffa92060c0 R11: 000000000000036f R12: 0000000000000082
Mar 11 21:23:01 proxmox kernel: [75040.129434] R13: dead000000000122 R14: 0000000000000001 R15: ffff8fb01ee61a40
Mar 11 21:23:01 proxmox kernel: [75040.129435] FS:  0000000000000000(0000) GS:ffff8fb01ee40000(0000) knlGS:0000000000000000
Mar 11 21:23:01 proxmox kernel: [75040.129436] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 11 21:23:01 proxmox kernel: [75040.129437] CR2: ffffd58784c55408 CR3: 000000011232c000 CR4: 0000000000350ee0
Mar 11 21:23:01 proxmox kernel: [75040.129439] Call Trace:
Mar 11 21:23:01 proxmox kernel: [75040.129440]  <IRQ>
Mar 11 21:23:01 proxmox kernel: [75040.129442]  _raw_spin_lock_irq+0x2a/0x40
Mar 11 21:23:01 proxmox kernel: [75040.129445]  __run_timers.part.0+0x32/0x270
Mar 11 21:23:01 proxmox kernel: [75040.129447]  ? recalibrate_cpu_khz+0x10/0x10
Mar 11 21:23:01 proxmox kernel: [75040.129450]  ? ktime_get+0x46/0xc0
Mar 11 21:23:01 proxmox kernel: [75040.129451]  ? native_x2apic_icr_read+0x20/0x20
Mar 11 21:23:01 proxmox kernel: [75040.129453]  ? lapic_next_event+0x21/0x30
Mar 11 21:23:01 proxmox kernel: [75040.129456]  ? clockevents_program_event+0xab/0x130
Mar 11 21:23:01 proxmox kernel: [75040.129458]  run_timer_softirq+0x4b/0x60
Mar 11 21:23:01 proxmox kernel: [75040.129459]  __do_softirq+0xd9/0x2ea
Mar 11 21:23:01 proxmox kernel: [75040.129461]  irq_exit_rcu+0x94/0xc0
Mar 11 21:23:01 proxmox kernel: [75040.129463]  sysvec_apic_timer_interrupt+0x80/0x90
Mar 11 21:23:01 proxmox kernel: [75040.129466]  </IRQ>
Mar 11 21:23:01 proxmox kernel: [75040.129466]  <TASK>
Mar 11 21:23:01 proxmox kernel: [75040.129467]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
Mar 11 21:23:01 proxmox kernel: [75040.129468] RIP: 0010:native_safe_halt+0xb/0x10
Mar 11 21:23:01 proxmox kernel: [75040.129470] Code: ff ff 4c 89 ee 48 c7 c7 e0 45 25 a9 e8 be 52 8f ff e9 46 ff ff ff cc cc cc cc cc cc cc cc cc eb 07 0f 00 2d 69 d8 47 00 fb f4 <e9> 00 1f 27 00 eb 07 0f 00 2d 59 d8 47 00 f4 e9 f1 1e 27 00 cc 0f
Mar 11 21:23:01 proxmox kernel: [75040.129471] RSP: 0018:ffffb346801e7de0 EFLAGS: 00000246
Mar 11 21:23:01 proxmox kernel: [75040.129472] RAX: 0000000000004000 RBX: 000000000002dec8 RCX: 0000000000000000
Mar 11 21:23:01 proxmox kernel: [75040.129473] RDX: ffff8fb01ee40000 RSI: ffff8fa9019bd400 RDI: ffff8fa9019bd464
Mar 11 21:23:01 proxmox kernel: [75040.129474] RBP: ffffb346801e7de8 R08: 00004431a90fa112 R09: 0000000000000000
Mar 11 21:23:01 proxmox kernel: [75040.129475] R10: 0000000000000002 R11: 071c71c71c71c71c R12: 0000000000000001
Mar 11 21:23:01 proxmox kernel: [75040.129475] R13: 0000000000000011 R14: ffff8fa9019bd464 R15: ffffffffa94e6ec0
Mar 11 21:23:01 proxmox kernel: [75040.129478]  ? acpi_idle_do_entry+0x53/0x70
Mar 11 21:23:01 proxmox kernel: [75040.129480]  acpi_idle_enter+0xc0/0x160
Mar 11 21:23:01 proxmox kernel: [75040.129482]  cpuidle_enter_state+0x9a/0x620
Mar 11 21:23:01 proxmox kernel: [75040.129485]  cpuidle_enter+0x2e/0x50
Mar 11 21:23:01 proxmox kernel: [75040.129487]  do_idle+0x20d/0x2b0
Mar 11 21:23:01 proxmox kernel: [75040.129489]  cpu_startup_entry+0x20/0x30
Mar 11 21:23:01 proxmox kernel: [75040.129490]  start_secondary+0x12a/0x180
Mar 11 21:23:01 proxmox kernel: [75040.129492]  secondary_startup_64_no_verify+0xc2/0xcb
Mar 11 21:23:01 proxmox kernel: [75040.129496]  </TASK>
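To make the two traces above easier to compare, here is a minimal Python sketch that pulls the affected CPU, the RIP symbols, and the confirmed call-trace frames (those without a leading `?`) out of the log text. The function name `summarize` and the regular expressions are my own assumptions based on the log layout shown above; this is not a Proxmox or kernel tool, and the sample is only a few lines copied from the log.

```python
import re

# A few lines copied from the log above; in practice, feed in the full kernel log text.
SAMPLE = """\
Mar 11 21:23:01 proxmox kernel: [74994.546705] NMI watchdog: Watchdog detected hard LOCKUP on cpu 17
Mar 11 21:23:01 proxmox kernel: [74994.546761] RIP: 0010:native_queued_spin_lock_slowpath+0x79/0x240
Mar 11 21:23:01 proxmox kernel: [74994.546778]  _raw_spin_lock_irq+0x2a/0x40
Mar 11 21:23:01 proxmox kernel: [74994.546781]  __run_timers.part.0+0x32/0x270
Mar 11 21:23:01 proxmox kernel: [74994.546783]  ? recalibrate_cpu_khz+0x10/0x10
Mar 11 21:23:01 proxmox kernel: [74994.546793]  run_timer_softirq+0x4b/0x60
"""

# Assumed patterns, derived only from the lines pasted in this post.
LOCKUP = re.compile(r"hard LOCKUP on cpu (\d+)")
RIP = re.compile(r"RIP: [0-9a-f]{4}:([\w.]+)\+0x")
# A frame line is "<symbol>+0xOFF/0xLEN" right after the timestamp;
# lines starting with "?" are unreliable frames and are skipped.
FRAME = re.compile(r"\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$")


def summarize(log_text):
    """Collect lockup CPUs, RIP symbols, and reliable call-trace frames."""
    cpus, rips, frames = [], [], []
    for line in log_text.splitlines():
        if m := LOCKUP.search(line):
            cpus.append(int(m.group(1)))
        elif m := RIP.search(line):
            rips.append(m.group(1))
        elif m := FRAME.search(line):
            frames.append(m.group(1))
    return {"cpus": cpus, "rip": rips, "frames": frames}


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Running it over both traces in full should show whether the CPU is stuck at the same symbol each time.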
pveversion -v
Code:
proxmox-ve: 7.3-1 (running kernel: 5.15.85-1-pve)
pve-manager: 7.3-6 (running version: 7.3-6/723bb6ec)
pve-kernel-helper: 7.3-6
pve-kernel-5.15: 7.3-2
pve-kernel-5.15.85-1-pve: 5.15.85-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-2
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-6
libpve-storage-perl: 7.3-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.5.5
pve-cluster: 7.3-2
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20221111-1
pve-firewall: 4.2-7
pve-firmware: 3.6-3
pve-ha-manager: 3.5.1
pve-i18n: 2.8-3
pve-qemu-kvm: 7.2.0-5
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
lscpu
Code:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          32
On-line CPU(s) list:             0-31
Thread(s) per core:              2
Core(s) per socket:              16
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           113
Model name:                      AMD Ryzen 9 3950X 16-Core Processor
Stepping:                        0
Frequency boost:                 enabled
CPU MHz:                         3500.000
CPU max MHz:                     4761.2300
CPU min MHz:                     2200.0000
BogoMIPS:                        6999.55
Virtualization:                  AMD-V
L1d cache:                       512 KiB
L1i cache:                       512 KiB
L2 cache:                        8 MiB
L3 cache:                        64 MiB
NUMA node0 CPU(s):               0-31
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Not affected
Vulnerability Retbleed:          Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, IBPB conditional, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es
 