Proxmox 7 watchdog: Hard LOCKUP CPU

haiopaiii

New Member
Jul 3, 2022
Hey guys,

I'm a bit clueless. My dedicated server keeps crashing from time to time; today it happened twice.
Each time, the NMI watchdog reports a hard lockup on one of the CPUs.

CPU: 16 x Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz (1 Socket)

The host runs ZFS and 2 VMs. 128 GB of RAM is assigned out of 256 GB available, so memory should not be a problem for ZFS.
There was no heavy CPU or RAM usage.

Can anyone help with some tips?

Bash:
proxmox-ve: 7.2-1 (running kernel: 5.15.35-2-pve)
pve-manager: 7.2-5 (running version: 7.2-5/12f1e639)
pve-kernel-5.15: 7.2-5
pve-kernel-helper: 7.2-5
pve-kernel-5.13: 7.1-9
pve-kernel-5.15.35-3-pve: 5.15.35-6
pve-kernel-5.15.35-2-pve: 5.15.35-5
pve-kernel-5.15.35-1-pve: 5.15.35-3
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-2
libpve-storage-perl: 7.2-5
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.3-1
proxmox-backup-file-restore: 2.2.3-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-2
pve-ha-manager: 3.3-4
pve-i18n: 2.7-2
pve-qemu-kvm: 6.2.0-10
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1

Kernel.log output:

Bash:
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28234.759241] ------------[ cut here ]------------
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775108] NMI watchdog: Watchdog detected hard LOCKUP on cpu 14
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775110] Modules linked in: binfmt_misc ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw ipt_REJECT nf_reject_ipv4 xt_mark xt_set xt_physdev xt_addrtype xt_comment xt_tcpudp xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip_set_hash_net ip_set nf_tables bonding tls softdog nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul ghash_clmulni_intel aesni_intel ast crypto_simd drm_vram_helper cryptd drm_ttm_helper ttm eeepc_wmi rapl drm_kms_helper asus_wmi cec platform_profile intel_cstate rc_core sparse_keymap fb_sys_fops video wmi_bmof pcspkr syscopyarea efi_pstore sysfillrect sysimgblt mei_me mei ioatdma mac_hid acpi_tad vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775143]  autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c simplefb ses enclosure crc32_pclmul smartpqi scsi_transport_sas igb nvme xhci_pci xhci_pci_renesas i2c_i801 i2c_algo_bit i2c_smbus dca nvme_core ahci xhci_hcd libahci wmi
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775156] CPU: 14 PID: 1638451 Comm: z_wr_int_2 Tainted: P           O      5.15.35-2-pve #1
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775158] Hardware name: System manufacturer System Product Name/WS C422 DC, BIOS 3205 11/17/2021
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775159] RIP: 0010:native_queued_spin_lock_slowpath+0x75/0x230
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775163] Code: 2b 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 03 30 e4 09 d0 a9 00 01 ff ff 0f 85 0b 01 00 00 85 c0 74 0e 8b 03 84 c0 74 08 f3 90 <8b> 03 84 c0 75 f8 b8 01 00 00 00 66 89 03 5b 41 5c 41 5d 41 5e 41
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775165] RSP: 0018:ffffa5bff10db628 EFLAGS: 00000002
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775166] RAX: 0000000000000101 RBX: ffff9ad5601b0b40 RCX: 0000000000030b40
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775167] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9ad5601b0b40
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775168] RBP: ffffa5bff10db650 R08: 000000000000000e R09: ffff9ad5601b04c0
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775168] R10: ffff9a96c7f74130 R11: ffffffffc0dee0c0 R12: ffff9ad5601b0b40
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775169] R13: 0000000000000000 R14: 0000000000000087 R15: ffff9a96c57c6abc
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775170] FS:  0000000000000000(0000) GS:ffff9ad560180000(0000) knlGS:0000000000000000
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775171] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775171] CR2: 00000000006e82c8 CR3: 0000002454810002 CR4: 00000000003726e0
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775172] Call Trace:
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775173]  <TASK>
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775175]  _raw_spin_lock+0x1e/0x30
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775179]  raw_spin_rq_lock_nested+0x17/0x70
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775181]  try_to_wake_up+0x188/0x5c0
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775184]  wake_up_process+0x15/0x20
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775186]  insert_work+0x71/0x80
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775187]  __queue_work+0x1f7/0x480
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775188]  queue_work_on+0x39/0x60
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775190]  drm_fb_helper_damage.isra.0+0xde/0x100 [drm_kms_helper]
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775205]  drm_fbdev_fb_imageblit+0x46/0x60 [drm_kms_helper]
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775214]  soft_cursor+0x1bd/0x240
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775217]  bit_cursor+0x3e4/0x670
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775218]  ? sprintf+0x56/0x70
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775220]  ? bit_putcs+0x5c0/0x5c0
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775221]  fbcon_cursor+0x149/0x180
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775222]  hide_cursor+0x31/0xd0
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775224]  vt_console_print+0x451/0x4e0
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775225]  console_unlock+0x3af/0x520
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775227]  vprintk_emit+0x154/0x270
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775229]  vprintk_default+0x1d/0x20
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775230]  vprintk+0x58/0x90
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775231]  _printk+0x58/0x6f
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775233]  __warn_printk+0x47/0x89
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775235]  ? sched_clock_cpu+0x12/0xf0
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775236]  update_blocked_averages+0x78c/0x7d0
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775238]  newidle_balance+0x1d7/0x470
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775239]  ? dequeue_task_fair+0x27a/0x390
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775241]  pick_next_task_fair+0x40/0x450
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775243]  __schedule+0x195/0x1750
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775244]  ? __wake_up_common_lock+0x8a/0xc0
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775245]  schedule+0x4e/0xb0
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775247]  taskq_thread+0x3ed/0x4c0 [spl]
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775253]  ? wake_up_q+0x90/0x90
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775255]  ? zio_gang_tree_free+0x70/0x70 [zfs]
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775338]  ? taskq_thread_spawn+0x60/0x60 [spl]
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775344]  kthread+0x127/0x150
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775346]  ? set_kthread_struct+0x50/0x50
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775347]  ret_from_fork+0x1f/0x30
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775350]  </TASK>
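
In case it helps with comparing the two crashes: below is a minimal Python sketch that pulls the affected CPU, the task name (Comm) and the RIP symbol out of log lines like the ones above. The function name `summarize_lockups` and the regular expressions are my own assumptions based only on the line formats shown in this post, so other kernel versions may need adjustments.

```python
import re

# Sample lines copied from the kernel log above.
SAMPLE = """\
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775108] NMI watchdog: Watchdog detected hard LOCKUP on cpu 14
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775156] CPU: 14 PID: 1638451 Comm: z_wr_int_2 Tainted: P           O      5.15.35-2-pve #1
Jul  3 21:15:59 BH173-RZ-HV01 kernel: [28245.775159] RIP: 0010:native_queued_spin_lock_slowpath+0x75/0x230
"""

# Patterns are assumptions derived from the lines in this post.
LOCKUP = re.compile(r"Watchdog detected hard LOCKUP on cpu (\d+)")
TASK = re.compile(r"CPU: (\d+) PID: (\d+) Comm: (\S+)")
RIP = re.compile(r"RIP: [0-9a-f]{4}:([\w.]+)\+0x")


def summarize_lockups(text):
    """Return one dict per hard-lockup report found in the log text."""
    events = []
    current = None
    for line in text.splitlines():
        m = LOCKUP.search(line)
        if m:
            # A new watchdog report starts here.
            current = {"cpu": int(m.group(1)), "pid": None, "comm": None, "rip": None}
            events.append(current)
            continue
        if current is None:
            continue  # ignore everything before the first report
        m = TASK.search(line)
        if m and current["comm"] is None and int(m.group(1)) == current["cpu"]:
            current["pid"] = int(m.group(2))
            current["comm"] = m.group(3)
            continue
        m = RIP.search(line)
        if m and current["rip"] is None:
            current["rip"] = m.group(1)
    return events


if __name__ == "__main__":
    for event in summarize_lockups(SAMPLE):
        print(event)
```

To run it against a real log, read the file contents into a string and pass that to `summarize_lockups` instead of `SAMPLE`. If every crash shows the same Comm and RIP, that at least narrows down where to look.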
 