BUG "soft lockup" - Dead PVE.

venquessa

So, this morning I got up to no network. I assigned a few static IPs by hand and still couldn't ping the PVE node at all.

I hit the shutdown button and waited 5 minutes while getting coffee. Nothing. Held the power button for 5 seconds.

PVE booted back up, looking for all the world like nothing had happened.

Then I found the below repeated about a dozen times, starting around 1am and ending at around 2am.

The root cause, I believe, is the NFS share I set up yesterday for the offline backup server. That server is woken by WOL and spends 99% of its time shut down.

I have a hookscript which ends by disabling the storage and unmounting the share before shutting down the backup server. It would seem that telegraf was polling that share all night long, even though it's not mounted! This has almost certainly caused a whole collection of "dead" threads stuck in the purgatory of IOWait on an NFS share that will never return. A known Linux bug they will not fix, because they say it is better that a server dies waiting for the share to return than that it loses data.
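For reference, the tail end of that hookscript does roughly the following. A minimal sketch only; the storage ID and mount point are placeholders, not my real names:

#!/bin/bash
# Cleanup run just before the backup server is shut down.
# STORAGE_ID / MOUNT_POINT are placeholder names for this example.
STORAGE_ID="backups"
MOUNT_POINT="/mnt/pve/${STORAGE_ID}"

# Disable the PVE storage so pvestatd stops probing it while the box is off.
pvesm set "${STORAGE_ID}" --disable 1

# Lazy, forced unmount so nothing new can block on the unreachable export.
umount -f -l "${MOUNT_POINT}" || true

Telegraf's disk input can apparently be told to skip NFS filesystems via its ignore_fs setting (or restricted with mount_points), which would hopefully stop it ever touching a dead export.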

This is the first time I have seen that symptom erupt into a full-on kernel PANIC. Then again... it's the first time I have used a "customized tainted" kernel.

Sep 08 02:13:51 pve kernel: nfs: server backups not responding, timed out
Sep 08 02:13:55 pve telegraf[7741]: 2023-09-08T01:13:55Z E! [outputs.influxdb] When writing to [http://influxdb:8087]: failed doing req: Post "http://influxdb:8087/write?db=telegraf": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Sep 08 02:13:55 pve telegraf[7741]: 2023-09-08T01:13:55Z E! [agent] Error writing to outputs.influxdb: could not write any address
Sep 08 02:14:17 pve kernel: watchdog: BUG: soft lockup - CPU#7 stuck for 678s! [kworker/7:1H:174]
Sep 08 02:14:17 pve kernel: Modules linked in: nft_masq nft_limit nft_reject_inet nft_reject nft_ct mptcp_diag sctp_diag udp_diag raw_diag unix_diag xt_NFLOG xt_limit tcp_diag inet_diag xt_nat nfsv4 nfs fscache netfs nft_chain_nat xt_MASQUERADE nf_nat xfrm_user xfrm_algo nft_compat overlay rpcsec_gss_krb5 veth ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw xt_mac ipt_REJECT nf_reject_ipv4 xt_mark xt_physdev xt_addrtype xt_comment xt_tcpudp xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip_set_hash_net ip_set sctp ip6_udp_tunnel udp_tunnel nf_tables 8021q garp mrp softdog binfmt_misc bonding tls nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common amdgpu edac_mce_amd iommu_v2 drm_buddy snd_hda_codec_hdmi kvm_amd gpu_sched drm_ttm_helper ttm snd_hda_intel drm_display_helper snd_intel_dspcfg snd_intel_sdw_acpi kvm cec snd_hda_codec rc_core irqbypass snd_hda_core drm_kms_helper crct10dif_pclmul
Sep 08 02:14:17 pve kernel: polyval_clmulni snd_hwdep polyval_generic ghash_clmulni_intel eeepc_wmi sha512_ssse3 aesni_intel snd_pcm asus_wmi crypto_simd cryptd snd_timer i2c_algo_bit ledtrig_audio sparse_keymap syscopyarea platform_profile snd sysfillrect rapl sysimgblt pcspkr xr_serial video soundcore cdc_acm wmi_bmof ccp k10temp ch341 usbserial mac_hid nfsd auth_rpcgss nfs_acl vhost_net lockd vhost vhost_iotlb grace tap drm efi_pstore sunrpc dmi_sysfs ip_tables x_tables autofs4 hid_generic usbhid hid zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear simplefb xhci_pci nvme xhci_pci_renesas crc32_pclmul i2c_piix4 nvme_core r8169 ahci nvme_common realtek xhci_hcd libahci wmi gpio_amdpt
Sep 08 02:14:17 pve kernel: CPU: 7 PID: 174 Comm: kworker/7:1H Tainted: P O L 6.2.16-3-pve #1
Sep 08 02:14:17 pve kernel: Hardware name: ASUS System Product Name/PRIME B550-PLUS, BIOS 3202 06/15/2023
Sep 08 02:14:17 pve kernel: Workqueue: kblockd blk_mq_timeout_work
Sep 08 02:14:17 pve kernel: RIP: 0010:__synchronize_hardirq+0x42/0xd0
Sep 08 02:14:17 pve kernel: Code: 41 89 f4 53 48 8d 9f a4 00 00 00 48 83 ec 10 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 c6 45 cf 00 eb 02 f3 90 49 8b 46 38 <8b> 00 a9 00 00 04 00 75 f1 48 89 df e8 ad ed ef 00 49 89 c7 49 8b
Sep 08 02:14:17 pve kernel: RSP: 0018:ffffb0580102fbb0 EFLAGS: 00000206
Sep 08 02:14:17 pve kernel: RAX: ffff942797075a00 RBX: ffff942793075aa4 RCX: 0000000000000000
Sep 08 02:14:17 pve kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff942793075a00
Sep 08 02:14:17 pve kernel: RBP: ffffb0580102fbe8 R08: 0000000000000000 R09: 0000000000000000
Sep 08 02:14:17 pve kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
Sep 08 02:14:17 pve kernel: R13: ffff942793075a28 R14: ffff942793075a00 R15: 0000000000000001
Sep 08 02:14:17 pve kernel: FS: 0000000000000000(0000) GS:ffff943e2e5c0000(0000) knlGS:0000000000000000
Sep 08 02:14:17 pve kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 08 02:14:17 pve kernel: CR2: 000000c0000c5010 CR3: 00000016e3010000 CR4: 0000000000750ee0
Sep 08 02:14:17 pve kernel: PKRU: 55555554
Sep 08 02:14:17 pve kernel: Call Trace:
Sep 08 02:14:17 pve kernel: <TASK>
Sep 08 02:14:17 pve kernel: synchronize_irq+0x39/0xf0
Sep 08 02:14:17 pve kernel: ? __irq_put_desc_unlock+0x1e/0x50
Sep 08 02:14:17 pve kernel: ? __disable_irq_nosync+0x64/0xa0
Sep 08 02:14:17 pve kernel: disable_irq+0x2a/0x40
Sep 08 02:14:17 pve kernel: nvme_poll_irqdisable+0x3b/0x70 [nvme]
Sep 08 02:14:17 pve kernel: nvme_timeout+0xe5/0x3c0 [nvme]
Sep 08 02:14:17 pve kernel: blk_mq_handle_expired+0x7f/0xd0
Sep 08 02:14:17 pve kernel: bt_iter+0xb1/0xc0
Sep 08 02:14:17 pve kernel: blk_mq_queue_tag_busy_iter+0x3c1/0x700
Sep 08 02:14:17 pve kernel: ? __pfx_blk_mq_handle_expired+0x10/0x10
Sep 08 02:14:17 pve kernel: ? __pfx_blk_mq_handle_expired+0x10/0x10
Sep 08 02:14:17 pve kernel: blk_mq_timeout_work+0x184/0x1d0
Sep 08 02:14:17 pve kernel: process_one_work+0x225/0x430
Sep 08 02:14:17 pve kernel: worker_thread+0x50/0x3e0
Sep 08 02:14:17 pve kernel: ? __pfx_worker_thread+0x10/0x10
Sep 08 02:14:17 pve kernel: kthread+0xe9/0x110
Sep 08 02:14:17 pve kernel: ? __pfx_kthread+0x10/0x10
Sep 08 02:14:17 pve kernel: ret_from_fork+0x2c/0x50
Sep 08 02:14:17 pve kernel: </TASK>
 
Sep 08 02:14:17 pve kernel: nvme_poll_irqdisable+0x3b/0x70 [nvme]
Sep 08 02:14:17 pve kernel: nvme_timeout+0xe5/0x3c0 [nvme]

These reference the NVMe module. That worries me.

Stuff like: Sep 08 02:14:17 pve kernel: Workqueue: kblockd blk_mq_timeout_work
and
Sep 08 02:14:17 pve kernel: RIP: 0010:__synchronize_hardirq+0x42/0xd0

fit with the NFS hard IRQ threads being stuck, but the NVMe reference worries me now.

The kernel trace above was the last gasp of the server before total lockup. Is it possible that the NFS hard IRQs swamping the IO threads ended up causing a kernel timeout on the rootfs NVMe device, and that is why this was the last gasp report?
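For anyone digging into something similar, a rough way to spot tasks already wedged in uninterruptible sleep (D state) before the whole box goes. Just an illustration, run as root:

# Show tasks in uninterruptible sleep (D state) and what they are waiting on.
ps axo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'

# Ask the kernel to dump stack traces of all blocked tasks into dmesg (needs sysrq enabled).
echo w > /proc/sysrq-trigger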
 
