Random Hard Crash of PVE after Kernel Update

Jul 20, 2023
After a recent kernel update, my entire PVE server crashed and I had to hard-reboot the box. It is a 13900K with 128 GB of RAM. I have included the syslog from before and after the reboot below, and I have attached a more complete syslog. This PVE host had been running for over a month with no issues until yesterday. Luckily, the services currently running on this hardware are load-balanced, but since I need to expand its use, I need to find a solution fast. Any help is appreciated.


Jul 20 09:14:19 pve3 kernel: Hardware name: To Be Filled By O.E.M. Z690D4U-2L2T/G5/Z690D4U-2L2T/G5, BIOS 10.07 02/14/2023
Jul 20 09:14:19 pve3 kernel: RIP: 0010:smp_call_function_single+0xe7/0x130
Jul 20 09:14:19 pve3 kernel: Code: 75 5e c9 44 89 c0 c3 cc cc cc cc 48 89 e6 4c 89 44 24 10 48 89 54 24 18 e8 26 fe ff ff 41 89 c0 8b 44 24 08 a8 01 74 0a f3 90 <8b> 44 24 08 a8 01 75 f6 eb be 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85
Jul 20 09:14:19 pve3 kernel: RSP: 0018:ffffbeb63731ba80 EFLAGS: 00000202
Jul 20 09:14:19 pve3 kernel: RAX: 0000000000000011 RBX: ffff9abd8baac000 RCX: 0000000000000830
Jul 20 09:14:19 pve3 kernel: RDX: 0000000000030001 RSI: 00000000000008fb RDI: 0000000000000830
Jul 20 09:14:19 pve3 kernel: RBP: ffffbeb63731bac8 R08: 0000000000000000 R09: 000000000000000c
Jul 20 09:14:19 pve3 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000008
Jul 20 09:14:19 pve3 kernel: R13: 0000000000000008 R14: 0000000000000000 R15: ffff9adbff2314c0
Jul 20 09:14:19 pve3 kernel: FS: 00007fef54e10700(0000) GS:ffff9adbff200000(0000) knlGS:0000000000000000
Jul 20 09:14:19 pve3 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 20 09:14:19 pve3 kernel: CR2: 00007f8a4a30a000 CR3: 000000038f894000 CR4: 0000000000752ee0
Jul 20 09:14:19 pve3 kernel: PKRU: 55555554
Jul 20 09:14:19 pve3 kernel: Call Trace:
Jul 20 09:14:19 pve3 kernel: <TASK>
Jul 20 09:14:19 pve3 kernel: ? crash_vmclear_local_loaded_vmcss+0x160/0x160 [kvm_intel]
Jul 20 09:14:19 pve3 kernel: vmx_vcpu_load_vmcs+0x15d/0x4e0 [kvm_intel]
Jul 20 09:14:19 pve3 kernel: vmx_vcpu_load+0x19/0x40 [kvm_intel]
Jul 20 09:14:19 pve3 kernel: kvm_arch_vcpu_load+0x48/0x230 [kvm]
Jul 20 09:14:19 pve3 kernel: ? vmx_prepare_switch_to_host+0xf7/0x190 [kvm_intel]
Jul 20 09:14:19 pve3 kernel: kvm_sched_in+0x3d/0x50 [kvm]
Jul 20 09:14:19 pve3 kernel: finish_task_switch.isra.0+0x17f/0x2b0
Jul 20 09:14:19 pve3 kernel: __schedule+0x356/0x1740
Jul 20 09:14:19 pve3 kernel: schedule+0x69/0x110
Jul 20 09:14:19 pve3 kernel: kvm_vcpu_block+0x70/0x3b0 [kvm]
Jul 20 09:14:19 pve3 kernel: kvm_arch_vcpu_ioctl_run+0xa7c/0x16f0 [kvm]
Jul 20 09:14:19 pve3 kernel: ? wake_up_q+0x90/0x90
Jul 20 09:14:19 pve3 kernel: kvm_vcpu_ioctl+0x252/0x6b0 [kvm]
Jul 20 09:14:19 pve3 kernel: ? exit_to_user_mode_prepare+0x37/0x1b0
Jul 20 09:14:19 pve3 kernel: ? __fget_files+0x86/0xc0
Jul 20 09:14:19 pve3 kernel: __x64_sys_ioctl+0x92/0xd0
Jul 20 09:14:19 pve3 kernel: do_syscall_64+0x59/0xc0
Jul 20 09:14:19 pve3 kernel: ? do_syscall_64+0x69/0xc0
Jul 20 09:14:19 pve3 kernel: ? __x64_sys_io_uring_enter+0x29/0x30
Jul 20 09:14:19 pve3 kernel: ? do_syscall_64+0x69/0xc0
Jul 20 09:14:19 pve3 kernel: ? do_syscall_64+0x69/0xc0
Jul 20 09:14:19 pve3 kernel: ? do_syscall_64+0x69/0xc0
Jul 20 09:14:19 pve3 kernel: ? sysvec_reschedule_ipi+0x78/0xe0
Jul 20 09:14:19 pve3 kernel: entry_SYSCALL_64_after_hwframe+0x61/0xcb
Jul 20 09:14:19 pve3 kernel: RIP: 0033:0x7fef61c9a237
Jul 20 09:14:19 pve3 kernel: Code: 00 00 00 48 8b 05 59 cc 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 29 cc 0d 00 f7 d8 64 89 01 48
Jul 20 09:14:19 pve3 kernel: RSP: 002b:00007fef54e0b288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jul 20 09:14:19 pve3 kernel: RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007fef61c9a237
Jul 20 09:14:19 pve3 kernel: RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001a
Jul 20 09:14:19 pve3 kernel: RBP: 000055d01ee98830 R08: 000055d01d6db240 R09: 00007fed400173e0
Jul 20 09:14:19 pve3 kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
Jul 20 09:14:19 pve3 kernel: R13: 000055d01dde6020 R14: 0000000000000000 R15: 0000000000000000
Jul 20 09:14:19 pve3 kernel: </TASK>
Jul 20 09:14:23 pve3 kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 413s! [kworker/1:4:305]
Jul 20 09:14:23 pve3 kernel: Modules linked in: snd_hda_codec_hdmi cmac nls_utf8 cifs cifs_arc4 cifs_md4 fscache netfs veth 8021q garp mrp ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter sctp ip6_udp_tunnel udp_tunnel nf_tables bonding tls softdog nfnetlink_log nfnetlink ipmi_ssif intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus coretemp ledtrig_audio snd_soc_core snd_compress i915 ac97_bus snd_pcm_dmaengine ast kvm_intel snd_hda_intel snd_intel_dspcfg drm_vram_helper snd_intel_sdw_acpi drm_ttm_helper ttm snd_hda_codec kvm drm_kms_helper snd_hda_core irqbypass snd_hwdep cec crct10dif_pclmul snd_pcm ghash_clmulni_intel rc_core cdc_ether aesni_intel snd_timer fb_sys_fops
Jul 20 09:14:23 pve3 kernel: mei_hdcp syscopyarea snd usbnet crypto_simd sysfillrect joydev input_leds cryptd mii wmi_bmof efi_pstore sysimgblt pcspkr soundcore mei_me mei zfs(PO) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler mac_hid zunicode(PO) acpi_pad acpi_tad zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress raid6_pq simplefb hid_generic usbkbd usbmouse usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c igb xhci_pci crc32_pclmul i2c_i801 xhci_pci_renesas i2c_algo_bit ahci intel_lpss_pci i2c_smbus i40e dca nvme intel_lpss libahci xhci_hcd idma64 nvme_core wmi video
Jul 20 09:14:23 pve3 kernel: CPU: 1 PID: 305 Comm: kworker/1:4 Tainted: P D W O L 5.15.108-1-pve #1
Jul 20 09:14:23 pve3 kernel: Hardware name: To Be Filled By O.E.M. Z690D4U-2L2T/G5/Z690D4U-2L2T/G5, BIOS 10.07 02/14/2023
Jul 20 09:14:23 pve3 kernel: Workqueue: events netstamp_clear
Jul 20 09:14:23 pve3 kernel: RIP: 0010:smp_call_function_many_cond+0x13c/0x360
Jul 20 09:14:23 pve3 kernel: Code: 02 41 89 c4 73 2d 4d 63 ec 48 8b 13 49 81 fd ff 1f 00 00 0f 87 e3 01 00 00 4a 03 14 ed e0 6a 6c 92 8b 42 08 a8 01 74 09 f3 90 <8b> 42 08 a8 01 75 f7 eb bc 48 83 c4 40 5b 41 5c 41 5d 41 5e 41 5f
Jul 20 09:14:23 pve3 kernel: RSP: 0018:ffffbeb600c97cf0 EFLAGS: 00000202
Jul 20 09:14:23 pve3 kernel: RAX: 0000000000000011 RBX: ffff9adbff072500 RCX: 0000000000000003
Jul 20 09:14:23 pve3 kernel: RDX: ffff9adbff0f8ae0 RSI: 0000000000000000 RDI: ffff9abd00067710
Jul 20 09:14:23 pve3 kernel: RBP: ffffbeb600c97d58 R08: 0000000000000000 R09: 0000000000000000
Jul 20 09:14:23 pve3 kernel: R10: 0000000000000003 R11: fffffffffffffff8 R12: 0000000000000003
Jul 20 09:14:23 pve3 kernel: R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000020
Jul 20 09:14:23 pve3 kernel: FS: 0000000000000000(0000) GS:ffff9adbff040000(0000) knlGS:0000000000000000
Jul 20 09:14:23 pve3 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 20 09:14:23 pve3 kernel: CR2: 00007f758442e350 CR3: 0000000ec8210000 CR4: 0000000000752ee0
Jul 20 09:14:23 pve3 kernel: PKRU: 55555554
Jul 20 09:14:23 pve3 kernel: Call Trace:
Jul 20 09:14:23 pve3 kernel: <TASK>
Jul 20 09:14:23 pve3 kernel: ? text_poke_loc_init+0x190/0x190
Jul 20 09:14:23 pve3 kernel: on_each_cpu_cond_mask+0x22/0x30
Jul 20 09:14:23 pve3 kernel: text_poke_bp_batch+0xb2/0x270
Jul 20 09:14:23 pve3 kernel: text_poke_finish+0x1f/0x40
Jul 20 09:14:23 pve3 kernel: arch_jump_label_transform_apply+0x1a/0x30
Jul 20 09:14:23 pve3 kernel: __jump_label_update+0xf3/0x140
Jul 20 09:14:23 pve3 kernel: jump_label_update+0xba/0xe0
Jul 20 09:14:23 pve3 kernel: static_key_enable_cpuslocked+0x77/0xa0
Jul 20 09:14:23 pve3 kernel: static_key_enable+0x1b/0x30
Jul 20 09:14:23 pve3 kernel: netstamp_clear+0x2d/0x40
Jul 20 09:14:23 pve3 kernel: process_one_work+0x228/0x3d0
Jul 20 09:14:23 pve3 kernel: worker_thread+0x53/0x420
Jul 20 09:14:23 pve3 kernel: ? process_one_work+0x3d0/0x3d0
Jul 20 09:14:23 pve3 kernel: kthread+0x127/0x150
Jul 20 09:14:23 pve3 kernel: ? set_kthread_struct+0x50/0x50
Jul 20 09:14:23 pve3 kernel: ret_from_fork+0x1f/0x30
Jul 20 09:14:23 pve3 kernel: </TASK>
Jul 20 09:14:35 pve3 kernel: watchdog: BUG: soft lockup - CPU#18 stuck for 336s! [kworker/18:1:334]
Jul 20 09:14:35 pve3 kernel: Modules linked in: snd_hda_codec_hdmi cmac nls_utf8 cifs cifs_arc4 cifs_md4 fscache netfs veth 8021q garp mrp ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter sctp ip6_udp_tunnel udp_tunnel nf_tables bonding tls softdog nfnetlink_log nfnetlink ipmi_ssif intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus coretemp ledtrig_audio snd_soc_core snd_compress i915 ac97_bus snd_pcm_dmaengine ast kvm_intel snd_hda_intel snd_intel_dspcfg drm_vram_helper snd_intel_sdw_acpi drm_ttm_helper ttm snd_hda_codec kvm drm_kms_helper snd_hda_core irqbypass snd_hwdep cec crct10dif_pclmul snd_pcm ghash_clmulni_intel rc_core cdc_ether aesni_intel snd_timer fb_sys_fops
Jul 20 09:14:35 pve3 kernel: mei_hdcp syscopyarea snd usbnet crypto_simd sysfillrect joydev input_leds cryptd mii wmi_bmof efi_pstore sysimgblt pcspkr soundcore mei_me mei zfs(PO) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler mac_hid zunicode(PO) acpi_pad acpi_tad zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress raid6_pq simplefb hid_generic usbkbd usbmouse usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c igb xhci_pci crc32_pclmul i2c_i801 xhci_pci_renesas i2c_algo_bit ahci intel_lpss_pci i2c_smbus i40e dca nvme intel_lpss libahci xhci_hcd idma64 nvme_core wmi video
Jul 20 09:14:35 pve3 kernel: CPU: 18 PID: 334 Comm: kworker/18:1 Tainted: P D W O L 5.15.108-1-pve #1
Jul 20 09:14:35 pve3 kernel: Hardware name: To Be Filled By O.E.M. Z690D4U-2L2T/G5/Z690D4U-2L2T/G5, BIOS 10.07 02/14/2023
Jul 20 09:14:35 pve3 kernel: Workqueue: rcu_gp wait_rcu_exp_gp
Jul 20 09:14:35 pve3 kernel: RIP: 0010:smp_call_function_single+0x94/0x130
Jul 20 09:14:35 pve3 kernel: Code: 30 e9 6e a9 00 01 ff 00 0f 85 9e 00 00 00 85 c9 75 4c 48 c7 c6 80 24 03 00 65 48 03 35 a5 ca e8 6e 8b 46 08 a8 01 74 09 f3 90 <8b> 46 08 a8 01 75 f7 83 4e 08 01 4c 89 46 10 48 89 56 18 e8 54 fe
Jul 20 09:14:35 pve3 kernel: RSP: 0018:ffffbeb600d7fd80 EFLAGS: 00000202
Jul 20 09:14:35 pve3 kernel: RAX: 0000000000000001 RBX: 0000000000000013 RCX: 0000000000000000
Jul 20 09:14:35 pve3 kernel: RDX: 0000000000000000 RSI: ffff9adbff4b2480 RDI: 0000000000000013
Jul 20 09:14:35 pve3 kernel: RBP: ffffbeb600d7fdd0 R08: ffffffff9115ba80 R09: 0000000000000282
Jul 20 09:14:35 pve3 kernel: R10: 0000000000000007 R11: 0000000000000000 R12: ffffffff92f80538
Jul 20 09:14:35 pve3 kernel: R13: ffff9adbff4f22c0 R14: 000000000000ffa6 R15: 0000000000000008
Jul 20 09:14:35 pve3 kernel: FS: 0000000000000000(0000) GS:ffff9adbff480000(0000) knlGS:0000000000000000
Jul 20 09:14:35 pve3 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 20 09:14:35 pve3 kernel: CR2: 000055bc5b25440f CR3: 0000000ec8210000 CR4: 0000000000752ee0
Jul 20 09:14:35 pve3 kernel: PKRU: 55555554
Jul 20 09:14:35 pve3 kernel: Call Trace:
Jul 20 09:14:35 pve3 kernel: <TASK>
Jul 20 09:14:35 pve3 kernel: ? finish_task_switch.isra.0+0x7e/0x2b0
Jul 20 09:14:35 pve3 kernel: sync_rcu_exp_select_node_cpus+0x23d/0x340
Jul 20 09:14:35 pve3 kernel: sync_rcu_exp_select_cpus+0x1e9/0x480
Jul 20 09:14:35 pve3 kernel: wait_rcu_exp_gp+0x14/0x30
Jul 20 09:14:35 pve3 kernel: process_one_work+0x228/0x3d0
Jul 20 09:14:35 pve3 kernel: worker_thread+0x53/0x420
Jul 20 09:14:35 pve3 kernel: ? process_one_work+0x3d0/0x3d0
Jul 20 09:14:35 pve3 kernel: kthread+0x127/0x150
Jul 20 09:14:35 pve3 kernel: ? set_kthread_struct+0x50/0x50
Jul 20 09:14:35 pve3 kernel: ret_from_fork+0x1f/0x30
Jul 20 09:14:35 pve3 kernel: </TASK>
Jul 20 09:14:39 pve3 systemd[1]: systemd-udevd.service: Processes still around after final SIGKILL. Entering failed mode.
Jul 20 09:14:39 pve3 systemd[1]: systemd-udevd.service: Failed with result 'watchdog'.
Jul 20 09:14:39 pve3 systemd[1]: systemd-udevd.service: Unit process 727 (systemd-udevd) remains running after unit stopped.
Jul 20 09:14:39 pve3 systemd[1]: systemd-udevd.service: Consumed 1.333s CPU time.
Jul 20 09:14:39 pve3 systemd[1]: systemd-udevd.service: Scheduled restart job, restart counter is at 1.
Jul 20 09:14:39 pve3 systemd[1]: Stopped Rule-based Manager for Device Events and Files.
Jul 20 09:14:39 pve3 systemd[1]: systemd-udevd.service: Consumed 1.333s CPU time.
Jul 20 09:14:39 pve3 systemd[1]: systemd-udevd.service: Found left-over process 727 (systemd-udevd) in control group while starting unit. Ignoring.
Jul 20 09:14:39 pve3 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 20 09:14:47 pve3 kernel: watchdog: BUG: soft lockup - CPU#6 stuck for 496s! [CPU 6/KVM:57024]
-- Reboot --
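
In case it helps anyone skim the attached syslog, here is a minimal Python sketch that pulls out the `soft lockup` watchdog lines and keeps the longest reported stall per CPU. The function name `summarize_lockups` and the regular expression are my own assumptions based on the line format shown above; this is not a Proxmox or kernel tool.

```python
import re

# Assumed line format, taken from the excerpt above, e.g.:
#   ... kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 413s! [kworker/1:4:305]
LOCKUP_RE = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s! \[(.+):(\d+)\]")


def summarize_lockups(lines):
    """Return {cpu: (longest_stall_seconds, task_name)} for soft-lockup lines.

    `lines` is any iterable of strings, for example an open file object.
    Lines that do not match the assumed format are ignored.
    """
    worst = {}
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match is None:
            continue
        cpu = int(match.group(1))
        seconds = int(match.group(2))
        task = match.group(3)
        # Keep only the longest stall seen so far for each CPU.
        if cpu not in worst or seconds > worst[cpu][0]:
            worst[cpu] = (seconds, task)
    return worst


# Hypothetical usage against the attachment:
#   with open("syslog.txt") as fh:
#       for cpu, (seconds, task) in sorted(summarize_lockups(fh).items()):
#           print(f"CPU#{cpu}: stuck up to {seconds}s in {task}")
```

This only summarizes what the watchdog reported; it does not say why the CPUs were stuck.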
 

Attachments

  • syslog.txt
    37.5 KB