I had several power outages at home today. Since then my Proxmox machine boots, but after a while the kernel hits a page fault and the LXCs stop working.
I have a UPS, but it does not seem to have helped.
I ran a memtest and it passed.
I tried booting a previous kernel and the page fault still happened.
I have a ZFS pool set up as a mirror of two NVMe drives. I ran a scrub on the ZFS pools and it didn't find any errors.
SMART looks good also.
I'm thinking maybe the CPU or motherboard got damaged? Or maybe something on the filesystem got corrupted and a reinstall would help?
Tomorrow I will run a CPU stress test, booting from a USB drive, to see if it also crashes.
I am not an expert and would appreciate any help or guidance on how to diagnose what is happening.
Here is the page fault:
Code:
Feb 15 01:03:29 proxmox pveproxy[12941]: got inotify poll request in wrong process - disabling inotify
Feb 15 01:08:31 proxmox systemd[1]: Starting systemd-tmpfiles-clean.service - Cleanup of Temporary Directories...
Feb 15 01:08:31 proxmox systemd-tmpfiles[15967]: /usr/lib/tmpfiles.d/legacy.conf:14: Duplicate line for path "/run/lock", ignoring.
Feb 15 01:08:31 proxmox systemd-tmpfiles[15967]: /usr/lib/tmpfiles.d/nut-common-tmpfiles.conf:8: Duplicate line for path "/run/nut", ignoring.
Feb 15 01:08:31 proxmox systemd[1]: systemd-tmpfiles-clean.service: Deactivated successfully.
Feb 15 01:08:31 proxmox systemd[1]: Finished systemd-tmpfiles-clean.service - Cleanup of Temporary Directories.
Feb 15 01:09:02 proxmox pvedaemon[1909]: <root@pam> successful auth for user 'root@pam'
Feb 15 01:10:11 proxmox zed[16878]: eid=9 class=scrub_finish pool='rpool'
Feb 15 01:11:00 proxmox kernel: BUG: unable to handle page fault for address: 00000000b3a80000
Feb 15 01:11:00 proxmox kernel: #PF: supervisor write access in kernel mode
Feb 15 01:11:00 proxmox kernel: #PF: error_code(0x0002) - not-present page
Feb 15 01:11:00 proxmox kernel: PGD 0 P4D 0
Feb 15 01:11:00 proxmox kernel: Oops: Oops: 0002 [#1] SMP NOPTI
Feb 15 01:11:00 proxmox kernel: CPU: 15 UID: 0 PID: 335 Comm: kworker/u80:5 Tainted: P S O 6.17.9-1-pve #1 PREEMPT(voluntary)
Feb 15 01:11:00 proxmox kernel: Tainted: [P]=PROPRIETARY_MODULE, [S]=CPU_OUT_OF_SPEC, [O]=OOT_MODULE
Feb 15 01:11:00 proxmox kernel: Hardware name: To Be Filled By O.E.M. Z690 Pro RS/Z690 Pro RS, BIOS 9.02 06/06/2022
Feb 15 01:11:00 proxmox kernel: Workqueue: xprtiod xs_stream_data_receive_workfn [sunrpc]
Feb 15 01:11:00 proxmox kernel: RIP: 0010:__pfx_memcpy_orig+0x1/0x10
Feb 15 01:11:00 proxmox kernel: Code: cc cc cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 48 89 f8 48 89 d1 f3 a4 c3 cc cc cc cc 90 90 <90> 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 89 f8 48 83 fa 20
Feb 15 01:11:00 proxmox kernel: RSP: 0018:ffffcdf54154f980 EFLAGS: 00010286
Feb 15 01:11:00 proxmox kernel: RAX: ffff8d00b3a80000 RBX: 0000000000006f94 RCX: 0000000000001000
Feb 15 01:11:00 proxmox kernel: RDX: 0000000000001000 RSI: ffff8cff24a1006c RDI: 00000000b3a80000
Feb 15 01:11:00 proxmox kernel: RBP: ffffcdf54154fa20 R08: ffff8cff24a1006c R09: 0000000000000000
Feb 15 01:11:00 proxmox kernel: R10: 0000000000000000 R11: ffff8cf78d6a0a00 R12: ffffcdf54154fd68
Feb 15 01:11:00 proxmox kernel: R13: ffff8d0060b41600 R14: 0000000000001000 R15: 0000000000001000
Feb 15 01:11:00 proxmox kernel: FS: 0000000000000000(0000) GS:ffff8d0274106000(0000) knlGS:0000000000000000
Feb 15 01:11:00 proxmox kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 15 01:11:00 proxmox kernel: CR2: 00000000b3a80000 CR3: 000000038803a000 CR4: 0000000000f52ef0
Feb 15 01:11:00 proxmox kernel: PKRU: 55555554
Feb 15 01:11:00 proxmox kernel: Call Trace:
Feb 15 01:11:00 proxmox kernel: <TASK>
Feb 15 01:11:00 proxmox kernel: ? _copy_to_iter+0x27f/0x610
Feb 15 01:11:00 proxmox kernel: ? __ip_queue_xmit+0x1ce/0x560
Feb 15 01:11:00 proxmox kernel: ? __check_object_size+0xb4/0x240
Feb 15 01:11:00 proxmox kernel: ? __pfx_simple_copy_to_iter+0x10/0x10
Feb 15 01:11:00 proxmox kernel: simple_copy_to_iter+0x3e/0x70
Feb 15 01:11:00 proxmox kernel: __skb_datagram_iter+0x1b8/0x2f0
Feb 15 01:11:00 proxmox kernel: ? __pfx_simple_copy_to_iter+0x10/0x10
Feb 15 01:11:00 proxmox kernel: skb_copy_datagram_iter+0x37/0xa0
Feb 15 01:11:00 proxmox kernel: tcp_recvmsg_locked+0x847/0xaf0
Feb 15 01:11:00 proxmox kernel: ? __tcp_send_ack.part.0+0xdc/0x1c0
Feb 15 01:11:00 proxmox kernel: tcp_recvmsg+0x83/0x210
Feb 15 01:11:00 proxmox kernel: inet_recvmsg+0x51/0x130
Feb 15 01:11:00 proxmox kernel: ? security_socket_recvmsg+0x44/0x80
Feb 15 01:11:00 proxmox kernel: sock_recvmsg+0xc6/0xf0
Feb 15 01:11:00 proxmox kernel: xs_sock_recvmsg.constprop.0+0x2c/0xa0 [sunrpc]
Feb 15 01:11:00 proxmox kernel: xs_read_stream_request.constprop.0+0x255/0x4f0 [sunrpc]
Feb 15 01:11:00 proxmox kernel: xs_read_stream.constprop.0+0x2b3/0x440 [sunrpc]
Feb 15 01:11:00 proxmox kernel: xs_stream_data_receive_workfn+0x71/0x150 [sunrpc]
Feb 15 01:11:00 proxmox kernel: process_one_work+0x188/0x370
Feb 15 01:11:00 proxmox kernel: worker_thread+0x33a/0x480
Feb 15 01:11:00 proxmox kernel: ? __pfx_worker_thread+0x10/0x10
Feb 15 01:11:00 proxmox kernel: kthread+0x108/0x220
Feb 15 01:11:00 proxmox kernel: ? __pfx_kthread+0x10/0x10
Feb 15 01:11:00 proxmox kernel: ret_from_fork+0x205/0x240
Feb 15 01:11:00 proxmox kernel: ? __pfx_kthread+0x10/0x10
Feb 15 01:11:00 proxmox kernel: ret_from_fork_asm+0x1a/0x30
Feb 15 01:11:00 proxmox kernel: </TASK>
Feb 15 01:11:00 proxmox kernel: Modules linked in: tcp_diag inet_diag act_police cls_basic sch_ingress sch_htb cfg80211 veth rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace netfs vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter nf_tables bonding tls sunrpc binfmt_misc nfnetlink_log xe gpu_sched drm_gpuvm drm_gpusvm_helper drm_ttm_helper drm_exec drm_suballoc_helper snd_hda_codec_intelhdmi snd_hda_codec_alc662 snd_hda_codec_realtek_lib snd_hda_codec_generic snd_hda_intel snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel snd_sof_intel_hda_sdw_bpt intel_rapl_msr intel_rapl_common snd_sof_intel_hda_common intel_uncore_frequency snd_soc_hdac_hda intel_uncore_frequency_common snd_sof_intel_hda_mlink snd_sof_intel_hda snd_hda_codec_hdmi soundwire_cadence snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_acpi_intel_match snd_soc_acpi_intel_sdca_quirks soundwire_generic_allocation snd_soc_acpi
Feb 15 01:11:00 proxmox kernel: soundwire_bus snd_soc_sdca crc8 snd_soc_avs snd_soc_hda_codec snd_hda_ext_core x86_pkg_temp_thermal intel_powerclamp snd_hda_codec snd_hda_core snd_intel_dspcfg snd_intel_sdw_acpi kvm_intel snd_hwdep i915 snd_soc_core kvm snd_compress ac97_bus snd_pcm_dmaengine drm_buddy ttm snd_pcm irqbypass polyval_clmulni snd_timer drm_display_helper cmdlinepart ghash_clmulni_intel aesni_intel snd mei_hdcp mei_pxp spi_nor cec rapl mtd ee1004 soundcore intel_cstate wmi_bmof pcspkr mei_me rc_core mei i2c_algo_bit intel_pmc_core pmt_telemetry pmt_discovery pmt_class input_leds intel_pmc_ssram_telemetry intel_vsec acpi_pad acpi_tad joydev mac_hid sch_fq_codel vhost_net vhost vhost_iotlb tap nct6775 nct6775_core hwmon_vid coretemp efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq hid_logitech_hidpp hid_logitech_dj hid_generic usbkbd usbmouse usbhid uas hid usb_storage nvme xhci_pci r8169 intel_lpss_pci nvme_core ahci i2c_i801 spi_intel_pci xhci_hcd intel_lpss i2c_mux
Feb 15 01:11:00 proxmox kernel: realtek nvme_keyring libahci spi_intel idma64 i2c_smbus nvme_auth video wmi
Feb 15 01:11:00 proxmox kernel: CR2: 00000000b3a80000
Feb 15 01:11:00 proxmox kernel: ---[ end trace 0000000000000000 ]---
Feb 15 01:11:00 proxmox kernel: RIP: 0010:__pfx_memcpy_orig+0x1/0x10
Feb 15 01:11:00 proxmox kernel: Code: cc cc cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 48 89 f8 48 89 d1 f3 a4 c3 cc cc cc cc 90 90 <90> 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 89 f8 48 83 fa 20
Feb 15 01:11:00 proxmox kernel: RSP: 0018:ffffcdf54154f980 EFLAGS: 00010286
Feb 15 01:11:00 proxmox kernel: RAX: ffff8d00b3a80000 RBX: 0000000000006f94 RCX: 0000000000001000
Feb 15 01:11:00 proxmox kernel: RDX: 0000000000001000 RSI: ffff8cff24a1006c RDI: 00000000b3a80000
Feb 15 01:11:00 proxmox kernel: RBP: ffffcdf54154fa20 R08: ffff8cff24a1006c R09: 0000000000000000
Feb 15 01:11:00 proxmox kernel: R10: 0000000000000000 R11: ffff8cf78d6a0a00 R12: ffffcdf54154fd68
Feb 15 01:11:00 proxmox kernel: R13: ffff8d0060b41600 R14: 0000000000001000 R15: 0000000000001000
Feb 15 01:11:00 proxmox kernel: FS: 0000000000000000(0000) GS:ffff8d0274106000(0000) knlGS:0000000000000000
Feb 15 01:11:00 proxmox kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 15 01:11:00 proxmox kernel: CR2: 00000000b3a80000 CR3: 000000038803a000 CR4: 0000000000f52ef0
Feb 15 01:11:00 proxmox kernel: PKRU: 55555554
Feb 15 01:11:00 proxmox kernel: note: kworker/u80:5[335] exited with irqs disabled