Hello,
I've installed PBS 3-0-4 on bare metal. At irregular intervals the whole PBS machine freezes. In the syslog I found the following output:
Code:
Nov 24 23:06:06 pbs kernel: BUG: unable to handle page fault for address: 00002ea7c3e2a4d8
Nov 24 23:06:06 pbs kernel: #PF: supervisor write access in kernel mode
Nov 24 23:06:07 pbs kernel: #PF: error_code(0x0002) - not-present page
Nov 24 23:06:07 pbs kernel: PGD 0 P4D 0
Nov 24 23:06:07 pbs kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
Nov 24 23:06:07 pbs kernel: CPU: 3 PID: 1278337 Comm: tokio-runtime-w Tainted: P O 6.2.16-19-pve #1
Nov 24 23:06:07 pbs kernel: Hardware name: retsamarret 000-F4423-EU000-2000-N/Default string, BIOS 5.19 01/09/2023
Nov 24 23:06:07 pbs kernel: RIP: 0010:charge_memcg+0xb0/0x100
Nov 24 23:06:07 pbs kernel: Code: 24 48 c1 ee 36 e8 30 98 ff ff fb 0f 1f 44 00 00 31 c0 5b 41 5c 41 5d 41 5e 5d 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc be 01 00 <00> 00 48 89 df 41 f7 de e8 c3 d3 ff ff eb af ba 01 00 00 00 41 bd
Nov 24 23:06:07 pbs kernel: RSP: 0000:ffff9cbcc966fd20 EFLAGS: 00010082
Nov 24 23:06:07 pbs kernel: RAX: 00002ea7c3e2a4d8 RBX: ffff8e1486586000 RCX: 0000000000000000
Nov 24 23:06:07 pbs kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8e1486586000
Nov 24 23:06:07 pbs kernel: RBP: ffff9cbcc966fd60 R08: 0000000000000000 R09: 0000000000000000
Nov 24 23:06:07 pbs kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Nov 24 23:06:07 pbs kernel: R13: 0000000000000001 R14: 0000000000000001 R15: ffff8e148ac99a70
Nov 24 23:06:07 pbs kernel: FS: 00007f539ae6e6c0(0000) GS:ffff8e14fbf80000(0000) knlGS:0000000000000000
Nov 24 23:06:07 pbs kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 24 23:06:07 pbs kernel: CR2: 00002ea7c3e2a4d8 CR3: 0000000107c8c000 CR4: 0000000000350ee0
Nov 24 23:06:07 pbs kernel: Call Trace:
Nov 24 23:06:07 pbs kernel: <TASK>
Nov 24 23:06:07 pbs kernel: ? show_regs+0x6d/0x80
Nov 24 23:06:07 pbs kernel: ? __die+0x24/0x80
Nov 24 23:06:07 pbs kernel: ? page_fault_oops+0x176/0x500
Nov 24 23:06:07 pbs kernel: ? do_user_addr_fault+0x2f3/0x620
Nov 24 23:06:07 pbs kernel: ? exc_page_fault+0x80/0x1b0
Nov 24 23:06:07 pbs kernel: ? asm_exc_page_fault+0x27/0x30
Nov 24 23:06:07 pbs kernel: ? charge_memcg+0xb0/0x100
Nov 24 23:06:07 pbs kernel: charge_memcg+0x90/0x100
Nov 24 23:06:07 pbs kernel: __mem_cgroup_charge+0x2d/0xa0
Nov 24 23:06:07 pbs kernel: __handle_mm_fault+0x9f6/0x1070
Nov 24 23:06:07 pbs kernel: handle_mm_fault+0x119/0x330
Nov 24 23:06:07 pbs kernel: ? lock_mm_and_find_vma+0x43/0x230
Nov 24 23:06:07 pbs kernel: do_user_addr_fault+0x194/0x620
Nov 24 23:06:07 pbs kernel: exc_page_fault+0x80/0x1b0
Nov 24 23:06:07 pbs kernel: asm_exc_page_fault+0x27/0x30
Nov 24 23:06:07 pbs kernel: RIP: 0033:0x7f53ba1b2f40
Nov 24 23:06:07 pbs kernel: Code: ae 10 10 00 00 0f 10 b6 20 10 00 00 0f 10 be 30 10 00 00 48 83 ee c0 66 0f e7 07 66 0f e7 4f 10 66 0f e7 57 20 66 0f e7 5f 30 <66> 0f e7 a7 00 10 00 00 66 0f e7 af 10 10 00 00 66 0f e7 b7 20 10
Nov 24 23:06:07 pbs kernel: RSP: 002b:00007f539ae63dd8 EFLAGS: 00010203
Nov 24 23:06:07 pbs kernel: RAX: 00007f539a4df010 RBX: 0000000000000001 RCX: 0000000000000040
Nov 24 23:06:07 pbs kernel: RDX: 0000000000000e8b RSI: 00007f539bc4c08c RDI: 00007f539a92d040
Nov 24 23:06:07 pbs kernel: RBP: 00007f539ae6bf60 R08: ffffffffffffffd0 R09: 0000000000000000
Nov 24 23:06:07 pbs kernel: R10: 00000000000001a0 R11: 0000000001000000 R12: 000000000078eebb
Nov 24 23:06:07 pbs kernel: R13: 00007f539b7fe01c R14: 00007f5364010770 R15: 00007f539a4df010
Nov 24 23:06:07 pbs kernel: </TASK>
Nov 24 23:06:07 pbs kernel: Modules linked in: bonding tls sunrpc binfmt_misc snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul polyval_generic ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd cryptd intel_cstate i915 snd_sof_pci_intel_icl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine drm_buddy ttm snd_hda_intel drm_display_helper snd_intel_dspcfg intel_rapl_msr snd_intel_sdw_acpi snd_hda_codec processor_thermal_device_pci_legacy snd_hda_core processor_thermal_device cec snd_hwdep rc_core processor_thermal_rfim cmdlinepart snd_pcm drm_kms_helper processor_thermal_mbox spi_nor snd_timer pcspkr i2c_algo_bit wmi_bmof mtd processor_thermal_rapl snd ee1004 mei_me soundcore syscopyarea 8250_dw mei
Nov 24 23:06:07 pbs kernel: intel_rapl_common sysfillrect sysimgblt int340x_thermal_zone intel_soc_dts_iosf joydev input_leds acpi_pad acpi_tad mac_hid drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs blake2b_generic xor raid6_pq libcrc32c simplefb hid_logitech_hidpp hid_logitech_dj hid_generic usbkbd usbmouse usbhid hid spi_pxa2xx_platform dw_dmac dw_dmac_core i2c_i801 spi_intel_pci xhci_pci nvme xhci_pci_renesas nvme_core crc32_pclmul nvme_common intel_lpss_pci spi_intel igc xhci_hcd i2c_smbus ahci sdhci_pci intel_lpss cqhci libahci sdhci idma64 video wmi pinctrl_jasperlake
Nov 24 23:06:07 pbs kernel: CR2: 00002ea7c3e2a4d8
Nov 24 23:06:07 pbs kernel: ---[ end trace 0000000000000000 ]---
Nov 24 23:06:07 pbs kernel: RIP: 0010:charge_memcg+0xb0/0x100
Nov 24 23:06:07 pbs kernel: Code: 24 48 c1 ee 36 e8 30 98 ff ff fb 0f 1f 44 00 00 31 c0 5b 41 5c 41 5d 41 5e 5d 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc be 01 00 <00> 00 48 89 df 41 f7 de e8 c3 d3 ff ff eb af ba 01 00 00 00 41 bd
Nov 24 23:06:07 pbs kernel: RSP: 0000:ffff9cbcc966fd20 EFLAGS: 00010082
Nov 24 23:06:07 pbs kernel: RAX: 00002ea7c3e2a4d8 RBX: ffff8e1486586000 RCX: 0000000000000000
Nov 24 23:06:07 pbs kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8e1486586000
Nov 24 23:06:07 pbs kernel: RBP: ffff9cbcc966fd60 R08: 0000000000000000 R09: 0000000000000000
Nov 24 23:06:07 pbs kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Nov 24 23:06:07 pbs kernel: R13: 0000000000000001 R14: 0000000000000001 R15: ffff8e148ac99a70
Nov 24 23:06:07 pbs kernel: FS: 00007f539ae6e6c0(0000) GS:ffff8e14fbf80000(0000) knlGS:0000000000000000
Nov 24 23:06:07 pbs kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 24 23:06:07 pbs kernel: CR2: 00002ea7c3e2a4d8 CR3: 0000000107c8c000 CR4: 0000000000350ee0
Nov 24 23:06:07 pbs kernel: note: tokio-runtime-w[1278337] exited with irqs disabled
Nov 24 23:10:42 pbs sshd[1291587]: Accepted password for XXXXXX from 10.10.1.98 port 55777 ssh2
Nov 24 23:10:42 pbs sshd[1291587]: pam_unix(sshd:session): session opened for user aferlemann(uid=1000) by (uid=0)
Nov 24 23:10:42 pbs systemd-logind[1425]: New session 78 of user aferlemann.
Nov 24 23:10:42 pbs systemd[1]: Created slice user-1000.slice - User Slice of UID 1000.
Nov 24 23:10:42 pbs systemd[1]: Starting user-runtime-dir@1000.service - User Runtime Directory /run/user/1000...
Nov 24 23:10:42 pbs systemd[1]: Finished user-runtime-dir@1000.service - User Runtime Directory /run/user/1000.
Nov 24 23:10:42 pbs systemd[1]: Starting user@1000.service - User Manager for UID 1000...
Nov 24 23:10:42 pbs (systemd)[1291618]: pam_unix(systemd-user:session): session opened for user aferlemann(uid=1000) by (uid=0)
Nov 24 23:10:43 pbs systemd[1291618]: Queued start job for default target default.target.
Nov 24 23:10:43 pbs systemd[1291618]: Created slice app.slice - User Application Slice.
Nov 24 23:10:43 pbs systemd[1291618]: Reached target paths.target - Paths.
Nov 24 23:10:43 pbs systemd[1291618]: Reached target timers.target - Timers.
Nov 24 23:10:43 pbs systemd[1291618]: Listening on dirmngr.socket - GnuPG network certificate management daemon.
Nov 24 23:10:43 pbs systemd[1291618]: Listening on gpg-agent-browser.socket - GnuPG cryptographic agent and passphrase cache (access for web browsers).
Nov 24 23:10:43 pbs systemd[1291618]: Listening on gpg-agent-extra.socket - GnuPG cryptographic agent and passphrase cache (restricted).
Nov 24 23:10:43 pbs systemd[1291618]: Listening on gpg-agent-ssh.socket - GnuPG cryptographic agent (ssh-agent emulation).
Nov 24 23:10:43 pbs systemd[1291618]: Listening on gpg-agent.socket - GnuPG cryptographic agent and passphrase cache.
Nov 24 23:10:43 pbs systemd[1291618]: Reached target sockets.target - Sockets.
Nov 24 23:10:43 pbs systemd[1291618]: Reached target basic.target - Basic System.
Nov 24 23:10:43 pbs systemd[1291618]: Reached target default.target - Main User Target.
Nov 24 23:10:43 pbs systemd[1291618]: Startup finished in 284ms.
Code:
Nov 25 04:44:47 pbs kernel: BUG: unable to handle page fault for address: ffffffff70980902
Nov 25 04:44:47 pbs kernel: #PF: supervisor write access in kernel mode
Nov 25 04:44:47 pbs kernel: #PF: error_code(0x0002) - not-present page
Nov 25 04:44:47 pbs kernel: PGD 17ae15067 P4D 17ae15067 PUD 0
Nov 25 04:44:47 pbs kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
Nov 25 04:44:47 pbs kernel: CPU: 3 PID: 614636 Comm: z_wr_int Tainted: P O 6.2.16-19-pve #1
Nov 25 04:44:47 pbs kernel: Hardware name: retsamarret 000-F4423-EU000-2000-N/Default string, BIOS 5.19 01/09/2023
Nov 25 04:44:47 pbs kernel: RIP: 0010:raw_spin_rq_unlock+0x1c/0x40
Nov 25 04:44:47 pbs kernel: Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 66 90 e8 20 62 f6 00 5d 31 c0 31 ff c3 cc cc cc cc 8b 87 <18> 0d 00 00 85 c0 74 e7 48 8b bf 08 0d 00 00 e8 00 62 f6 00 5d 31
Nov 25 04:44:47 pbs kernel: RSP: 0018:ffffa954c5dbbce8 EFLAGS: 00010246
Nov 25 04:44:47 pbs kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Nov 25 04:44:47 pbs kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Nov 25 04:44:47 pbs kernel: RBP: ffffa954c5dbbd10 R08: 0000000000000000 R09: 0000000000000000
Nov 25 04:44:47 pbs kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff9e45801b2040
Nov 25 04:44:47 pbs kernel: R13: ffff9e4500b7c980 R14: 0000000000000000 R15: 0000000000000000
Nov 25 04:44:47 pbs kernel: FS: 0000000000000000(0000) GS:ffff9e4580180000(0000) knlGS:0000000000000000
Nov 25 04:44:47 pbs kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 25 04:44:47 pbs kernel: CR2: ffffffff70980902 CR3: 000000010ad4a000 CR4: 0000000000350ee0
Nov 25 04:44:47 pbs kernel: Call Trace:
Nov 25 04:44:47 pbs kernel: <TASK>
Nov 25 04:44:47 pbs kernel: ? show_regs+0x6d/0x80
Nov 25 04:44:47 pbs kernel: ? __die+0x24/0x80
Nov 25 04:44:47 pbs kernel: ? page_fault_oops+0x176/0x500
Nov 25 04:44:47 pbs kernel: ? raw_spin_rq_unlock+0x1c/0x40
Nov 25 04:44:47 pbs kernel: ? kernelmode_fixup_or_oops+0xb2/0x140
Nov 25 04:44:47 pbs kernel: ? __bad_area_nosemaphore+0x1a5/0x2c0
Nov 25 04:44:47 pbs kernel: ? update_load_avg+0x82/0x810
Nov 25 04:44:47 pbs kernel: ? bad_area_nosemaphore+0x16/0x30
Nov 25 04:44:47 pbs kernel: ? do_kern_addr_fault+0x7b/0xa0
Nov 25 04:44:47 pbs kernel: ? exc_page_fault+0x10a/0x1b0
Nov 25 04:44:47 pbs kernel: ? asm_exc_page_fault+0x27/0x30
Nov 25 04:44:47 pbs kernel: ? raw_spin_rq_unlock+0x1c/0x40
Nov 25 04:44:47 pbs kernel: ? finish_task_switch.isra.0+0x85/0x2c0
Nov 25 04:44:47 pbs kernel: __schedule+0x40a/0x1510
Nov 25 04:44:47 pbs kernel: ? __wake_up_common_lock+0x8b/0xd0
Nov 25 04:44:47 pbs kernel: schedule+0x63/0x110
Nov 25 04:44:47 pbs kernel: taskq_thread+0x401/0x4d0 [spl]
Nov 25 04:44:47 pbs kernel: ? __pfx_default_wake_function+0x10/0x10
Nov 25 04:44:47 pbs kernel: ? __pfx_zio_execute+0x10/0x10 [zfs]
Nov 25 04:44:47 pbs kernel: ? __pfx_taskq_thread+0x10/0x10 [spl]
Nov 25 04:44:47 pbs kernel: kthread+0xe6/0x110
Nov 25 04:44:47 pbs kernel: ? __pfx_kthread+0x10/0x10
Nov 25 04:44:47 pbs kernel: ret_from_fork+0x29/0x50
Nov 25 04:44:47 pbs kernel: </TASK>
Nov 25 04:44:47 pbs kernel: Modules linked in: bonding tls sunrpc binfmt_misc x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_hdmi kvm_intel snd_sof_pci_intel_icl snd_sof_intel_hda_common kvm soundwire_intel soundwire_generic_allocation irqbypass soundwire_cadence crct10dif_pclmul snd_sof_intel_hda polyval_generic ghash_clmulni_intel snd_sof_pci sha512_ssse3 aesni_intel snd_sof_xtensa_dsp crypto_simd cryptd i915 snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus snd_soc_core snd_compress intel_cstate ac97_bus drm_buddy ttm snd_pcm_dmaengine drm_display_helper snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec intel_rapl_msr cec snd_hda_core snd_hwdep snd_pcm rc_core snd_timer cmdlinepart processor_thermal_device_pci_legacy processor_thermal_device processor_thermal_rfim spi_nor drm_kms_helper snd processor_thermal_mbox i2c_algo_bit mtd processor_thermal_rapl pcspkr 8250_dw soundcore ee1004 wmi_bmof intel_rapl_common
Nov 25 04:44:47 pbs kernel: mei_me int340x_thermal_zone syscopyarea sysfillrect mei intel_soc_dts_iosf sysimgblt joydev acpi_pad input_leds acpi_tad mac_hid drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs blake2b_generic xor raid6_pq libcrc32c simplefb hid_logitech_hidpp hid_logitech_dj hid_generic usbkbd usbmouse spi_pxa2xx_platform usbhid dw_dmac hid dw_dmac_core nvme sdhci_pci crc32_pclmul i2c_i801 i2c_smbus spi_intel_pci ahci intel_lpss_pci spi_intel cqhci nvme_core libahci sdhci xhci_pci xhci_pci_renesas nvme_common xhci_hcd igc intel_lpss idma64 video wmi pinctrl_jasperlake
Nov 25 04:44:47 pbs kernel: CR2: ffffffff70980902
Nov 25 04:44:47 pbs kernel: ---[ end trace 0000000000000000 ]---
Nov 25 04:44:47 pbs kernel: RIP: 0010:raw_spin_rq_unlock+0x1c/0x40
Nov 25 04:44:47 pbs kernel: Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 66 90 e8 20 62 f6 00 5d 31 c0 31 ff c3 cc cc cc cc 8b 87 <18> 0d 00 00 85 c0 74 e7 48 8b bf 08 0d 00 00 e8 00 62 f6 00 5d 31
Nov 25 04:44:47 pbs kernel: RSP: 0018:ffffa954c5dbbce8 EFLAGS: 00010246
Nov 25 04:44:47 pbs kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Nov 25 04:44:47 pbs kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Nov 25 04:44:47 pbs kernel: RBP: ffffa954c5dbbd10 R08: 0000000000000000 R09: 0000000000000000
Nov 25 04:44:47 pbs kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff9e45801b2040
Nov 25 04:44:47 pbs kernel: R13: ffff9e4500b7c980 R14: 0000000000000000 R15: 0000000000000000
Nov 25 04:44:47 pbs kernel: FS: 0000000000000000(0000) GS:ffff9e4580180000(0000) knlGS:0000000000000000
Nov 25 04:44:47 pbs kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 25 04:44:47 pbs kernel: CR2: ffffffff70980902 CR3: 000000010ad4a000 CR4: 0000000000350ee0
Nov 25 04:44:47 pbs kernel: note: z_wr_int[614636] exited with irqs disabled
Nov 25 04:44:47 pbs kernel: BUG: unable to handle page fault for address: 0000000103844a00
Nov 25 04:44:47 pbs kernel: #PF: supervisor instruction fetch in kernel mode
Nov 25 04:44:47 pbs kernel: #PF: error_code(0x0010) - not-present page
After both events I can't SSH in, ping the machine, or use the web GUI. The only way to get back in is a hard reset.
What do you suggest? Is this a software or a hardware failure?
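To compare the crashes side by side, here is a minimal sketch of how the key fields of each oops could be pulled out of the syslog lines. It assumes the line format shown in the logs above. The function name `summarize_oopses` and the field names are my own invention for illustration, not part of any PBS or kernel tool.

```python
import re

# Patterns for the fields that identify each oops in the log format above.
BUG_RE = re.compile(r"BUG: unable to handle page fault for address: ([0-9a-f]+)")
CPU_COMM_RE = re.compile(r"CPU: (\d+) PID: \d+ Comm: (\S+)")
RIP_RE = re.compile(r"RIP: 0010:(\S+)")  # 0010 is the kernel code segment


def summarize_oopses(lines):
    """Return one dict per 'BUG:' line with time, address, CPU, task and RIP."""
    oopses = []
    for line in lines:
        bug = BUG_RE.search(line)
        if bug:
            # A new oops starts here; the first 15 characters are the timestamp.
            oopses.append({"time": line[:15], "address": bug.group(1),
                           "cpu": None, "comm": None, "rip": None})
            continue
        if not oopses:
            continue  # ignore anything before the first BUG line
        current = oopses[-1]
        cpu_comm = CPU_COMM_RE.search(line)
        if cpu_comm and current["comm"] is None:
            current["cpu"] = int(cpu_comm.group(1))
            current["comm"] = cpu_comm.group(2)
        rip = RIP_RE.search(line)
        if rip and current["rip"] is None:
            # Keep only the first kernel RIP; the dump repeats it after the trace.
            current["rip"] = rip.group(1)
    return oopses


# Lines copied from the two complete oopses in this post.
sample = [
    "Nov 24 23:06:06 pbs kernel: BUG: unable to handle page fault for address: 00002ea7c3e2a4d8",
    "Nov 24 23:06:07 pbs kernel: CPU: 3 PID: 1278337 Comm: tokio-runtime-w Tainted: P O 6.2.16-19-pve #1",
    "Nov 24 23:06:07 pbs kernel: RIP: 0010:charge_memcg+0xb0/0x100",
    "Nov 25 04:44:47 pbs kernel: BUG: unable to handle page fault for address: ffffffff70980902",
    "Nov 25 04:44:47 pbs kernel: CPU: 3 PID: 614636 Comm: z_wr_int Tainted: P O 6.2.16-19-pve #1",
    "Nov 25 04:44:47 pbs kernel: RIP: 0010:raw_spin_rq_unlock+0x1c/0x40",
]

for oops in summarize_oopses(sample):
    print(oops["time"], "cpu", oops["cpu"], oops["comm"], oops["rip"], oops["address"])
```

On the two complete oopses above, this shows different tasks (`tokio-runtime-w` and `z_wr_int`) faulting in unrelated kernel functions, both on CPU 3. I don't know whether that points more towards hardware or software.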