Kernel Error with PBS kernel: BUG: unable to handle page fault for address: 00002ea7c3e2a4d8

spencer85

Member
Nov 18, 2019
Hello,

I've installed PBS 3-0-4 on bare metal. At irregular intervals the whole PBS machine freezes. In the syslog I found the following output:

Code:
Nov 24 23:06:06 pbs kernel: BUG: unable to handle page fault for address: 00002ea7c3e2a4d8
Nov 24 23:06:06 pbs kernel: #PF: supervisor write access in kernel mode
Nov 24 23:06:07 pbs kernel: #PF: error_code(0x0002) - not-present page
Nov 24 23:06:07 pbs kernel: PGD 0 P4D 0
Nov 24 23:06:07 pbs kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
Nov 24 23:06:07 pbs kernel: CPU: 3 PID: 1278337 Comm: tokio-runtime-w Tainted: P           O       6.2.16-19-pve #1
Nov 24 23:06:07 pbs kernel: Hardware name: retsamarret 000-F4423-EU000-2000-N/Default string, BIOS 5.19 01/09/2023
Nov 24 23:06:07 pbs kernel: RIP: 0010:charge_memcg+0xb0/0x100
Nov 24 23:06:07 pbs kernel: Code: 24 48 c1 ee 36 e8 30 98 ff ff fb 0f 1f 44 00 00 31 c0 5b 41 5c 41 5d 41 5e 5d 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc be 01 00 <00> 00 48 89 df 41 f7 de e8 c3 d3 ff ff eb af ba 01 00 00 00 41 bd
Nov 24 23:06:07 pbs kernel: RSP: 0000:ffff9cbcc966fd20 EFLAGS: 00010082
Nov 24 23:06:07 pbs kernel: RAX: 00002ea7c3e2a4d8 RBX: ffff8e1486586000 RCX: 0000000000000000
Nov 24 23:06:07 pbs kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8e1486586000
Nov 24 23:06:07 pbs kernel: RBP: ffff9cbcc966fd60 R08: 0000000000000000 R09: 0000000000000000
Nov 24 23:06:07 pbs kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Nov 24 23:06:07 pbs kernel: R13: 0000000000000001 R14: 0000000000000001 R15: ffff8e148ac99a70
Nov 24 23:06:07 pbs kernel: FS:  00007f539ae6e6c0(0000) GS:ffff8e14fbf80000(0000) knlGS:0000000000000000
Nov 24 23:06:07 pbs kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 24 23:06:07 pbs kernel: CR2: 00002ea7c3e2a4d8 CR3: 0000000107c8c000 CR4: 0000000000350ee0
Nov 24 23:06:07 pbs kernel: Call Trace:
Nov 24 23:06:07 pbs kernel:  <TASK>
Nov 24 23:06:07 pbs kernel:  ? show_regs+0x6d/0x80
Nov 24 23:06:07 pbs kernel:  ? __die+0x24/0x80
Nov 24 23:06:07 pbs kernel:  ? page_fault_oops+0x176/0x500
Nov 24 23:06:07 pbs kernel:  ? do_user_addr_fault+0x2f3/0x620
Nov 24 23:06:07 pbs kernel:  ? exc_page_fault+0x80/0x1b0
Nov 24 23:06:07 pbs kernel:  ? asm_exc_page_fault+0x27/0x30
Nov 24 23:06:07 pbs kernel:  ? charge_memcg+0xb0/0x100
Nov 24 23:06:07 pbs kernel:  charge_memcg+0x90/0x100
Nov 24 23:06:07 pbs kernel:  __mem_cgroup_charge+0x2d/0xa0
Nov 24 23:06:07 pbs kernel:  __handle_mm_fault+0x9f6/0x1070
Nov 24 23:06:07 pbs kernel:  handle_mm_fault+0x119/0x330
Nov 24 23:06:07 pbs kernel:  ? lock_mm_and_find_vma+0x43/0x230
Nov 24 23:06:07 pbs kernel:  do_user_addr_fault+0x194/0x620
Nov 24 23:06:07 pbs kernel:  exc_page_fault+0x80/0x1b0
Nov 24 23:06:07 pbs kernel:  asm_exc_page_fault+0x27/0x30
Nov 24 23:06:07 pbs kernel: RIP: 0033:0x7f53ba1b2f40
Nov 24 23:06:07 pbs kernel: Code: ae 10 10 00 00 0f 10 b6 20 10 00 00 0f 10 be 30 10 00 00 48 83 ee c0 66 0f e7 07 66 0f e7 4f 10 66 0f e7 57 20 66 0f e7 5f 30 <66> 0f e7 a7 00 10 00 00 66 0f e7 af 10 10 00 00 66 0f e7 b7 20 10
Nov 24 23:06:07 pbs kernel: RSP: 002b:00007f539ae63dd8 EFLAGS: 00010203
Nov 24 23:06:07 pbs kernel: RAX: 00007f539a4df010 RBX: 0000000000000001 RCX: 0000000000000040
Nov 24 23:06:07 pbs kernel: RDX: 0000000000000e8b RSI: 00007f539bc4c08c RDI: 00007f539a92d040
Nov 24 23:06:07 pbs kernel: RBP: 00007f539ae6bf60 R08: ffffffffffffffd0 R09: 0000000000000000
Nov 24 23:06:07 pbs kernel: R10: 00000000000001a0 R11: 0000000001000000 R12: 000000000078eebb
Nov 24 23:06:07 pbs kernel: R13: 00007f539b7fe01c R14: 00007f5364010770 R15: 00007f539a4df010
Nov 24 23:06:07 pbs kernel:  </TASK>
Nov 24 23:06:07 pbs kernel: Modules linked in: bonding tls sunrpc binfmt_misc snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul polyval_generic ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd cryptd intel_cstate i915 snd_sof_pci_intel_icl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine drm_buddy ttm snd_hda_intel drm_display_helper snd_intel_dspcfg intel_rapl_msr snd_intel_sdw_acpi snd_hda_codec processor_thermal_device_pci_legacy snd_hda_core processor_thermal_device cec snd_hwdep rc_core processor_thermal_rfim cmdlinepart snd_pcm drm_kms_helper processor_thermal_mbox spi_nor snd_timer pcspkr i2c_algo_bit wmi_bmof mtd processor_thermal_rapl snd ee1004 mei_me soundcore syscopyarea 8250_dw mei
Nov 24 23:06:07 pbs kernel:  intel_rapl_common sysfillrect sysimgblt int340x_thermal_zone intel_soc_dts_iosf joydev input_leds acpi_pad acpi_tad mac_hid drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs blake2b_generic xor raid6_pq libcrc32c simplefb hid_logitech_hidpp hid_logitech_dj hid_generic usbkbd usbmouse usbhid hid spi_pxa2xx_platform dw_dmac dw_dmac_core i2c_i801 spi_intel_pci xhci_pci nvme xhci_pci_renesas nvme_core crc32_pclmul nvme_common intel_lpss_pci spi_intel igc xhci_hcd i2c_smbus ahci sdhci_pci intel_lpss cqhci libahci sdhci idma64 video wmi pinctrl_jasperlake
Nov 24 23:06:07 pbs kernel: CR2: 00002ea7c3e2a4d8
Nov 24 23:06:07 pbs kernel: ---[ end trace 0000000000000000 ]---
Nov 24 23:06:07 pbs kernel: RIP: 0010:charge_memcg+0xb0/0x100
Nov 24 23:06:07 pbs kernel: Code: 24 48 c1 ee 36 e8 30 98 ff ff fb 0f 1f 44 00 00 31 c0 5b 41 5c 41 5d 41 5e 5d 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc be 01 00 <00> 00 48 89 df 41 f7 de e8 c3 d3 ff ff eb af ba 01 00 00 00 41 bd
Nov 24 23:06:07 pbs kernel: RSP: 0000:ffff9cbcc966fd20 EFLAGS: 00010082
Nov 24 23:06:07 pbs kernel: RAX: 00002ea7c3e2a4d8 RBX: ffff8e1486586000 RCX: 0000000000000000
Nov 24 23:06:07 pbs kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8e1486586000
Nov 24 23:06:07 pbs kernel: RBP: ffff9cbcc966fd60 R08: 0000000000000000 R09: 0000000000000000
Nov 24 23:06:07 pbs kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Nov 24 23:06:07 pbs kernel: R13: 0000000000000001 R14: 0000000000000001 R15: ffff8e148ac99a70
Nov 24 23:06:07 pbs kernel: FS:  00007f539ae6e6c0(0000) GS:ffff8e14fbf80000(0000) knlGS:0000000000000000
Nov 24 23:06:07 pbs kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 24 23:06:07 pbs kernel: CR2: 00002ea7c3e2a4d8 CR3: 0000000107c8c000 CR4: 0000000000350ee0
Nov 24 23:06:07 pbs kernel: note: tokio-runtime-w[1278337] exited with irqs disabled
Nov 24 23:10:42 pbs sshd[1291587]: Accepted password for XXXXXX from 10.10.1.98 port 55777 ssh2
Nov 24 23:10:42 pbs sshd[1291587]: pam_unix(sshd:session): session opened for user aferlemann(uid=1000) by (uid=0)
Nov 24 23:10:42 pbs systemd-logind[1425]: New session 78 of user aferlemann.
Nov 24 23:10:42 pbs systemd[1]: Created slice user-1000.slice - User Slice of UID 1000.
Nov 24 23:10:42 pbs systemd[1]: Starting user-runtime-dir@1000.service - User Runtime Directory /run/user/1000...
Nov 24 23:10:42 pbs systemd[1]: Finished user-runtime-dir@1000.service - User Runtime Directory /run/user/1000.
Nov 24 23:10:42 pbs systemd[1]: Starting user@1000.service - User Manager for UID 1000...
Nov 24 23:10:42 pbs (systemd)[1291618]: pam_unix(systemd-user:session): session opened for user aferlemann(uid=1000) by (uid=0)
Nov 24 23:10:43 pbs systemd[1291618]: Queued start job for default target default.target.
Nov 24 23:10:43 pbs systemd[1291618]: Created slice app.slice - User Application Slice.
Nov 24 23:10:43 pbs systemd[1291618]: Reached target paths.target - Paths.
Nov 24 23:10:43 pbs systemd[1291618]: Reached target timers.target - Timers.
Nov 24 23:10:43 pbs systemd[1291618]: Listening on dirmngr.socket - GnuPG network certificate management daemon.
Nov 24 23:10:43 pbs systemd[1291618]: Listening on gpg-agent-browser.socket - GnuPG cryptographic agent and passphrase cache (access for web browsers).
Nov 24 23:10:43 pbs systemd[1291618]: Listening on gpg-agent-extra.socket - GnuPG cryptographic agent and passphrase cache (restricted).
Nov 24 23:10:43 pbs systemd[1291618]: Listening on gpg-agent-ssh.socket - GnuPG cryptographic agent (ssh-agent emulation).
Nov 24 23:10:43 pbs systemd[1291618]: Listening on gpg-agent.socket - GnuPG cryptographic agent and passphrase cache.
Nov 24 23:10:43 pbs systemd[1291618]: Reached target sockets.target - Sockets.
Nov 24 23:10:43 pbs systemd[1291618]: Reached target basic.target - Basic System.
Nov 24 23:10:43 pbs systemd[1291618]: Reached target default.target - Main User Target.
Nov 24 23:10:43 pbs systemd[1291618]: Startup finished in 284ms.
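For reference, the `#PF: error_code(0x0002)` line above is an x86 bit field. The Python sketch below decodes it the way I understand the kernel words its oops message. The constant names are my own shorthand, modeled on Linux's `X86_PF_*` flags. The "in kernel mode" part of the message is not decoded here, because the kernel takes it from the saved registers, not from the error code.

```python
# Bits of the x86 page-fault error code (names modeled on Linux's X86_PF_* flags).
PF_PROT = 1 << 0   # 0 = page not present, 1 = protection violation
PF_WRITE = 1 << 1  # 0 = read, 1 = write
PF_USER = 1 << 2   # 0 = supervisor (CPL 0) access, 1 = user access
PF_RSVD = 1 << 3   # reserved bit set in a paging-structure entry
PF_INSTR = 1 << 4  # fault happened on an instruction fetch
PF_PK = 1 << 5     # protection-key violation


def describe_pf(error_code: int) -> tuple[str, str]:
    """Return (access description, fault reason) for a page-fault error code."""
    who = "user" if error_code & PF_USER else "supervisor"
    if error_code & PF_INSTR:
        what = "instruction fetch"
    elif error_code & PF_WRITE:
        what = "write access"
    else:
        what = "read access"

    if not error_code & PF_PROT:
        reason = "not-present page"
    elif error_code & PF_RSVD:
        reason = "reserved bit violation"
    elif error_code & PF_PK:
        reason = "protection keys violation"
    else:
        reason = "permissions violation"
    return f"{who} {what}", reason


if __name__ == "__main__":
    print(describe_pf(0x0002))
```

Called with `0x0002`, it returns `('supervisor write access', 'not-present page')`. This matches the second and third lines of the trace.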

Code:
Nov 25 04:44:47 pbs kernel: BUG: unable to handle page fault for address: ffffffff70980902
Nov 25 04:44:47 pbs kernel: #PF: supervisor write access in kernel mode
Nov 25 04:44:47 pbs kernel: #PF: error_code(0x0002) - not-present page
Nov 25 04:44:47 pbs kernel: PGD 17ae15067 P4D 17ae15067 PUD 0
Nov 25 04:44:47 pbs kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
Nov 25 04:44:47 pbs kernel: CPU: 3 PID: 614636 Comm: z_wr_int Tainted: P           O       6.2.16-19-pve #1
Nov 25 04:44:47 pbs kernel: Hardware name: retsamarret 000-F4423-EU000-2000-N/Default string, BIOS 5.19 01/09/2023
Nov 25 04:44:47 pbs kernel: RIP: 0010:raw_spin_rq_unlock+0x1c/0x40
Nov 25 04:44:47 pbs kernel: Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 66 90 e8 20 62 f6 00 5d 31 c0 31 ff c3 cc cc cc cc 8b 87 <18> 0d 00 00 85 c0 74 e7 48 8b bf 08 0d 00 00 e8 00 62 f6 00 5d 31
Nov 25 04:44:47 pbs kernel: RSP: 0018:ffffa954c5dbbce8 EFLAGS: 00010246
Nov 25 04:44:47 pbs kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Nov 25 04:44:47 pbs kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Nov 25 04:44:47 pbs kernel: RBP: ffffa954c5dbbd10 R08: 0000000000000000 R09: 0000000000000000
Nov 25 04:44:47 pbs kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff9e45801b2040
Nov 25 04:44:47 pbs kernel: R13: ffff9e4500b7c980 R14: 0000000000000000 R15: 0000000000000000
Nov 25 04:44:47 pbs kernel: FS:  0000000000000000(0000) GS:ffff9e4580180000(0000) knlGS:0000000000000000
Nov 25 04:44:47 pbs kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 25 04:44:47 pbs kernel: CR2: ffffffff70980902 CR3: 000000010ad4a000 CR4: 0000000000350ee0
Nov 25 04:44:47 pbs kernel: Call Trace:
Nov 25 04:44:47 pbs kernel:  <TASK>
Nov 25 04:44:47 pbs kernel:  ? show_regs+0x6d/0x80
Nov 25 04:44:47 pbs kernel:  ? __die+0x24/0x80
Nov 25 04:44:47 pbs kernel:  ? page_fault_oops+0x176/0x500
Nov 25 04:44:47 pbs kernel:  ? raw_spin_rq_unlock+0x1c/0x40
Nov 25 04:44:47 pbs kernel:  ? kernelmode_fixup_or_oops+0xb2/0x140
Nov 25 04:44:47 pbs kernel:  ? __bad_area_nosemaphore+0x1a5/0x2c0
Nov 25 04:44:47 pbs kernel:  ? update_load_avg+0x82/0x810
Nov 25 04:44:47 pbs kernel:  ? bad_area_nosemaphore+0x16/0x30
Nov 25 04:44:47 pbs kernel:  ? do_kern_addr_fault+0x7b/0xa0
Nov 25 04:44:47 pbs kernel:  ? exc_page_fault+0x10a/0x1b0
Nov 25 04:44:47 pbs kernel:  ? asm_exc_page_fault+0x27/0x30
Nov 25 04:44:47 pbs kernel:  ? raw_spin_rq_unlock+0x1c/0x40
Nov 25 04:44:47 pbs kernel:  ? finish_task_switch.isra.0+0x85/0x2c0
Nov 25 04:44:47 pbs kernel:  __schedule+0x40a/0x1510
Nov 25 04:44:47 pbs kernel:  ? __wake_up_common_lock+0x8b/0xd0
Nov 25 04:44:47 pbs kernel:  schedule+0x63/0x110
Nov 25 04:44:47 pbs kernel:  taskq_thread+0x401/0x4d0 [spl]
Nov 25 04:44:47 pbs kernel:  ? __pfx_default_wake_function+0x10/0x10
Nov 25 04:44:47 pbs kernel:  ? __pfx_zio_execute+0x10/0x10 [zfs]
Nov 25 04:44:47 pbs kernel:  ? __pfx_taskq_thread+0x10/0x10 [spl]
Nov 25 04:44:47 pbs kernel:  kthread+0xe6/0x110
Nov 25 04:44:47 pbs kernel:  ? __pfx_kthread+0x10/0x10
Nov 25 04:44:47 pbs kernel:  ret_from_fork+0x29/0x50
Nov 25 04:44:47 pbs kernel:  </TASK>
Nov 25 04:44:47 pbs kernel: Modules linked in: bonding tls sunrpc binfmt_misc x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_hdmi kvm_intel snd_sof_pci_intel_icl snd_sof_intel_hda_common kvm soundwire_intel soundwire_generic_allocation irqbypass soundwire_cadence crct10dif_pclmul snd_sof_intel_hda polyval_generic ghash_clmulni_intel snd_sof_pci sha512_ssse3 aesni_intel snd_sof_xtensa_dsp crypto_simd cryptd i915 snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus snd_soc_core snd_compress intel_cstate ac97_bus drm_buddy ttm snd_pcm_dmaengine drm_display_helper snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec intel_rapl_msr cec snd_hda_core snd_hwdep snd_pcm rc_core snd_timer cmdlinepart processor_thermal_device_pci_legacy processor_thermal_device processor_thermal_rfim spi_nor drm_kms_helper snd processor_thermal_mbox i2c_algo_bit mtd processor_thermal_rapl pcspkr 8250_dw soundcore ee1004 wmi_bmof intel_rapl_common
Nov 25 04:44:47 pbs kernel:  mei_me int340x_thermal_zone syscopyarea sysfillrect mei intel_soc_dts_iosf sysimgblt joydev acpi_pad input_leds acpi_tad mac_hid drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs blake2b_generic xor raid6_pq libcrc32c simplefb hid_logitech_hidpp hid_logitech_dj hid_generic usbkbd usbmouse spi_pxa2xx_platform usbhid dw_dmac hid dw_dmac_core nvme sdhci_pci crc32_pclmul i2c_i801 i2c_smbus spi_intel_pci ahci intel_lpss_pci spi_intel cqhci nvme_core libahci sdhci xhci_pci xhci_pci_renesas nvme_common xhci_hcd igc intel_lpss idma64 video wmi pinctrl_jasperlake
Nov 25 04:44:47 pbs kernel: CR2: ffffffff70980902
Nov 25 04:44:47 pbs kernel: ---[ end trace 0000000000000000 ]---
Nov 25 04:44:47 pbs kernel: RIP: 0010:raw_spin_rq_unlock+0x1c/0x40
Nov 25 04:44:47 pbs kernel: Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 66 90 e8 20 62 f6 00 5d 31 c0 31 ff c3 cc cc cc cc 8b 87 <18> 0d 00 00 85 c0 74 e7 48 8b bf 08 0d 00 00 e8 00 62 f6 00 5d 31
Nov 25 04:44:47 pbs kernel: RSP: 0018:ffffa954c5dbbce8 EFLAGS: 00010246
Nov 25 04:44:47 pbs kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Nov 25 04:44:47 pbs kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Nov 25 04:44:47 pbs kernel: RBP: ffffa954c5dbbd10 R08: 0000000000000000 R09: 0000000000000000
Nov 25 04:44:47 pbs kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff9e45801b2040
Nov 25 04:44:47 pbs kernel: R13: ffff9e4500b7c980 R14: 0000000000000000 R15: 0000000000000000
Nov 25 04:44:47 pbs kernel: FS:  0000000000000000(0000) GS:ffff9e4580180000(0000) knlGS:0000000000000000
Nov 25 04:44:47 pbs kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 25 04:44:47 pbs kernel: CR2: ffffffff70980902 CR3: 000000010ad4a000 CR4: 0000000000350ee0
Nov 25 04:44:47 pbs kernel: note: z_wr_int[614636] exited with irqs disabled
Nov 25 04:44:47 pbs kernel: BUG: unable to handle page fault for address: 0000000103844a00
Nov 25 04:44:47 pbs kernel: #PF: supervisor instruction fetch in kernel mode
Nov 25 04:44:47 pbs kernel: #PF: error_code(0x0010) - not-present page
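The faulting addresses reported in CR2 (`00002ea7c3e2a4d8`, `ffffffff70980902` and `0000000103844a00`) can also be sorted by which half of the x86-64 virtual address space they fall in. The sketch below assumes 4-level paging with 48-bit virtual addresses; I have not checked whether this kernel uses 5-level paging. Under that assumption, an address is canonical only if bits 63 to 47 are all equal. The lower half is used for user space and the upper half for the kernel.

```python
# Assumption: 4-level paging, i.e. 48-bit virtual addresses (not 5-level/LA57).
VA_BITS = 48


def classify(addr: int) -> str:
    """Say which part of the x86-64 virtual address space addr belongs to."""
    top = addr >> (VA_BITS - 1)              # bits 63..47 of the address
    if top == 0:
        return "user half"                   # 0x0000000000000000..0x00007fffffffffff
    if top == (1 << (64 - VA_BITS + 1)) - 1:
        return "kernel half"                 # 0xffff800000000000..0xffffffffffffffff
    return "non-canonical"                   # mixed 0s and 1s in bits 63..47


# CR2 values copied from the traces above.
faults = [
    ("Nov 24 23:06:06, charge_memcg", "00002ea7c3e2a4d8"),
    ("Nov 25 04:44:47, raw_spin_rq_unlock", "ffffffff70980902"),
    ("Nov 25 04:44:47, instruction fetch", "0000000103844a00"),
]

if __name__ == "__main__":
    for label, hex_addr in faults:
        print(f"{label}: {hex_addr} -> {classify(int(hex_addr, 16))}")
```

Under that assumption, the first and last addresses land in the user half, even though both faults were raised by kernel-mode accesses. The address `ffffffff70980902` lies in the kernel half.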

After each of these events I can't SSH in, ping the machine, or use the web GUI. The only way to get back in is a hard reset.
What do you suggest: a software or a hardware failure?
