[SOLVED] BUG: kernel NULL pointer dereference, address: 0000000000000000

Running into a similar problem.


Linux version 6.8.12-10-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-10 (2025-04-18T07:39Z) ()

The motherboard is a server board, updated to the latest BIOS: W680D4U-2L2T/G5/W680D4U-2L2T/G5, BIOS 22.01 10/01/2024
CPU i9-14900K
ECC memory fully tested

It happens randomly every 3-4 days. Proxmox then hangs, and the only way out is a power cycle.

Any hints?

Thanks a lot in advance!


Code:
May 25 05:40:21 oa-nas kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
May 25 05:40:21 oa-nas kernel: #PF: supervisor write access in kernel mode
May 25 05:40:21 oa-nas kernel: #PF: error_code(0x0002) - not-present page
May 25 05:40:21 oa-nas kernel: PGD 0 P4D 0
May 25 05:40:21 oa-nas kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
May 25 05:40:21 oa-nas kernel: CPU: 4 PID: 2285716 Comm: z_wr_int_3 Tainted: P           O       6.8.12-10-pve #1
May 25 05:40:21 oa-nas kernel: Hardware name:  W680D4U-2L2T/G5/W680D4U-2L2T/G5, BIOS 22.01 10/01/2024
May 25 05:40:21 oa-nas kernel: RIP: 0010:add_wait_queue_exclusive+0x3b/0x60
May 25 05:40:21 oa-nas kernel: Code: fb 83 0e 01 e8 76 b5 ff 00 49 8d 54 24 18 48 8d 4b 08 48 89 df 48 89 c6 48 8b 43 10 48 89 53 10 49 89 4c 24 18 49 89 44 24 20 <48> 89 10 e8 4d b6 ff 00 5b 41 5c 5d 31 c0 31 d>
May 25 05:40:21 oa-nas kernel: RSP: 0018:ffffb1a6ba817da8 EFLAGS: 00010046
May 25 05:40:21 oa-nas kernel: RAX: 0000000000000000 RBX: ffff9e631fc636c0 RCX: ffff9e631fc636c8
May 25 05:40:21 oa-nas kernel: RDX: ffffb1a6ba817e10 RSI: 0000000000000002 RDI: ffff9e631fc636c0
May 25 05:40:21 oa-nas kernel: RBP: ffffb1a6ba817db8 R08: 0000000000000000 R09: 0000000000000000
May 25 05:40:21 oa-nas kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffb1a6ba817df8
May 25 05:40:21 oa-nas kernel: R13: ffff9e631fc636c0 R14: ffff9e63b4386a80 R15: ffff9e631fc63600
May 25 05:40:21 oa-nas kernel: FS:  0000000000000000(0000) GS:ffff9e81fea00000(0000) knlGS:0000000000000000
May 25 05:40:21 oa-nas kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 25 05:40:21 oa-nas kernel: CR2: 0000000000000000 CR3: 00000015b2b16003 CR4: 0000000000f72ef0
May 25 05:40:21 oa-nas kernel: PKRU: 55555554
May 25 05:40:21 oa-nas kernel: Call Trace:
May 25 05:40:21 oa-nas kernel:  <TASK>
May 25 05:40:21 oa-nas kernel:  ? show_regs+0x6d/0x80
May 25 05:40:21 oa-nas kernel:  ? __die+0x24/0x80
May 25 05:40:21 oa-nas kernel:  ? page_fault_oops+0x176/0x500
May 25 05:40:21 oa-nas kernel:  ? do_user_addr_fault+0x2f5/0x660
May 25 05:40:21 oa-nas kernel:  ? exc_page_fault+0x83/0x1b0
May 25 05:40:21 oa-nas kernel:  ? asm_exc_page_fault+0x27/0x30
May 25 05:40:21 oa-nas kernel:  ? add_wait_queue_exclusive+0x3b/0x60
May 25 05:40:21 oa-nas kernel:  ? add_wait_queue_exclusive+0x1a/0x60
May 25 05:40:21 oa-nas kernel:  taskq_thread+0x3fd/0x4c0 [spl]
May 25 05:40:21 oa-nas kernel:  ? __pfx_default_wake_function+0x10/0x10
May 25 05:40:21 oa-nas kernel:  ? __pfx_zio_execute+0x10/0x10 [zfs]
May 25 05:40:21 oa-nas kernel:  ? __pfx_taskq_thread+0x10/0x10 [spl]
May 25 05:40:21 oa-nas kernel:  kthread+0xef/0x120
May 25 05:40:21 oa-nas kernel:  ? __pfx_kthread+0x10/0x10
May 25 05:40:21 oa-nas kernel:  ret_from_fork+0x44/0x70
May 25 05:40:21 oa-nas kernel:  ? __pfx_kthread+0x10/0x10
May 25 05:40:21 oa-nas kernel:  ret_from_fork_asm+0x1b/0x30
May 25 05:40:21 oa-nas kernel:  </TASK>
May 25 05:40:21 oa-nas kernel: Modules linked in: nft_chain_nat xt_MASQUERADE nf_nat nft_compat cfg80211 veth nf_conntrack_netlink nfnetlink_acct udp_diag tcp_diag inet_diag wireguard curve25519_x86_64 libchacha>
May 25 05:40:21 oa-nas kernel:  drm_exec gpu_sched drm_suballoc_helper snd_sof drm_ttm_helper x86_pkg_temp_thermal intel_powerclamp snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_so>
May 25 05:40:21 oa-nas kernel:  hid_generic usbkbd usbmouse cdc_ether usbnet mii usbhid hid igb nvme xhci_pci xhci_pci_renesas i2c_algo_bit crc32_pclmul intel_lpss_pci spi_intel_pci nvme_core i40e i2c_i801 spi_i>
May 25 05:40:21 oa-nas kernel: CR2: 0000000000000000
May 25 05:40:21 oa-nas kernel: ---[ end trace 0000000000000000 ]---
May 25 05:40:21 oa-nas kernel: RIP: 0010:add_wait_queue_exclusive+0x3b/0x60
May 25 05:40:21 oa-nas kernel: Code: fb 83 0e 01 e8 76 b5 ff 00 49 8d 54 24 18 48 8d 4b 08 48 89 df 48 89 c6 48 8b 43 10 48 89 53 10 49 89 4c 24 18 49 89 44 24 20 <48> 89 10 e8 4d b6 ff 00 5b 41 5c 5d 31 c0 31 d>
May 25 05:40:21 oa-nas kernel: RSP: 0018:ffffb1a6ba817da8 EFLAGS: 00010046
May 25 05:40:21 oa-nas kernel: RAX: 0000000000000000 RBX: ffff9e631fc636c0 RCX: ffff9e631fc636c8
May 25 05:40:21 oa-nas kernel: RDX: ffffb1a6ba817e10 RSI: 0000000000000002 RDI: ffff9e631fc636c0
May 25 05:40:21 oa-nas kernel: RBP: ffffb1a6ba817db8 R08: 0000000000000000 R09: 0000000000000000
May 25 05:40:21 oa-nas kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffb1a6ba817df8
May 25 05:40:21 oa-nas kernel: R13: ffff9e631fc636c0 R14: ffff9e63b4386a80 R15: ffff9e631fc63600
May 25 05:40:21 oa-nas kernel: FS:  0000000000000000(0000) GS:ffff9e81fea00000(0000) knlGS:0000000000000000
May 25 05:40:21 oa-nas kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 25 05:40:21 oa-nas kernel: CR2: 0000000000000000 CR3: 00000015b2b16003 CR4: 0000000000f72ef0
May 25 05:40:21 oa-nas kernel: PKRU: 55555554
May 25 05:40:21 oa-nas kernel: note: z_wr_int_3[2285716] exited with irqs disabled
May 25 05:40:21 oa-nas kernel: note: z_wr_int_3[2285716] exited with preempt_count 2
May 25 05:40:24 oa-nas kernel: {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
May 25 05:40:24 oa-nas kernel: {1}[Hardware Error]: It has been corrected by h/w and requires no further action
May 25 05:40:24 oa-nas kernel: {1}[Hardware Error]: event severity: corrected
May 25 05:40:24 oa-nas kernel: {1}[Hardware Error]:  Error 0, type: corrected
May 25 05:40:24 oa-nas kernel: {1}[Hardware Error]:  fru_text: CorrectedErr
May 25 05:40:24 oa-nas kernel: {1}[Hardware Error]:   section_type: memory error
May 25 05:40:24 oa-nas kernel: {1}[Hardware Error]:   node:1 device:0
May 25 05:40:24 oa-nas kernel: {1}[Hardware Error]:   error_type: 2, single-bit ECC
 
May 25 05:40:24 oa-nas kernel: {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
May 25 05:40:24 oa-nas kernel: {1}[Hardware Error]: It has been corrected by h/w and requires no further action
May 25 05:40:24 oa-nas kernel: {1}[Hardware Error]: event severity: corrected
May 25 05:40:24 oa-nas kernel: {1}[Hardware Error]: Error 0, type: corrected
May 25 05:40:24 oa-nas kernel: {1}[Hardware Error]: fru_text: CorrectedErr
May 25 05:40:24 oa-nas kernel: {1}[Hardware Error]: section_type: memory error
May 25 05:40:24 oa-nas kernel: {1}[Hardware Error]: node:1 device:0
May 25 05:40:24 oa-nas kernel: {1}[Hardware Error]: error_type: 2, single-bit ECC
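For anyone reading these traces: the two `#PF:` lines are the kernel's decoding of the x86 page-fault error code. Its low bits are P (0 = page not present), W/R (1 = write), U/S (1 = user-mode access), RSVD (reserved bit set in a page-table entry), I/D (instruction fetch) and PK (protection-key violation). Below is a small Python sketch of that decoding. The function name `decode_pf_error_code` is made up for illustration and is not part of any tool. For `error_code(0x0002)` it yields a supervisor write to a not-present page. Together with `CR2: 0000000000000000`, that is consistent with a write through a NULL pointer inside `add_wait_queue_exclusive`. The same decoding applies to `error_code(0x0000)` in the DL380 trace further down.

```python
# Low bits of the x86 page-fault error code, as summarized by the Linux
# kernel in its "#PF: ..." lines. Bits with two meanings: (if set, if clear).
PF_BITS = {
    0: ("protection violation", "not-present page"),  # P bit
    1: ("write access", "read access"),               # W/R bit
    2: ("user access", "supervisor access"),          # U/S bit
}
# Bits that are only reported when set.
FLAG_BITS = {
    3: "reserved bit set in a page-table entry",  # RSVD
    4: "instruction fetch",                       # I/D
    5: "protection-key violation",                # PK
}


def decode_pf_error_code(code: int) -> list[str]:
    """Return a human-readable breakdown of a page-fault error code."""
    parts = [if_set if code & (1 << bit) else if_clear
             for bit, (if_set, if_clear) in PF_BITS.items()]
    parts += [meaning for bit, meaning in FLAG_BITS.items() if code & (1 << bit)]
    return parts


if __name__ == "__main__":
    # Error codes taken from the two traces in this thread.
    for code in (0x0002, 0x0000):
        print(f"0x{code:04x}: {', '.join(decode_pf_error_code(code))}")
```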

Bad RAM or RAM going bad maybe?

Even though the log says the error was corrected, it may matter WHERE that single-bit error occurred.

Since it happens every 3 to 4 days, check whether the same single-bit ECC errors are logged with each crash.
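That check can be scripted. Below is a rough Python sketch. It assumes the kernel log of the crashed boot has been exported with something like `journalctl -k -b -1 > kern.log`, which needs a persistent journal because the machine was power-cycled. It also assumes journalctl's default short output in an English locale, matching the lines posted above. The script, its function names and the 5-minute window are arbitrary choices, not an existing tool.

```python
import re
import sys
from datetime import datetime, timedelta

# Assumes journalctl's default "short" format in an English (C) locale, e.g.
# "May 25 05:40:21 oa-nas kernel: BUG: kernel NULL pointer dereference, ..."
LINE_RE = re.compile(r"^(\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: (.*)$")


def find_events(lines, year):
    """Split kernel log lines into crash timestamps and ECC-error timestamps."""
    crashes, ecc = [], []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # skips "-- Boot ..." markers and non-kernel lines
        # The short format omits the year, so it has to be supplied.
        ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        msg = m.group(2)
        if msg.startswith("BUG:"):
            crashes.append(ts)
        elif "single-bit ECC" in msg:
            ecc.append(ts)
    return crashes, ecc


def ecc_near_crashes(lines, year, window=timedelta(minutes=5)):
    """Pair each crash with the ECC errors logged within +/- window of it."""
    crashes, ecc = find_events(lines, year)
    return [(c, [e for e in ecc if abs(e - c) <= window]) for c in crashes]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 ecc_vs_crash.py kern.log [year]
    year = int(sys.argv[2]) if len(sys.argv) > 2 else datetime.now().year
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for crash, near in ecc_near_crashes(f, year):
            print(f"{crash}: {len(near)} single-bit ECC event(s) nearby")
```

Only `BUG:` lines are counted as crashes, so an Oops that is not preceded by a `BUG:` line would be missed. If every crash comes with ECC events a few seconds away, as in the May 25 log, that supports the bad-RAM theory even though a memory test passed.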

My $0.02.
 
I don't think so. I tested the RAM for several days when the hangs started, and no errors showed up.
 
Hi everybody,
I'm also getting a kernel BUG, on kernel 6.8.12-4-pve
(pve-manager/8.3.0/c1689ccb1065a83b)
on an HP ProLiant DL380 Gen9.

Code:
May 24 09:51:35 dl380Gen9 kernel: BUG: unable to handle page fault for address: ffffffffe13d6538
May 24 09:51:35 dl380Gen9 kernel: #PF: supervisor read access in kernel mode
May 24 09:51:35 dl380Gen9 kernel: #PF: error_code(0x0000) - not-present page
May 24 09:51:35 dl380Gen9 kernel: PGD 63943b067 P4D 63943b067 PUD 63943d067 PMD 0
May 24 09:51:35 dl380Gen9 kernel: Oops: 0000 [#1] PREEMPT SMP PTI
May 24 09:51:35 dl380Gen9 kernel: CPU: 17 PID: 1870592 Comm: CPU 0/KVM Tainted: P        W  O       6.8.12-4-pve #1
May 24 09:51:35 dl380Gen9 kernel: Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 05/17/2022
May 24 09:51:35 dl380Gen9 kernel: RIP: 0010:kvm_find_user_return_msr+0x6/0x50 [kvm]
May 24 09:51:35 dl380Gen9 kernel: Code: 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 5d c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 <8b> 0d ec 4e 09 00 48 89 e5 85 c9 74 23 31 c0 eb 07 83 c0 01 39 c1
May 24 09:51:35 dl380Gen9 kernel: RSP: 0018:ffffa0b235ccfb68 EFLAGS: 00010207
May 24 09:51:35 dl380Gen9 kernel: RAX: 0000000000000000 RBX: 0000000000000838 RCX: 0000000000000000
May 24 09:51:35 dl380Gen9 kernel: RDX: 0000000000325914 RSI: ffffa0b235ccfbd0 RDI: 0000000000000838
May 24 09:51:35 dl380Gen9 kernel: RBP: ffffa0b235ccfbb8 R08: 0000000000000000 R09: 0000000000000000
May 24 09:51:35 dl380Gen9 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0b235ccfbd0
May 24 09:51:35 dl380Gen9 kernel: R13: ffff91c5efd423c0 R14: 0000000000325914 R15: 0000000000000000
May 24 09:51:35 dl380Gen9 kernel: FS:  00007af4a3e006c0(0000) GS:ffff91dfbf280000(0000) knlGS:0000000000000000
May 24 09:51:35 dl380Gen9 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 24 09:51:35 dl380Gen9 kernel: CR2: ffffffffe13d6538 CR3: 00000020cf738002 CR4: 00000000003726f0
May 24 09:51:35 dl380Gen9 kernel: Call Trace:
May 24 09:51:35 dl380Gen9 kernel:  <TASK>
May 24 09:51:35 dl380Gen9 kernel:  ? show_regs+0x6d/0x80
May 24 09:51:35 dl380Gen9 kernel:  ? __die+0x24/0x80

Please note that this node is part of a three-node production cluster.
Any help is appreciated. Thanks,
G.M.