[SOLVED] CephFS with kernel 6.17.13 causes a kernel trace

Hello,

We updated to PBS 4 and got the new kernel as well, but it seems to have issues with CephFS. We get the following error in the log:

Code:
kernel: Call Trace:
kernel:  <TASK>
kernel:  ? ceph_mds_check_access+0x109/0x810 [ceph]
kernel:  ceph_atomic_open+0x205/0xce0 [ceph]
kernel:  path_openat+0xb93/0x1240
kernel:  do_filp_open+0xd3/0x190
kernel:  do_sys_openat2+0x8b/0xf0
kernel:  __x64_sys_openat+0x52/0xa0
kernel:  x64_sys_call+0x1bf2/0x2330
kernel:  do_syscall_64+0x80/0x8f0
kernel:  ? irqentry_exit_to_user_mode+0x252/0x290
kernel:  ? irqentry_exit+0x43/0x50
kernel:  ? exc_page_fault+0x90/0x1b0
kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
kernel: RIP: 0033:0x766052aa69ee

We only see the error on a server booting in BIOS mode; on a nearly identical server booting with UEFI there is no issue. After pinning an older kernel (instead of 6.17.13), the mount works fine. Has anybody else seen this error, or does anyone have an idea how to solve it?

Best regards,
Paul
 
Hi,
also, can you share more of the log? Is it a NULL pointer dereference or something else?
 
Hey,
Thanks for the quick answer, and sorry for the missing information.
We have the kernel proxmox-kernel-6.17.13-2-pve-signed installed. The full log of the trace follows:
Code:
kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
kernel: #PF: supervisor read access in kernel mode
pbs kernel: #PF: error_code(0x0000) - not-present page
pbs kernel: PGD 0 P4D 0
pbs kernel: Oops: Oops: 0000 [#1] SMP NOPTI
pbs kernel: CPU: 6 UID: 34 PID: 963 Comm: tokio-runtime-w Tainted: P           O        6.17.13-2-pve #1 PREEMPT(voluntary)
kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
kernel: RIP: 0010:strcmp+0x2c/0x50
kernel: Code: eb 24 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 0f 1f 44 00 00 48 83 c0 01 84 d2 74 19 0f b6 14 07 <3a> 14 06 74 ef 19 c0 83 c8 01 31 d2 31 f6 31 ff e9 5f d1 02 00 31
kernel: RSP: 0018:ffffd47701f57ac8 EFLAGS: 00010246
kernel: RAX: 0000000000000000 RBX: ffff8e2a566e6840 RCX: 0000000000000000
kernel: RDX: 0000000000000062 RSI: 0000000000000000 RDI: ffff8e2a566e8088
kernel: RBP: ffffd47701f57b40 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000022
kernel: R13: ffff8e2a626dc800 R14: ffff8e2a563de000 R15: ffff8e2a5544ab40
kernel: FS:  00007660515b06c0(0000) GS:ffff8e29cb486000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000000 CR3: 00000001102be000 CR4: 00000000003506f0
kernel: Call Trace:
kernel:  <TASK>
kernel:  ? ceph_mds_check_access+0x109/0x810 [ceph]
kernel:  ceph_atomic_open+0x205/0xce0 [ceph]
kernel:  path_openat+0xb93/0x1240
kernel:  do_filp_open+0xd3/0x190
kernel:  do_sys_openat2+0x8b/0xf0
kernel:  __x64_sys_openat+0x52/0xa0
kernel:  x64_sys_call+0x1bf2/0x2330
kernel:  do_syscall_64+0x80/0x8f0
kernel:  ? irqentry_exit_to_user_mode+0x252/0x290
kernel:  ? irqentry_exit+0x43/0x50
kernel:  ? exc_page_fault+0x90/0x1b0
kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
kernel: RIP: 0033:0x766052aa69ee
kernel: Code: 08 0f 85 f5 4b ff ff 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 80 00 00 00 00 48 83 ec 08
kernel: RSP: 002b:00007660515a8848 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
kernel: RAX: ffffffffffffffda RBX: 00007660515b06c0 RCX: 0000766052aa69ee
kernel: RDX: 0000000000080042 RSI: 00007660515a8900 RDI: ffffffffffffff9c
kernel: RBP: 00007660515a8ae0 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000080042
kernel: R13: 0000766052b0f7b0 R14: 00007660515a8900 R15: 00000000000001b6
kernel:  </TASK>
kernel: Modules linked in: ceph libceph netfs sunrpc bonding tls binfmt_misc sch_fq_codel intel_rapl_msr intel_rapl_common polyval_clmulni vga16fb joydev ghash_clmulni_intel input_leds bochs vgastate aesni_intel pcspkr vmgenid mac_hid zfs(PO) spl(O) efi_pstore nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vmw_vmci dmi_sysfs qemu_fw_cfg virtio_rng ip_tables x_tables autofs4 hid_generic usbhid hid btrfs blake2b_generic xor raid6_pq psmouse uhci_hcd serio_raw ehci_pci i2c_piix4 ehci_hcd pata_acpi i2c_smbus floppy
kernel: CR2: 0000000000000000
kernel: ---[ end trace 0000000000000000 ]---
kernel: RIP: 0010:strcmp+0x2c/0x50
kernel: Code: eb 24 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 0f 1f 44 00 00 48 83 c0 01 84 d2 74 19 0f b6 14 07 <3a> 14 06 74 ef 19 c0 83 c8 01 31 d2 31 f6 31 ff e9 5f d1 02 00 31
kernel: RSP: 0018:ffffd47701f57ac8 EFLAGS: 00010246
kernel: RAX: 0000000000000000 RBX: ffff8e2a566e6840 RCX: 0000000000000000
kernel: RDX: 0000000000000062 RSI: 0000000000000000 RDI: ffff8e2a566e8088
kernel: RBP: ffffd47701f57b40 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000022
kernel: R13: ffff8e2a626dc800 R14: ffff8e2a563de000 R15: ffff8e2a5544ab40
kernel: FS:  00007660515b06c0(0000) GS:ffff8e29cb486000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000000 CR3: 00000001102be000 CR4: 00000000003506f0
kernel: note: tokio-runtime-w[963] exited with irqs disabled

So yes, it seems to be the same issue. Is there a timeline for when the patch will be available?

Best regards,
Paul
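
For anyone triaging a similar report: the key facts in the oops above can be pulled out mechanically. The following is only a minimal sketch, not an official tool; `summarize_oops` and `OOPS_EXCERPT` are hypothetical names, and the excerpt is an abbreviated copy of the log posted above. It relies on two assumptions: that on x86-64 the first two function arguments are passed in RDI and RSI, and that both registers still hold those arguments at the faulting instruction. Under those assumptions, RSI being zero while RDI is a valid kernel pointer suggests that the second string passed to strcmp was NULL. Frames prefixed with "?" are ones the unwinder could not confirm, so they are marked as unreliable.

```python
import re

# Abbreviated excerpt of the oops posted above; only the lines the parser uses.
OOPS_EXCERPT = """\
kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
kernel: RIP: 0010:strcmp+0x2c/0x50
kernel: RAX: 0000000000000000 RBX: ffff8e2a566e6840 RCX: 0000000000000000
kernel: RDX: 0000000000000062 RSI: 0000000000000000 RDI: ffff8e2a566e8088
kernel: Call Trace:
kernel:  <TASK>
kernel:  ? ceph_mds_check_access+0x109/0x810 [ceph]
kernel:  ceph_atomic_open+0x205/0xce0 [ceph]
kernel:  path_openat+0xb93/0x1240
kernel:  </TASK>
"""

# Call-trace frames that belong to a loadable module, e.g. "[ceph]".
FRAME_RE = re.compile(r"kernel:\s+(\? )?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]")
# General-purpose registers printed as 16 hex digits (RSP and RIP use another format).
REG_RE = re.compile(r"\b(R[A-Z0-9]{2}): ([0-9a-f]{16})\b")


def summarize_oops(text):
    """Hypothetical helper: pull the key facts out of an x86-64 oops log."""
    bug = re.search(r"BUG: (.+?), address: ([0-9a-f]+)", text)
    rip = re.search(r"RIP: 0010:(\w+)\+0x", text)
    regs = {}
    for name, value in REG_RE.findall(text):
        # Keep the first dump only: that is the faulting kernel context.
        regs.setdefault(name, int(value, 16))
    frames = [
        {"symbol": sym, "module": mod, "reliable": not marker}
        for marker, sym, mod in FRAME_RE.findall(text)
    ]
    return {
        "fault": bug.group(1) if bug else None,
        "address": int(bug.group(2), 16) if bug else None,
        "rip_symbol": rip.group(1) if rip else None,
        # Assumption: RDI/RSI still carry the first two arguments at the fault.
        "null_args": [r for r in ("RDI", "RSI") if regs.get(r) == 0],
        "module_frames": frames,
    }


if __name__ == "__main__":
    for key, value in summarize_oops(OOPS_EXCERPT).items():
        print(f"{key}: {value}")
```

Run against the full log, this should report a NULL pointer dereference in strcmp reached via ceph_atomic_open, which is the kind of detail that helps match a report against a known issue.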
 
Yes, it very much looks like the same issue. The fix was applied in git, so will be part of the next kernel build. Until such a build is available, it's recommended to use the 6.17.4-2-pve kernel.
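
To check a host against the two versions named in this thread, something like the following could be used. It is a rough sketch under stated assumptions: `kernel_advice` is a hypothetical helper, only the two versions mentioned here are known to it, and the suggested `proxmox-boot-tool kernel pin` command assumes that the 6.17.4-2-pve kernel package is already installed on the host. A reboot is needed after pinning, and `proxmox-boot-tool kernel unpin` should remove the pin again once a fixed build is available.

```python
import platform

# Versions taken from this thread; nothing else is known here about the affected range.
AFFECTED_RELEASE = "6.17.13-2-pve"
RECOMMENDED_RELEASE = "6.17.4-2-pve"


def kernel_advice(release):
    """Hypothetical helper: return a hint for the given `uname -r` string."""
    if release == AFFECTED_RELEASE:
        return (
            f"{release} is the build that crashed in this thread; consider: "
            f"proxmox-boot-tool kernel pin {RECOMMENDED_RELEASE}"
        )
    if release == RECOMMENDED_RELEASE:
        return f"{release} is the interim kernel recommended in this thread"
    return f"{release} is not mentioned in this thread; check its changelog for the CephFS fix"


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r`.
    print(kernel_advice(platform.release()))
```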