Hello,
I have a PVE node installed on an HX99G.
After the latest update, these kernels are installed:
root@pve:~# ls -lh /boot | grep vmlinuz
-rw-r--r-- 1 root root 14M Jan 24 13:32 vmlinuz-6.8.12-8-pve
-rw-r--r-- 1 root root 14M Mar 16 20:18 vmlinuz-6.8.12-9-pv
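As an illustration only, here is a minimal Python sketch that pulls the kernel versions out of the `ls` output above and orders them numerically. It shows that 6.8.12-8-pve is the previous kernel. The helpers `installed_kernels` and `as_pve_name` are hypothetical. If someone wants to test the older kernel, recent Proxmox VE releases provide `proxmox-boot-tool kernel pin <version>` to boot a specific kernel. Whether 6.8.12-8-pve avoids this crash is an untested assumption.

```python
import re

# Lines copied from the `ls -lh /boot | grep vmlinuz` output above
# (the second file name is truncated in the paste; the regex does not need the suffix).
LS_OUTPUT = """\
-rw-r--r-- 1 root root 14M Jan 24 13:32 vmlinuz-6.8.12-8-pve
-rw-r--r-- 1 root root 14M Mar 16 20:18 vmlinuz-6.8.12-9-pv
"""

# Captures major, minor, patch and ABI number from e.g. "vmlinuz-6.8.12-9".
VERSION_RE = re.compile(r"vmlinuz-(\d+)\.(\d+)\.(\d+)-(\d+)")


def installed_kernels(ls_output: str) -> list[tuple[int, ...]]:
    """Return the kernel versions found in the listing, oldest first (hypothetical helper)."""
    versions = []
    for line in ls_output.splitlines():
        match = VERSION_RE.search(line)
        if match:
            versions.append(tuple(int(part) for part in match.groups()))
    # Numeric sort, so that e.g. ABI 10 sorts after ABI 9.
    return sorted(versions)


def as_pve_name(version: tuple[int, ...]) -> str:
    """Format a version tuple the way Proxmox names its kernels (hypothetical helper)."""
    major, minor, patch, abi = version
    return f"{major}.{minor}.{patch}-{abi}-pve"


kernels = installed_kernels(LS_OUTPUT)
previous, current = kernels[-2], kernels[-1]
print("current: ", as_pve_name(current))   # current:  6.8.12-9-pve
print("previous:", as_pve_name(previous))  # previous: 6.8.12-8-pve
```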
I'm getting random kernel crashes on Proxmox with kernel 6.8.12-9-pve. The error is a page fault in the xhci_hcd module, and it looks like it happens during a USB control transfer (xhci_queue_ctrl_tx).
The process involved is usb-storage.
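The page-fault details in the log below can be decoded by hand. The following is a small Python sketch, assuming the standard x86 page-fault error-code bit layout (the Linux kernel names these bits X86_PF_PROT, X86_PF_WRITE, X86_PF_USER, X86_PF_RSVD and X86_PF_INSTR). It confirms that `error_code(0x0000)` means a read of a not-present page in kernel mode. It also shows that the faulting address (CR2) is 0x1c bytes above RAX, and that RAX sits 16 bytes before a 4 KiB page boundary, so the read landed 0xc bytes into the next page, whose entry is empty (`PTE 0`). This is consistent with, but does not prove, a pointer running just past the end of a page-sized buffer. Whether RAX actually held the base of the structure being read is an assumption that would need disassembly of xhci_queue_ctrl_tx to confirm.

```python
# Values copied from the oops: "#PF: error_code(0x0000)", CR2 and RAX.
ERROR_CODE = 0x0000
CR2 = 0xFFFFAB3E0041600C  # faulting virtual address
RAX = 0xFFFFAB3E00415FF0
PAGE_SIZE = 4096          # assumes ordinary 4 KiB pages

# x86 page-fault error-code bits.
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access, 1: write access
PF_USER = 1 << 2   # 0: kernel mode, 1: user mode
PF_RSVD = 1 << 3   # reserved bit set in a paging-structure entry
PF_INSTR = 1 << 4  # fault caused by an instruction fetch


def decode_pf_error(code: int) -> list[str]:
    """Translate a page-fault error code into plain words."""
    flags = [
        "protection violation" if code & PF_PROT else "not-present page",
        "write access" if code & PF_WRITE else "read access",
        "user mode" if code & PF_USER else "kernel mode",
    ]
    if code & PF_RSVD:
        flags.append("reserved bit set")
    if code & PF_INSTR:
        flags.append("instruction fetch")
    return flags


offset_from_rax = CR2 - RAX                         # distance of the bad read from RAX
fault_page = CR2 & ~(PAGE_SIZE - 1)                 # start of the page that faulted
rax_bytes_to_page_end = PAGE_SIZE - (RAX % PAGE_SIZE)

print(decode_pf_error(ERROR_CODE))
# ['not-present page', 'read access', 'kernel mode']
print(hex(offset_from_rax), hex(fault_page), rax_bytes_to_page_end)
# 0x1c 0xffffab3e00416000 16
```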
I have two USB devices connected:
– a SATA-to-USB adapter with an external HDD
– a Zigbee dongle
I'm not sure exactly when it happens, and I can't reproduce it reliably; there are just random oopses in the logs. It happens after some amount of uptime, and to recover I have to do a hard reset (basically unplugging and replugging the machine), because I can't reboot through the GUI, nor from the local console with HDMI connected. The LXC I had kept running, but with several issues.
Here is the log:
Apr 06 04:45:54 pve kernel: BUG: unable to handle page fault for address: ffffab3e0041600c
Apr 06 04:45:54 pve kernel: #PF: supervisor read access in kernel mode
Apr 06 04:45:54 pve kernel: #PF: error_code(0x0000) - not-present page
Apr 06 04:45:54 pve kernel: PGD 100000067 P4D 100000067 PUD 10028b067 PMD 100fc6067 PTE 0
Apr 06 04:45:54 pve kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Apr 06 04:45:54 pve kernel: CPU: 9 PID: 357 Comm: usb-storage Tainted: P O 6.8.12-9-pve #1
Apr 06 04:45:54 pve kernel: Hardware name: Micro Computer (HK) Tech Limited HX99G/F7BAA, BIOS 0.18 03/06/2024
Apr 06 04:45:54 pve kernel: RIP: 0010:xhci_queue_ctrl_tx+0xaa/0x400 [xhci_hcd]
Apr 06 04:45:54 pve kernel: Code: 02 00 00 48 83 bb 88 00 00 00 00 0f 84 cd 02 00 00 41 f6 84 24 b6 09 00 00 0>
Apr 06 04:45:54 pve kernel: RSP: 0018:ffffab3e00f27ae0 EFLAGS: 00010002
Apr 06 04:45:54 pve kernel: RAX: ffffab3e00415ff0 RBX: ffff8febd8c49bc0 RCX: 0000000000000000
Apr 06 04:45:54 pve kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Apr 06 04:45:54 pve kernel: RBP: ffffab3e00f27b40 R08: 0000000000000000 R09: 0000000000000000
Apr 06 04:45:54 pve kernel: R10: ffff8febc2381b00 R11: 0000000000000000 R12: ffff8febd948f260
Apr 06 04:45:54 pve kernel: R13: ffff8febd841eb00 R14: 0000000000000820 R15: ffff8febd948f2a4
Apr 06 04:45:54 pve kernel: FS: 0000000000000000(0000) GS:ffff8ffabe680000(0000) knlGS:0000000000000000
Apr 06 04:45:54 pve kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 06 04:45:54 pve kernel: CR2: ffffab3e0041600c CR3: 000000037c236000 CR4: 0000000000f50ef0
Apr 06 04:45:54 pve kernel: PKRU: 55555554
Apr 06 04:45:54 pve kernel: Call Trace:
Apr 06 04:45:54 pve kernel: <TASK>
Apr 06 04:45:54 pve kernel: ? show_regs+0x6d/0x80
Apr 06 04:45:54 pve kernel: ? __die+0x24/0x80
Apr 06 04:45:54 pve kernel: ? page_fault_oops+0x176/0x500
Apr 06 04:45:54 pve kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Apr 06 04:45:54 pve kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Apr 06 04:45:54 pve kernel: ? xhci_queue_ctrl_tx+0xaa/0x400 [xhci_hcd]
Apr 06 04:45:54 pve kernel: ? kernelmode_fixup_or_oops.constprop.0+0x69/0x90
Apr 06 04:45:54 pve kernel: ? __bad_area_nosemaphore+0x19d/0x270
Apr 06 04:45:54 pve kernel: ? bad_area_nosemaphore+0x16/0x30
Apr 06 04:45:54 pve kernel: ? do_kern_addr_fault+0x7b/0xa0
Apr 06 04:45:54 pve kernel: ? exc_page_fault+0x10d/0x1b0
Apr 06 04:45:54 pve kernel: ? asm_exc_page_fault+0x27/0x30
Apr 06 04:45:54 pve kernel: ? xhci_queue_ctrl_tx+0xaa/0x400 [xhci_hcd]
Apr 06 04:45:54 pve kernel: xhci_urb_enqueue+0x1bb/0x3a0 [xhci_hcd]
Apr 06 04:45:54 pve kernel: usb_hcd_submit_urb+0xc3/0xc20
Apr 06 04:45:54 pve kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Apr 06 04:45:54 pve kernel: ? schedule+0x33/0x110
Apr 06 04:45:54 pve kernel: usb_submit_urb+0x254/0x660
Apr 06 04:45:54 pve kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Apr 06 04:45:54 pve kernel: usb_stor_msg_common+0xc3/0x170 [usb_storage]
Apr 06 04:45:54 pve kernel: usb_stor_clear_halt+0xbd/0x100 [usb_storage]
Apr 06 04:45:54 pve kernel: usb_stor_bulk_transfer_buf+0xee/0x120 [usb_storage]
Apr 06 04:45:54 pve kernel: usb_stor_Bulk_transport+0x1e0/0x460 [usb_storage]
Apr 06 04:45:54 pve kernel: usb_stor_invoke_transport+0x206/0x500 [usb_storage]
Apr 06 04:45:54 pve kernel: usb_stor_transparent_scsi_command+0xe/0x20 [usb_storage]
Apr 06 04:45:54 pve kernel: usb_stor_control_thread+0x1f1/0x2b0 [usb_storage]
Apr 06 04:45:54 pve kernel: ? __pfx_usb_stor_control_thread+0x10/0x10 [usb_storage]
Apr 06 04:45:54 pve kernel: kthread+0xf2/0x120
Apr 06 04:45:54 pve kernel: ? __pfx_kthread+0x10/0x10
Apr 06 04:45:54 pve kernel: ret_from_fork+0x47/0x70
Apr 06 04:45:54 pve kernel: ? __pfx_kthread+0x10/0x10
Apr 06 04:45:54 pve kernel: ret_from_fork_asm+0x1b/0x30
Apr 06 04:45:54 pve kernel: </TASK>
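In a kernel call trace, frames prefixed with "?" are addresses the unwinder found on the stack but could not confirm as part of the actual call chain, while the RIP line names the function that faulted. A minimal Python sketch that drops the "?" frames from an excerpt of the trace above and prints the remaining chain in call order is shown below (`reliable_frames` is a hypothetical helper). Read this way, usb-storage was in usb_stor_clear_halt, i.e. trying to clear a halted endpoint, which it does with a control request, and that request went through xhci_urb_enqueue into xhci_queue_ctrl_tx, where the fault occurred. This interpretation is based only on the function names.

```python
import re

# Excerpt of the "Call Trace:" section above, journal prefixes included.
TRACE = """\
Apr 06 04:45:54 pve kernel: ? asm_exc_page_fault+0x27/0x30
Apr 06 04:45:54 pve kernel: ? xhci_queue_ctrl_tx+0xaa/0x400 [xhci_hcd]
Apr 06 04:45:54 pve kernel: xhci_urb_enqueue+0x1bb/0x3a0 [xhci_hcd]
Apr 06 04:45:54 pve kernel: usb_hcd_submit_urb+0xc3/0xc20
Apr 06 04:45:54 pve kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Apr 06 04:45:54 pve kernel: ? schedule+0x33/0x110
Apr 06 04:45:54 pve kernel: usb_submit_urb+0x254/0x660
Apr 06 04:45:54 pve kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Apr 06 04:45:54 pve kernel: usb_stor_msg_common+0xc3/0x170 [usb_storage]
Apr 06 04:45:54 pve kernel: usb_stor_clear_halt+0xbd/0x100 [usb_storage]
Apr 06 04:45:54 pve kernel: usb_stor_bulk_transfer_buf+0xee/0x120 [usb_storage]
"""

# "kernel: [? ]function+0xoffset/0xsize [module]"; group 1 is the "? " marker.
FRAME_RE = re.compile(
    r"kernel: (\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def reliable_frames(trace: str) -> list[str]:
    """Return frames without a '?' marker, innermost first (hypothetical helper)."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.search(line)
        if match and match.group(1) is None:
            name, module = match.group(2), match.group(3)
            frames.append(f"{name} [{module}]" if module else name)
    return frames


# Print outermost caller first so the output reads in call order.
for frame in reversed(reliable_frames(TRACE)):
    print(frame)
```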
I have also opened a bug report: https://bugzilla.proxmox.com/show_bug.cgi?id=6288
Can someone help?