8.2 Kernel 6.8.4-2-pve boot hanging with nouveau error - but 6.5.13-5-pve is OK

Apr 25, 2024
Hi, I'm not sure whether I should post this as a bug. I upgraded 3 boxes to 8.2: 2 are OK, but 1 just hung at boot. The error seems to be related to the graphics card driver (nouveau). The box boots fine on kernel 6.5.13-5-pve and all prior kernel versions. For the time being I've just pulled the graphics card from that PC.

Code:
Apr 25 15:19:46 pvev2 kernel: ------------[ cut here ]------------
Apr 25 15:19:46 pvev2 kernel: WARNING: CPU: 0 PID: 705 at drivers/gpu/drm/nouveau/nouveau_connector.c:1324 nouveau_connector_create+0x701/0x7a0 [nouveau]
Apr 25 15:19:46 pvev2 kernel: Modules linked in: intel_rapl_msr intel_rapl_common sb_edac nouveau(+) x86_pkg_temp_thermal intel_powerclamp drm_gpuvm drm_exec gpu_sched coretemp binfmt_misc drm_ttm_helper ttm snd_hda_codec_realtek kvm_intel snd_hda_codec_generic drm_display_helper kvm snd_hda_intel irqbypass crct10dif_pclmul polyval_clmulni polyval_generic snd_intel_dspcfg snd_intel_sdw_acpi ghash_clmulni_intel cec snd_hda_codec rc_core snd_hda_core sha256_ssse3 video input_leds sha1_ssse3 serio_raw snd_hwdep spi_nor snd_pcm mtd snd_timer aesni_intel crypto_simd cryptd snd rapl pcspkr intel_cstate soundcore mxm_wmi mac_hid zfs(PO) spl(O) vhost_net vhost vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq hid_generic usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c xhci_pci spi_intel_platform xhci_pci_renesas spi_intel gpio_ich igb r8169 ehci_pci ahci i2c_i801 crc32_pclmul xhci_hcd psmouse i2c_algo_bit ehci_hcd realtek libahci i2c_smbus dca lpc_ich wmi
Apr 25 15:19:46 pvev2 kernel: CPU: 0 PID: 705 Comm: kworker/0:8 Tainted: P           O       6.8.4-2-pve #1
Apr 25 15:19:46 pvev2 kernel: Hardware name: Default string Default string/E5-MR9A PRO, BIOS 5.11 12/06/2022
Apr 25 15:19:46 pvev2 kernel: Workqueue: events work_for_cpu_fn
Apr 25 15:19:46 pvev2 kernel: RIP: 0010:nouveau_connector_create+0x701/0x7a0 [nouveau]
Apr 25 15:19:46 pvev2 kernel: Code: c7 c6 80 86 51 c1 48 8b 80 a8 06 00 00 48 8b 78 08 e8 a3 6f d3 e4 48 89 df e8 0b e2 66 e4 49 63 c4 48 89 45 b0 e9 8b f9 ff ff <0f> 0b 48 c7 45 b0 00 00 00 00 e9 7c f9 ff ff 41 f6 c0 08 0f 85 20
Apr 25 15:19:46 pvev2 kernel: RSP: 0018:ffffbe4e807efc20 EFLAGS: 00010297
Apr 25 15:19:46 pvev2 kernel: RAX: 0000000000000001 RBX: ffff958f5a01a000 RCX: 0000000000000000
Apr 25 15:19:46 pvev2 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Apr 25 15:19:46 pvev2 kernel: RBP: ffffbe4e807efc88 R08: 0000000000000000 R09: 0000000000000000
Apr 25 15:19:46 pvev2 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff958f5b4c3800
Apr 25 15:19:46 pvev2 kernel: R13: ffff958f4eeea000 R14: ffff958f402f6780 R15: 0000000000000000
Apr 25 15:19:46 pvev2 kernel: FS:  0000000000000000(0000) GS:ffff959e7f400000(0000) knlGS:0000000000000000
Apr 25 15:19:46 pvev2 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 25 15:19:46 pvev2 kernel: CR2: 00006281a1d67c40 CR3: 0000000ed8636004 CR4: 00000000001706f0
Apr 25 15:19:46 pvev2 kernel: Call Trace:
Apr 25 15:19:46 pvev2 kernel:  <TASK>
Apr 25 15:19:46 pvev2 kernel:  ? show_regs+0x6d/0x80
Apr 25 15:19:46 pvev2 kernel:  ? __warn+0x89/0x160
Apr 25 15:19:46 pvev2 kernel:  ? nouveau_connector_create+0x701/0x7a0 [nouveau]
Apr 25 15:19:46 pvev2 kernel:  ? report_bug+0x17e/0x1b0
Apr 25 15:19:46 pvev2 kernel:  ? handle_bug+0x46/0x90
Apr 25 15:19:46 pvev2 kernel:  ? exc_invalid_op+0x18/0x80
Apr 25 15:19:46 pvev2 kernel:  ? asm_exc_invalid_op+0x1b/0x20
Apr 25 15:19:46 pvev2 kernel:  ? nouveau_connector_create+0x701/0x7a0 [nouveau]
Apr 25 15:19:46 pvev2 kernel:  nv50_display_create+0x31b/0xce0 [nouveau]
Apr 25 15:19:46 pvev2 kernel:  nouveau_display_create+0x289/0x570 [nouveau]
Apr 25 15:19:46 pvev2 kernel:  nouveau_drm_device_init+0x465/0x9f0 [nouveau]
Apr 25 15:19:46 pvev2 kernel:  nouveau_drm_probe+0x137/0x280 [nouveau]
Apr 25 15:19:46 pvev2 kernel:  local_pci_probe+0x47/0xb0
Apr 25 15:19:46 pvev2 kernel:  work_for_cpu_fn+0x1a/0x30
Apr 25 15:19:46 pvev2 kernel:  process_one_work+0x16d/0x350
Apr 25 15:19:46 pvev2 kernel:  worker_thread+0x306/0x440
Apr 25 15:19:46 pvev2 kernel:  ? __pfx_worker_thread+0x10/0x10
Apr 25 15:19:46 pvev2 kernel:  kthread+0xf2/0x120
Apr 25 15:19:46 pvev2 kernel:  ? __pfx_kthread+0x10/0x10
Apr 25 15:19:46 pvev2 kernel:  ret_from_fork+0x47/0x70
Apr 25 15:19:46 pvev2 kernel:  ? __pfx_kthread+0x10/0x10
Apr 25 15:19:46 pvev2 kernel:  ret_from_fork_asm+0x1b/0x30
Apr 25 15:19:46 pvev2 kernel:  </TASK>
Apr 25 15:19:46 pvev2 kernel: ---[ end trace 0000000000000000 ]---
Apr 25 15:19:46 pvev2 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Apr 25 15:19:46 pvev2 kernel: #PF: supervisor read access in kernel mode
Apr 25 15:19:46 pvev2 kernel: #PF: error_code(0x0000) - not-present page
Apr 25 15:19:57 pvev2 kernel: PGD 0 P4D 0
Apr 25 15:19:57 pvev2 kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Apr 25 15:19:57 pvev2 kernel: CPU: 0 PID: 705 Comm: kworker/0:8 Tainted: P        W  O       6.8.4-2-pve #1
Apr 25 15:19:57 pvev2 kernel: Hardware name: Default string Default string/E5-MR9A PRO, BIOS 5.11 12/06/2022
Apr 25 15:19:57 pvev2 kernel: Workqueue: events work_for_cpu_fn
Apr 25 15:19:57 pvev2 kernel: RIP: 0010:nv50_display_create+0x686/0xce0 [nouveau]
Apr 25 15:19:57 pvev2 kernel: Code: e9 cb fd ff ff 8b 83 e8 00 00 00 41 c7 46 08 00 00 00 00 41 89 46 18 e9 60 fd ff ff 4c 8b b3 00 01 00 00 48 8b 93 98 00 00 00 <49> 8b 06 0f b6 72 0c 48 89 55 d0 48 8b 40 38 48 8b 80 f0 03 00 00
Apr 25 15:19:57 pvev2 kernel: RSP: 0018:ffffbe4e807efc98 EFLAGS: 00010246
Apr 25 15:19:57 pvev2 kernel: RAX: ffff958f401de340 RBX: ffff958f426df800 RCX: 0000000000000000
Apr 25 15:19:57 pvev2 kernel: RDX: ffff958f401de340 RSI: 0000000000000001 RDI: 0000000000000000
Apr 25 15:19:57 pvev2 kernel: RBP: ffffbe4e807efcf8 R08: 0000000000000000 R09: 0000000000000000
Apr 25 15:19:57 pvev2 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Apr 25 15:19:57 pvev2 kernel: R13: 0000000000000003 R14: 0000000000000000 R15: ffff958f40227700
Apr 25 15:19:57 pvev2 kernel: FS:  0000000000000000(0000) GS:ffff959e7f400000(0000) knlGS:0000000000000000
Apr 25 15:19:57 pvev2 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 25 15:19:57 pvev2 kernel: CR2: 0000000000000000 CR3: 0000000ed8636004 CR4: 00000000001706f0
Apr 25 15:19:57 pvev2 kernel: Call Trace:
Apr 25 15:19:57 pvev2 kernel:  <TASK>
Apr 25 15:19:57 pvev2 kernel:  ? show_regs+0x6d/0x80
Apr 25 15:19:57 pvev2 kernel:  ? __die+0x24/0x80
Apr 25 15:19:57 pvev2 kernel:  ? page_fault_oops+0x176/0x500
Apr 25 15:19:57 pvev2 kernel:  ? nv50_display_create+0x686/0xce0 [nouveau]
Apr 25 15:19:57 pvev2 kernel:  ? kernelmode_fixup_or_oops+0xb2/0x140
Apr 25 15:19:57 pvev2 kernel:  ? __bad_area_nosemaphore+0x1a5/0x270
Apr 25 15:19:57 pvev2 kernel:  ? bad_area_nosemaphore+0x16/0x30
Apr 25 15:19:57 pvev2 kernel:  ? do_user_addr_fault+0x2a6/0x6b0
Apr 25 15:19:57 pvev2 kernel:  ? asm_exc_invalid_op+0x1b/0x20
Apr 25 15:19:57 pvev2 kernel:  ? exc_page_fault+0x83/0x1b0
Apr 25 15:19:57 pvev2 kernel:  ? asm_exc_page_fault+0x27/0x30
Apr 25 15:19:57 pvev2 kernel:  ? nv50_display_create+0x686/0xce0 [nouveau]
Apr 25 15:19:57 pvev2 kernel:  ? nv50_display_create+0x384/0xce0 [nouveau]
Apr 25 15:19:57 pvev2 kernel:  nouveau_display_create+0x289/0x570 [nouveau]
Apr 25 15:19:57 pvev2 kernel:  nouveau_drm_device_init+0x465/0x9f0 [nouveau]
Apr 25 15:19:57 pvev2 kernel:  nouveau_drm_probe+0x137/0x280 [nouveau]
Apr 25 15:19:57 pvev2 kernel:  local_pci_probe+0x47/0xb0
Apr 25 15:19:57 pvev2 kernel:  work_for_cpu_fn+0x1a/0x30
Apr 25 15:19:57 pvev2 kernel:  process_one_work+0x16d/0x350
Apr 25 15:19:57 pvev2 kernel:  worker_thread+0x306/0x440
Apr 25 15:19:57 pvev2 kernel:  ? __pfx_worker_thread+0x10/0x10
Apr 25 15:19:57 pvev2 kernel:  kthread+0xf2/0x120
Apr 25 15:19:57 pvev2 kernel:  ? __pfx_kthread+0x10/0x10
Apr 25 15:19:57 pvev2 kernel:  ret_from_fork+0x47/0x70
Apr 25 15:19:57 pvev2 kernel:  ? __pfx_kthread+0x10/0x10
Apr 25 15:19:57 pvev2 kernel:  ret_from_fork_asm+0x1b/0x30
Apr 25 15:19:57 pvev2 kernel:  </TASK>
Apr 25 15:19:57 pvev2 kernel: Modules linked in: intel_rapl_msr intel_rapl_common sb_edac nouveau(+) x86_pkg_temp_thermal intel_powerclamp drm_gpuvm drm_exec gpu_sched coretemp binfmt_misc drm_ttm_helper ttm snd_hda_codec_realtek kvm_intel snd_hda_codec_generic drm_display_helper kvm snd_hda_intel irqbypass crct10dif_pclmul polyval_clmulni polyval_generic snd_intel_dspcfg snd_intel_sdw_acpi ghash_clmulni_intel cec snd_hda_codec rc_core snd_hda_core sha256_ssse3 video input_leds sha1_ssse3 serio_raw snd_hwdep spi_nor snd_pcm mtd snd_timer aesni_intel crypto_simd cryptd snd rapl pcspkr intel_cstate soundcore mxm_wmi mac_hid zfs(PO) spl(O) vhost_net vhost vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq hid_generic usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c xhci_pci spi_intel_platform xhci_pci_renesas spi_intel gpio_ich igb r8169 ehci_pci ahci i2c_i801 crc32_pclmul xhci_hcd psmouse i2c_algo_bit ehci_hcd realtek libahci i2c_smbus dca lpc_ich wmi
Apr 25 15:19:57 pvev2 kernel: CR2: 0000000000000000
Apr 25 15:19:57 pvev2 kernel: ---[ end trace 0000000000000000 ]---
Apr 25 15:19:57 pvev2 kernel: RIP: 0010:nv50_display_create+0x686/0xce0 [nouveau]
Apr 25 15:19:57 pvev2 kernel: Code: e9 cb fd ff ff 8b 83 e8 00 00 00 41 c7 46 08 00 00 00 00 41 89 46 18 e9 60 fd ff ff 4c 8b b3 00 01 00 00 48 8b 93 98 00 00 00 <49> 8b 06 0f b6 72 0c 48 89 55 d0 48 8b 40 38 48 8b 80 f0 03 00 00
Apr 25 15:19:57 pvev2 kernel: RSP: 0018:ffffbe4e807efc98 EFLAGS: 00010246
Apr 25 15:19:57 pvev2 kernel: RAX: ffff958f401de340 RBX: ffff958f426df800 RCX: 0000000000000000
Apr 25 15:19:57 pvev2 kernel: RDX: ffff958f401de340 RSI: 0000000000000001 RDI: 0000000000000000
Apr 25 15:19:57 pvev2 kernel: RBP: ffffbe4e807efcf8 R08: 0000000000000000 R09: 0000000000000000
Apr 25 15:19:57 pvev2 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Apr 25 15:19:57 pvev2 kernel: R13: 0000000000000003 R14: 0000000000000000 R15: ffff958f40227700
Apr 25 15:19:57 pvev2 kernel: FS:  0000000000000000(0000) GS:ffff959e7f400000(0000) knlGS:0000000000000000
Apr 25 15:19:57 pvev2 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 25 15:19:57 pvev2 kernel: CR2: 0000000000000000 CR3: 000000011236c006 CR4: 00000000001706f0
Apr 25 15:19:57 pvev2 kernel: note: kworker/0:8[705] exited with irqs disabled
Apr 25 15:19:57 pvev2 (plymouth)[1049]: rescue.service: Executable /bin/plymouth missing, skipping: No such file or directory
Apr 25 15:20:45 pvev2 systemd-udevd[718]: 0000:03:00.0: Worker [790] processing SEQNUM=5524 is taking a long time
Apr 25 15:21:44 pvev2 udevadm[708]: Timed out for waiting the udev queue being empty.
Apr 25 15:21:44 pvev2 systemd[1]: ifupdown2-pre.service: Main process exited, code=exited, status=1/FAILURE
Apr 25 15:21:44 pvev2 systemd[1]: ifupdown2-pre.service: Failed with result 'exit-code'.
Apr 25 15:21:44 pvev2 systemd[1]: Failed to start ifupdown2-pre.service - Helper to synchronize boot up for ifupdown.
Apr 25 15:21:44 pvev2 systemd[1]: Dependency failed for networking.service - Network initialization.
Apr 25 15:21:44 pvev2 systemd[1]: networking.service: Job networking.service/start failed with result 'dependency'.
Apr 25 15:21:44 pvev2 systemd[1]: Reached target network.target - Network.
Apr 25 15:21:44 pvev2 systemd[1]: Reached target network-online.target - Network is Online.
Apr 25 15:21:44 pvev2 systemd[1]: Starting iscsid.service - iSCSI initiator daemon (iscsid)...
Apr 25 15:21:44 pvev2 iscsid[1061]: iSCSI logger with pid=1062 started!
Apr 25 15:21:44 pvev2 systemd[1]: Started iscsid.service - iSCSI initiator daemon (iscsid).
Apr 25 15:21:44 pvev2 systemd[1]: open-iscsi.service - Login to default iSCSI targets was skipped because no trigger condition checks were met.
Apr 25 15:21:44 pvev2 systemd[1]: Reached target remote-fs-pre.target - Preparation for Remote File Systems.
Apr 25 15:21:44 pvev2 systemd[1]: Finished blk-availability.service - Availability of block devices.
Apr 25 15:21:44 pvev2 systemd[1]: Startup finished in 27.777s (firmware) + 7.697s (loader) + 20.583s (kernel) + 2min 5.551s (userspace) = 3min 1.610s.
Apr 25 15:21:44 pvev2 kernel: Loading iSCSI transport class v2.0-870.
Apr 25 15:21:45 pvev2 iscsid[1062]: iSCSI daemon with pid=1063 started!
Apr 25 15:22:45 pvev2 systemd-udevd[718]: 0000:03:00.0: Worker [790] processing SEQNUM=5524 killed
 
Which model GPU was this?
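One way to answer that from the affected host is sketched below. It assumes the device at PCI address 0000:03:00.0, which the udev messages in the log name as the one whose worker got stuck, is the GPU in question; the thread does not confirm that. The second command is a fallback that does not rely on the address.

```shell
# Show vendor/device IDs and the bound kernel driver for the device at 03:00.0.
# Assumption: 0000:03:00.0 (taken from the udev log lines) is the GPU.
lspci -nnk -s 03:00.0

# Fallback if the address differs: list every display-class device with its IDs.
lspci -nn | grep -Ei 'vga|3d|display'
```

The `[vendor:device]` ID pair in the output identifies the exact card model even when the name string is generic.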
 
A quick look didn't find anything, unfortunately. This is a rather old GPU, so it's unlikely to get much testing exposure on newer kernels. You might have better luck with the proprietary NVIDIA driver, if that supports it. Good that you don't need it for day-to-day operations ;)
 
Cheers Fabian

I put the GPU back in and blacklisted the nouveau driver, and it's working (as much as I need it to): I can see the POST and the Proxmox login prompt, and I can get into the BIOS if I need to. Happy days.
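For anyone landing here with the same hang, blacklisting nouveau on a Debian-based host such as Proxmox VE usually looks something like the sketch below. The exact steps used in this thread were not posted, so treat this as an assumption; the file name `blacklist-nouveau.conf` in particular is an arbitrary choice.

```shell
# Run as root.
# Assumption: the file name is arbitrary; modprobe reads any *.conf file
# under /etc/modprobe.d.
cat > /etc/modprobe.d/blacklist-nouveau.conf <<'EOF'
blacklist nouveau
EOF

# Rebuild the initramfs for all installed kernels so the blacklist also
# applies during early boot, then reboot the host.
update-initramfs -u -k all

# After the reboot, this should print nothing if the blacklist took effect.
lsmod | grep nouveau
```

With the module blacklisted, the console keeps running on the firmware-provided framebuffer, which is enough for the POST screen and the login prompt described above.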
 