[SOLVED] kernel 7.0.14-2, PCIe pass-through for amdgpu does not work properly.

uzumo

After updating to kernel 7.0.14-2, the following error occurs when I start a virtual machine that has an RX 9070 XT attached via PCIe passthrough.

When this happens, the virtual machine start task never completes and hangs.

I have confirmed that the system operates normally when I pin the kernel to version 7.0.12-1.

I would like to resolve this issue. Could you please provide any advice on how to investigate this or suggest a solution?

While a virtual machine with GPU passthrough is running, my monitor shows the virtual machine's screen; when it is stopped, the monitor shows the host console output. For this reason, I cannot blacklist the amdgpu driver on the host.

Code:
Jul 03 15:00:36 pve1 qm[7241]: <root@pam> starting task UPID:pve1:00001C4A:00011A69:6A475004:qmstart:926:root@pam:
Jul 03 15:00:36 pve1 qm[7242]: start VM 926: UPID:pve1:00001C4A:00011A69:6A475004:qmstart:926:root@pam:
Jul 03 15:00:41 pve1 kernel: Console: switching to colour dummy device 80x25
Jul 03 15:00:41 pve1 kernel: amdgpu 0000:04:00.0: finishing device.
Jul 03 15:00:42 pve1 kernel: BUG: kernel NULL pointer dereference, address: 00000000000000b4
Jul 03 15:00:42 pve1 kernel: #PF: supervisor read access in kernel mode
Jul 03 15:00:42 pve1 kernel: #PF: error_code(0x0000) - not-present page
Jul 03 15:00:42 pve1 kernel: PGD 0 P4D 0
Jul 03 15:00:42 pve1 kernel: Oops: Oops: 0000 [#1] SMP NOPTI
Jul 03 15:00:42 pve1 kernel: CPU: 7 UID: 0 PID: 313 Comm: kworker/u80:7 Tainted: P           O        7.0.14-2-pve #1 PREEMPT(lazy)
Jul 03 15:00:42 pve1 kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
Jul 03 15:00:42 pve1 kernel: Hardware name: ASRock Z890 Pro RS WiFi White/Z890 Pro RS WiFi White, BIOS 3.24 01/28/2026
Jul 03 15:00:42 pve1 kernel: Workqueue: events_unbound dm_ism_sso_delayed_work_func [amdgpu]
Jul 03 15:00:42 pve1 kernel: RIP: 0010:dc_allow_idle_optimizations_internal+0x1f/0x380 [amdgpu]
Jul 03 15:00:42 pve1 kernel: Code: 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 49 89 ff 41 56 41 55 41 54 53 89 f3 48 83 ec 30 48 89 55 b0 <44> 0f b6 b7 b4 00 00 00 4c 8b a7 88 06 00 00 65 48 8b 05 12 7c 77
Jul 03 15:00:42 pve1 kernel: RSP: 0018:ffffd15f40e7fd28 EFLAGS: 00010286
Jul 03 15:00:42 pve1 kernel: RAX: 0000000000000004 RBX: 0000000000000000 RCX: 0000000000000001
Jul 03 15:00:42 pve1 kernel: RDX: ffffffffc1eae660 RSI: 0000000000000000 RDI: 0000000000000000
Jul 03 15:00:42 pve1 kernel: RBP: ffffd15f40e7fd80 R08: ffffffffc204a376 R09: 0000000000000000
Jul 03 15:00:42 pve1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Jul 03 15:00:42 pve1 kernel: R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
Jul 03 15:00:42 pve1 kernel: FS:  0000000000000000(0000) GS:ffff8aa58d88e000(0000) knlGS:0000000000000000
Jul 03 15:00:42 pve1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 03 15:00:42 pve1 kernel: CR2: 00000000000000b4 CR3: 000000000363e003 CR4: 0000000000f72ef0
Jul 03 15:00:42 pve1 kernel: PKRU: 55555554
Jul 03 15:00:42 pve1 kernel: Call Trace:
Jul 03 15:00:42 pve1 kernel:  <TASK>
Jul 03 15:00:42 pve1 kernel:  dm_ism_commit_idle_optimization_state+0xee/0x250 [amdgpu]
Jul 03 15:00:42 pve1 kernel:  amdgpu_dm_ism_commit_event+0x104/0x770 [amdgpu]
Jul 03 15:00:42 pve1 kernel:  dm_ism_sso_delayed_work_func+0x40/0x60 [amdgpu]
Jul 03 15:00:42 pve1 kernel:  process_one_work+0x1a9/0x3c0
Jul 03 15:00:42 pve1 kernel:  worker_thread+0x1b8/0x360
Jul 03 15:00:42 pve1 kernel:  ? __pfx_worker_thread+0x10/0x10
Jul 03 15:00:42 pve1 kernel:  kthread+0xf7/0x130
Jul 03 15:00:42 pve1 kernel:  ? __pfx_kthread+0x10/0x10
Jul 03 15:00:42 pve1 kernel:  ret_from_fork+0x2da/0x3a0
Jul 03 15:00:42 pve1 kernel:  ? __pfx_kthread+0x10/0x10
Jul 03 15:00:42 pve1 kernel:  ret_from_fork_asm+0x1a/0x30
Jul 03 15:00:42 pve1 kernel:  </TASK>
Jul 03 15:00:42 pve1 kernel: Modules linked in: nls_utf8 cifs nls_ucs2_utils rdma_cm iw_cm ib_cm ib_core cifs_md4 netfs snd_seq_dummy snd_seq_midi snd_hrtimer snd_seq_midi_event snd_seq ebtable_filter ebtables ip_set ip6table_raw iptabl>
Jul 03 15:00:42 pve1 kernel:  snd_hda_codec_atihdmi snd_soc_sdw_utils snd_hda_codec_hdmi intel_powerclamp snd_soc_acpi iwlmvm soundwire_bus snd_hda_intel processor_thermal_device_pci kvm_intel processor_thermal_device snd_hda_codec snd_>
Jul 03 15:00:42 pve1 kernel:  int340x_thermal_zone input_leds joydev int3400_thermal acpi_thermal_rel acpi_pad mac_hid sch_fq_codel vhost_net vhost vhost_iotlb tap coretemp nct6683 vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio >
Jul 03 15:00:42 pve1 kernel: CR2: 00000000000000b4
Jul 03 15:00:42 pve1 kernel: ---[ end trace 0000000000000000 ]---
Jul 03 15:00:42 pve1 kernel: RIP: 0010:dc_allow_idle_optimizations_internal+0x1f/0x380 [amdgpu]
Jul 03 15:00:42 pve1 kernel: Code: 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 49 89 ff 41 56 41 55 41 54 53 89 f3 48 83 ec 30 48 89 55 b0 <44> 0f b6 b7 b4 00 00 00 4c 8b a7 88 06 00 00 65 48 8b 05 12 7c 77
Jul 03 15:00:42 pve1 kernel: RSP: 0018:ffffd15f40e7fd28 EFLAGS: 00010286
Jul 03 15:00:42 pve1 kernel: RAX: 0000000000000004 RBX: 0000000000000000 RCX: 0000000000000001
Jul 03 15:00:42 pve1 kernel: RDX: ffffffffc1eae660 RSI: 0000000000000000 RDI: 0000000000000000
Jul 03 15:00:42 pve1 kernel: RBP: ffffd15f40e7fd80 R08: ffffffffc204a376 R09: 0000000000000000
Jul 03 15:00:42 pve1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Jul 03 15:00:42 pve1 kernel: R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
Jul 03 15:00:42 pve1 kernel: FS:  0000000000000000(0000) GS:ffff8aa58d88e000(0000) knlGS:0000000000000000
Jul 03 15:00:42 pve1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 03 15:00:42 pve1 kernel: CR2: 00000000000000b4 CR3: 000000000363e003 CR4: 0000000000f72ef0
Jul 03 15:00:42 pve1 kernel: PKRU: 55555554
Jul 03 15:00:42 pve1 kernel: note: kworker/u80:7[313] exited with irqs disabled
Jul 03 15:00:42 pve1 kernel: amdgpu 0000:04:00.0:  ttm finalized
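When investigating a trace like the one above, it can help to reduce the journal output to the fault address, the faulting function, and the reliable call-trace frames before searching changelogs. The following is only a minimal sketch: the function name `summarize_oops` and the regular expressions are assumptions made for illustration, written against the log format shown in this post, not part of any Proxmox tool.

```python
import re

# Patterns written against the journal lines in this post (an assumption:
# other kernels or log formats may need adjustments).
ADDR_RE = re.compile(r"NULL pointer dereference, address: ([0-9a-f]+)")
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)
FRAME_RE = re.compile(
    r"kernel:\s+(\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?\s*$"
)


def summarize_oops(log_text):
    """Extract fault address, faulting symbol and call trace from an oops."""
    summary = {"fault_address": None, "rip": None, "module": None, "trace": []}
    in_trace = False
    for line in log_text.splitlines():
        # Only the first occurrence is kept; the kernel repeats RIP/CR2 later.
        if summary["fault_address"] is None:
            m = ADDR_RE.search(line)
            if m:
                summary["fault_address"] = int(m.group(1), 16)
        if summary["rip"] is None:
            m = RIP_RE.search(line)
            if m:
                summary["rip"], summary["module"] = m.group(1), m.group(2)
        if "<TASK>" in line:
            in_trace = True
            continue
        if "</TASK>" in line:
            in_trace = False
            continue
        if in_trace:
            m = FRAME_RE.search(line)
            # Frames prefixed with "?" are unreliable guesses; skip them.
            if m and not m.group(1):
                summary["trace"].append((m.group(2), m.group(3)))
    return summary


if __name__ == "__main__":
    # A few lines copied from the log above, for demonstration only.
    sample = "\n".join([
        "Jul 03 15:00:42 pve1 kernel: BUG: kernel NULL pointer dereference, address: 00000000000000b4",
        "Jul 03 15:00:42 pve1 kernel: RIP: 0010:dc_allow_idle_optimizations_internal+0x1f/0x380 [amdgpu]",
        "Jul 03 15:00:42 pve1 kernel:  <TASK>",
        "Jul 03 15:00:42 pve1 kernel:  dm_ism_commit_idle_optimization_state+0xee/0x250 [amdgpu]",
        "Jul 03 15:00:42 pve1 kernel:  ? __pfx_worker_thread+0x10/0x10",
        "Jul 03 15:00:42 pve1 kernel:  </TASK>",
    ])
    print(summarize_oops(sample))
```

In practice the input could be the text saved from `journalctl -k`. The module name next to each frame (here `amdgpu`) and the `dm_ism_` prefix are the kind of detail that narrows a changelog search.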

Code:
proxmox-ve: 9.2.0 (running kernel: 7.0.12-1-pve)
pve-manager: 9.2.4 (running version: 9.2.4/5e5ae681198514d4)
proxmox-kernel-helper: 9.2.0
proxmox-kernel-7.0: 7.0.14-2
proxmox-kernel-7.0.14-2-pve-signed: 7.0.14-2
proxmox-kernel-7.0.12-1-pve-signed: 7.0.12-1
ceph-fuse: 19.2.3-pve2
corosync: 3.1.10-pve2
criu: 4.1.1-1
frr-pythontools: 10.6.1-1+pve2
ifupdown2: 3.3.0-1+pmx12
intel-microcode: 3.20251111.1~deb13u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.1
libproxmox-backup-qemu0: 2.0.2
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.1.1
libpve-apiclient-perl: 3.4.2
libpve-cluster-api-perl: 9.1.6
libpve-cluster-perl: 9.1.6
libpve-common-perl: 9.1.16
libpve-guest-common-perl: 6.0.4
libpve-http-server-perl: 6.0.5
libpve-network-perl: 1.6.6
libpve-notify-perl: 9.1.6
libpve-rs-perl: 0.15.3
libpve-storage-perl: 9.1.6
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 7.0.0-2
lxcfs: 7.0.0-pve1
novnc-pve: 1.7.0-2
openvswitch-switch: 3.5.0-1+b1
proxmox-backup-client: 4.2.2-1
proxmox-backup-file-restore: 4.2.2-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.3
proxmox-kernel-helper: 9.2.0
proxmox-mail-forward: 1.0.3
proxmox-mini-journalreader: 1.7
proxmox-offline-mirror-helper: 0.7.4
proxmox-widget-toolkit: 5.2.6
pve-cluster: 9.1.6
pve-container: 6.1.10
pve-docs: 9.2.2
pve-edk2-firmware: 4.2025.05-2
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.4
pve-firmware: 3.18-4
pve-ha-manager: 5.2.4
pve-i18n: 3.9.0
pve-qemu-kvm: 11.0.0-4
pve-xtermjs: 6.0.0-2
qemu-server: 9.1.18
smartmontools: 7.5-pve2
spiceterm: 3.4.2
swtpm: 0.8.0+pve3
vncterm: 1.9.2
zfsutils-linux: 2.4.3-pve1
 
> dm_ism_commit_idle_optimization_state+0xee/0x250 [amdgpu]
> amdgpu_dm_ism_commit_event+0x104/0x770 [amdgpu]
> dm_ism_sso_delayed_work_func+0x40/0x60 [amdgpu]

Could this be a regression caused by the following change?

https://launchpad.net/ubuntu/+source/linux/7.0.0-28.28

Code:
  * Patchset for TUXEDO devices (LP: #2152570)
    - drm/i915/vbt: Add edp pipe joiner enable/disable bits
    - drm/i915/dp: Avoid joiner for eDP if not enabled in VBT
    - drm/amd/display: Add Idle state manager(ISM) ★
    - drm/i915/backlight: Remove try_vesa_interface
    - drm/i915/backlight: Use intel_panel variable instead of intel_connector
    - drm/i915/backlight: Take luminance_set into account for VESA backlight
    - drm/i915/backlight: Check luminance_set when disabling PWM via AUX VESA
      backlight
    - drm/i915/backlight: Short circuit intel_dp_aux_supports_hdr_backlight
    - drm/i915/backlight: Update debug log during backlight setup
    - drm/i915/backlight: Check if VESA backlight is possible
    - drm/i915/backlight: Provide clear description on how backlight level is
      controlled
    - drm/i915/backlight: Fix VESA backlight possible check condition

https://git.kernel.org/pub/scm/linu.../?id=754003486c3cc95f2fcb9d6b71a779047d6db95c

Code:
-rw-r--r--    drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_ism.c    598  
-rw-r--r--    drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_ism.h    151

Since the Ubuntu version is no longer mentioned, I don't know which base this kernel is built on. I don't know whether the base version is indeed 7.0.0-28.28, or whether this problem is caused by the change mentioned above. I'm really at a loss.

https://github.com/proxmox/pve-kernel/commit/1a31da55736261db46c285885f33aa7c5d3ffa21

Code:
proxmox-kernel-7.0 (7.0.14-1) trixie; urgency=medium

  * update submodule and patches for Proxmox-7.0.14-1.

 -- Proxmox Support Team <support@proxmox.com>  Wed, 01 Jul 2026 18:53:19 +0200
 
"revert AMD display Idle State Manager to fix passthrough hang": [1] in proxmox-kernel 7.0.14-3: [2]. Not yet released.

[1] https://git.proxmox.com/?p=pve-kernel.git;a=commit;h=d741df4e1fc9c5fcaf4353275e2542eaef0a46f3
[2] https://git.proxmox.com/?p=pve-kern...;hpb=d741df4e1fc9c5fcaf4353275e2542eaef0a46f3
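To check whether a given host kernel is at or past the release named above (7.0.14-3), the version string can be compared numerically. This is a rough sketch under stated assumptions: the helper names `parse_pve_kernel` and `at_or_past_fix` are hypothetical, the comparison is a plain tuple comparison rather than full Debian version ordering, and it assumes later releases keep the fix.

```python
import re

# Release that this thread names as carrying the revert (7.0.14-3).
FIXED_RELEASE = (7, 0, 14, 3)


def parse_pve_kernel(version):
    """Turn a string such as '7.0.14-2-pve' into (7, 0, 14, 2)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognized kernel version: {version!r}")
    return tuple(int(part) for part in m.groups())


def at_or_past_fix(version):
    """True if the version is 7.0.14-3 or newer.

    False only means "older than the fixed release"; it does not prove the
    kernel is affected (7.0.12-1 was reported as working in this thread).
    """
    return parse_pve_kernel(version) >= FIXED_RELEASE


if __name__ == "__main__":
    for candidate in ("7.0.12-1-pve", "7.0.14-2-pve", "7.0.14-3-pve"):
        print(candidate, at_or_past_fix(candidate))
```

On a running host, `platform.release()` from the Python standard library returns the running kernel's version string and could be passed to `at_or_past_fix`.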
Wanted to reply here now with that info, but you were quicker - thanks a lot!
Thank you so much!!! I'm really happy to hear it's going to be fixed.
Thank you for your report. Since the newer version differs only by reverting these two new commits, we pushed it out faster, and it has already caught up on all repositories where the previous 7.0.14-1 was available.

I then re-checked more closely today, found the actual follow-up fix, and replaced the revert with it [0], but did not bump the version again just for this change. AFAICT the AMD ISM doesn't really matter for the passthrough case anyway, which is the most common way AMD GPUs are used in PVE.

[0]: https://git.proxmox.com/?p=pve-kernel.git;a=commitdiff;h=1ab4d13477e4416ce64581b4836f3d4d21f07e93
 
Thank you.
After applying the update, I confirmed that the virtual machine with the AMD RX 9070 XT passed through starts up without any issues.
 