Bug? GPU passthrough not working on vmlinuz-5.3.10-1-pve (v 6.1)

spicyisland

Dec 4, 2019
My pc setup
=====================================
host gpu: (nvidia 2070 super)
guest gpu: (radeon rx 580)
cpu: Ryzen 7 3700X
=====================================

My config
=====================================
args: -cpu host,hv_vendor_id=1234567890ab,kvm=off -machine type=kernel_irqchip=on
bios: ovmf
bootdisk: scsi0
cores: 8
cpu: host,hidden=1
efidisk0: local-lvm:vm-101-disk-1,size=4M
hostpci0: 09:00,pcie=1,x-vga=1
machine: q35
memory: 16384
name: win10-spicyisland
net0: virtio=16:2D:28:30:CC:F4,bridge=vmbr0,firewall=1
numa: 1
ostype: win10
scsi0: local-lvm:vm-101-disk-0,iothread=1,size=200G
scsihw: virtio-scsi-single
smbios1: uuid=46047da5-0061-42c6-b42a-9cb2000d1642
sockets: 1
usb0: host=413c:2113
usb1: host=18f8:0fc0
usb2: host=0c76:161e
vga: none
vmgenid: 5fde0088-7bb6-4a89-bfe8-75be362a47a4
=====================================

If I boot with vmlinuz-5.0.21-5-pve, passthrough works fine.
 
What error message are you seeing? Anything in logs?

Also, your 'args:' line is not needed, all settings specified there are automatically applied by PVE (and the kernel_irqchip workaround is no longer needed with recent pve-qemu-kvm versions).
 
Thanks! I removed that line (and also cpu: hidden=1) and it worked!
I had been struggling and following some old articles...
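
For anyone else cleaning up a config copied from old guides, here is a rough Python sketch of the change described above: drop the `args:` line and the `hidden=1` flag from the `cpu:` line. The function name `clean_vm_config` is made up for illustration, and it only works on the config text you pass in; it is not a PVE tool and does not touch any file on the host.

```python
def clean_vm_config(text: str) -> str:
    """Return the config text without the 'args:' line and the 'hidden=1' CPU flag.

    Hypothetical helper: it assumes simple 'key: value' lines like the
    config posted in this thread.
    """
    cleaned = []
    for line in text.splitlines():
        # Split on the first colon only, since values such as MAC addresses
        # or PCI addresses contain colons themselves.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "args":
            continue  # skip the manual QEMU arguments entirely
        if sep and key.strip() == "cpu":
            flags = [flag for flag in value.strip().split(",") if flag != "hidden=1"]
            line = "cpu: " + ",".join(flags)
        cleaned.append(line)
    return "\n".join(cleaned)


if __name__ == "__main__":
    sample = "args: -cpu host,kvm=off\nbios: ovmf\ncpu: host,hidden=1\nmachine: q35"
    print(clean_vm_config(sample))
```

On a PVE host the VM config normally lives at /etc/pve/qemu-server/<vmid>.conf, so keep a copy of the original before changing anything there.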

Sorry, I didn't look carefully at first, but there were error messages in dmesg:

[ 572.327056] xhci_hcd 0000:09:00.2: remove, state 4
[ 572.327059] usb usb6: USB disconnect, device number 1
[ 572.327461] xhci_hcd 0000:09:00.2: USB bus 6 deregistered
[ 572.327465] xhci_hcd 0000:09:00.2: remove, state 4
[ 572.327466] usb usb5: USB disconnect, device number 1
[ 572.328089] xhci_hcd 0000:09:00.2: USB bus 5 deregistered
[ 572.569714] BUG: unable to handle page fault for address: ffffba9200131000
[ 572.571928] #PF: supervisor read access in kernel mode
[ 572.572897] #PF: error_code(0x0000) - not-present page
[ 572.573860] PGD 81b154067 P4D 81b154067 PUD 81b155067 PMD 81b156067 PTE 0
[ 572.574833] Oops: 0000 [#1] SMP NOPTI
[ 572.575798] CPU: 3 PID: 112 Comm: kworker/3:2 Tainted: P O 5.3.10-1-pve #1
[ 572.576763] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F10 11/15/2019
[ 572.577708] Workqueue: events ccg_pm_workaround_work [ucsi_ccg]
[ 572.578660] RIP: 0010:gpu_i2c_check_status.isra.5+0x52/0xa0 [i2c_nvidia_gpu]
[ 572.579606] Code: 25 00 00 00 60 3d 00 00 00 60 75 24 be 58 02 00 00 bf f4 01 00 00 e8 cd 16 9a e7 48 8b 05 c6 1c 4e e8 4c 39 e0 79 09 49 8b 06 <8b> 18 85 db 78 ce 48 8b 05 b1 1c 4e e8 49 39 c4 0f 88 b6 02 00 00
[ 572.580591] RSP: 0018:ffffba920058bcc0 EFLAGS: 00010283
[ 572.581571] RAX: ffffba9200131000 RBX: 00000000ffffffff RCX: 0000000000000000
[ 572.582553] RDX: ffff9cee5e6ddb40 RSI: 0000000000000000 RDI: ffffba920058bbf8
[ 572.583527] RBP: ffffba920058bce0 R08: 0000000000000000 R09: 0000000000000003
[ 572.584494] R10: 000000000000000e R11: ffff9cee5e6e94c4 R12: 0000000100010af6
[ 572.585454] R13: ffff9cee5abe5818 R14: ffff9cee5abe5820 R15: 00000001000109fc
[ 572.586408] FS: 0000000000000000(0000) GS:ffff9cee5e6c0000(0000) knlGS:0000000000000000
[ 572.587358] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 572.588300] CR2: ffffba9200131000 CR3: 00000008034f6000 CR4: 0000000000340ee0
[ 572.589243] Call Trace:
[ 572.590204] gpu_i2c_master_xfer+0xe8/0x22d [i2c_nvidia_gpu]
[ 572.591152] __i2c_transfer+0x180/0x4d0
[ 572.592087] i2c_transfer+0x88/0x100
[ 572.593011] ccg_read+0x11e/0x170 [ucsi_ccg]
[ 572.593940] ? __switch_to_asm+0x40/0x70
[ 572.594862] ? __switch_to_asm+0x40/0x70
[ 572.595772] ucsi_ccg_sync+0x56/0xb0 [ucsi_ccg]
[ 572.596681] ucsi_notify+0x26/0x120 [typec_ucsi]
[ 572.597591] ccg_pm_workaround_work+0x15/0x20 [ucsi_ccg]
[ 572.598507] process_one_work+0x20f/0x3d0
[ 572.599412] worker_thread+0x34/0x400
[ 572.600313] kthread+0x120/0x140
[ 572.601213] ? process_one_work+0x3d0/0x3d0
[ 572.602111] ? __kthread_parkme+0x70/0x70
[ 572.603004] ret_from_fork+0x22/0x40
[ 572.603894] Modules linked in: tcp_diag inet_diag ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter bonding softdog nfnetlink_log nfnetlink edac_mce_amd kvm_amd kvm zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic amdgpu ledtrig_audio aesni_intel snd_hda_codec_hdmi amd_iommu_v2 aes_x86_64 gpu_sched crypto_simd ttm cryptd glue_helper snd_hda_intel snd_usb_audio drm_kms_helper wmi_bmof snd_usbmidi_lib k10temp ucsi_ccg snd_hda_codec snd_rawmidi xpad snd_hda_core snd_seq_device drm typec_ucsi ff_memless mc snd_hwdep typec fb_sys_fops joydev input_leds syscopyarea snd_pcm ccp zcommon(PO) sysfillrect snd_timer sysimgblt snd znvpair(PO) soundcore spl(O) vhost_net vhost tap ib_iser mac_hid rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vfio_pci vfio_virqfd irqbypass vfio_iommu_type1 vfio sunrpc ip_tables x_tables autofs4 btrfs xor
[ 572.603918] zstd_compress raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c usbmouse hid_generic usbkbd usbhid hid uas usb_storage i2c_piix4 i2c_nvidia_gpu ahci libahci igb i2c_algo_bit dca wmi
[ 572.610024] CR2: ffffba9200131000
[ 572.611095] ---[ end trace 2e2ef1650bea31c1 ]---
[ 572.612159] RIP: 0010:gpu_i2c_check_status.isra.5+0x52/0xa0 [i2c_nvidia_gpu]
[ 572.613220] Code: 25 00 00 00 60 3d 00 00 00 60 75 24 be 58 02 00 00 bf f4 01 00 00 e8 cd 16 9a e7 48 8b 05 c6 1c 4e e8 4c 39 e0 79 09 49 8b 06 <8b> 18 85 db 78 ce 48 8b 05 b1 1c 4e e8 49 39 c4 0f 88 b6 02 00 00
[ 572.614320] RSP: 0018:ffffba920058bcc0 EFLAGS: 00010283
[ 572.615416] RAX: ffffba9200131000 RBX: 00000000ffffffff RCX: 0000000000000000
[ 572.616520] RDX: ffff9cee5e6ddb40 RSI: 0000000000000000 RDI: ffffba920058bbf8
[ 572.617617] RBP: ffffba920058bce0 R08: 0000000000000000 R09: 0000000000000003
[ 572.618704] R10: 000000000000000e R11: ffff9cee5e6e94c4 R12: 0000000100010af6
[ 572.619790] R13: ffff9cee5abe5818 R14: ffff9cee5abe5820 R15: 00000001000109fc
[ 572.620865] FS: 0000000000000000(0000) GS:ffff9cee5e6c0000(0000) knlGS:0000000000000000
[ 572.621943] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 572.623013] CR2: ffffba9200131000 CR3: 00000008034f6000 CR4: 0000000000340ee0
[ 572.829227] device tap100i0 entered promiscuous mode
[ 572.843851] fwbr100i0: port 1(fwln100i0) entered blocking state
[ 572.844973] fwbr100i0: port 1(fwln100i0) entered disabled state
[ 572.846125] device fwln100i0 entered promiscuous mode
[ 572.847253] fwbr100i0: port 1(fwln100i0) entered blocking state
[ 572.848350] fwbr100i0: port 1(fwln100i0) entered forwarding state
[ 572.851107] vmbr0: port 2(fwpr100p0) entered blocking state
[ 572.852165] vmbr0: port 2(fwpr100p0) entered disabled state
[ 572.853265] device fwpr100p0 entered promiscuous mode
[ 572.854339] vmbr0: port 2(fwpr100p0) entered blocking state
[ 572.855369] vmbr0: port 2(fwpr100p0) entered forwarding state
[ 572.858186] fwbr100i0: port 2(tap100i0) entered blocking state
[ 572.859207] fwbr100i0: port 2(tap100i0) entered disabled state
[ 572.860250] fwbr100i0: port 2(tap100i0) entered blocking state
[ 572.861235] fwbr100i0: port 2(tap100i0) entered forwarding state
[ 573.214290] vfio-pci 0000:09:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
[ 573.215519] vfio-pci 0000:09:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
 
Haven't dug too deep into it, but it seems this is coming from the NVIDIA driver? (specifically the I2C interface)

I'd say as long as your GPU works and you're not experiencing instability it should probably be fine, but it is ugly of course. Does this error appear often? Can you trigger it somehow?
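
If it helps to see at a glance which module an Oops like this points at, here is a minimal Python sketch that pulls the fault address, the faulting function and module from the RIP line, and the module that owns the workqueue item. The function name `summarize_oops` is made up for illustration, and the patterns are only fitted to the trace format posted in this thread, so treat it as a starting point rather than a general parser.

```python
import re


def summarize_oops(log: str) -> dict:
    """Extract a few key fields from a kernel Oops pasted from dmesg.

    Hypothetical helper: any field that is not found is returned as None.
    """
    patterns = {
        # Address the kernel failed to read
        "fault_address": r"BUG: unable to handle page fault for address: ([0-9a-f]+)",
        # Function name before the +offset on the RIP line
        "rip_function": r"RIP: \d+:([\w.]+)\+0x",
        # Module in square brackets at the end of the RIP line
        "rip_module": r"RIP: \d+:\S+ \[(\w+)\]",
        # Module that owns the work item being run
        "workqueue_module": r"Workqueue: \S+ \S+ \[(\w+)\]",
    }
    summary = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, log)
        summary[name] = match.group(1) if match else None
    return summary


if __name__ == "__main__":
    sample = (
        "[ 572.569714] BUG: unable to handle page fault for address: ffffba9200131000\n"
        "[ 572.577708] Workqueue: events ccg_pm_workaround_work [ucsi_ccg]\n"
        "[ 572.578660] RIP: 0010:gpu_i2c_check_status.isra.5+0x52/0xa0 [i2c_nvidia_gpu]\n"
    )
    print(summarize_oops(sample))
```

Run against the trace in this thread, the RIP line names the i2c_nvidia_gpu module and the work item comes from ucsi_ccg.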

I can't understand what is going on. I tried memtest86, but right after I selected it in GRUB the screen went black.

It's unlikely that bad RAM would cause this error, but still weird that memtest is not working for you... What disk setup are you using? (ZFS, ext4, RAID, etc...) Do you know if you're UEFI or BIOS (legacy) booting?
 

Thank you for your response!
I'm glad to get such helpful support!

This error always appears, and I can't boot my VM if I use the 5.3 kernel.
I found someone else facing the same issue.

https://forum.proxmox.com/threads/u...erve-the-old-working-kernel-on-updates.61492/

Actually, I'm not sure about my disk setup. I just have one SSD in my PC and didn't change any settings.
Both host and guest boot via UEFI.
 
