I have a Mellanox ConnectX LX configured with 8 VFs, some of which are passed through to my VMs. This had worked without issues until today.
I added a Radeon RX 6600 XT GPU, with the intention of passing it through to a Windows 11 VM. The first issue I noticed is that the Mellanox devices were reassigned to a new ID, 05:00.0, from the previous 02:00.0. The VMs with the passed-through VFs didn't work and I got a flood of
x86/PAT: kvm:21286 conflicting memory types 401c900000-401ca00000 uncached-minus<->write-combining
messages in dmesg, with some other issues on top:
Code:
[ 756.298607] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 756.320050] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 756.320314] amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
[ 756.320451] amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x0000000f, smu fw if version = 0x00000013, smu fw program = 0, version = 0x003b2f00 (59.47.0)
[ 756.320725] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
[ 756.329366] amdgpu 0000:03:00.0: amdgpu: SMU is resumed successfully!
[ 756.330774] [drm] DMUB hardware initialized: version=0x02020020
[ 756.352468] [drm] kiq ring mec 2 pipe 1 q 0
[ 756.356247] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[ 756.356554] [drm] JPEG decode initialized successfully.
[ 756.356697] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 756.356827] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 756.356954] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 756.357078] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 756.357201] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 756.357322] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 756.357438] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 756.357553] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 756.357667] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 756.357776] amdgpu 0000:03:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
[ 756.357882] amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 756.357988] amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[ 756.358089] amdgpu 0000:03:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
[ 756.358191] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
[ 756.358293] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
[ 756.358392] amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
[ 756.361537] amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
[ 756.376259] amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.
[ 756.495649] [drm] amdgpu: ttm finalized
[ 776.684228] vfio-pci 0000:05:01.3: enabling device (0000 -> 0002)
[ 776.800538] x86/PAT: kvm:21286 conflicting memory types 401c900000-401ca00000 uncached-minus<->write-combining
[ 776.800734] x86/PAT: memtype_reserve failed [mem 0x401c900000-0x401c9fffff], track uncached-minus, req uncached-minus
[ 776.800928] ioremap memtype_reserve failed -16
[ 777.066986] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
[ 777.307035] sd 1:0:0:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 777.390947] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 777.631008] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 778.307553] irq 16: nobody cared (try booting with the "irqpoll" option)
[ 778.307671] CPU: 11 PID: 0 Comm: swapper/11 Tainted: P O 6.5.11-8-pve #1
[ 778.307943] Hardware name: HP HP Z2 Tower G9 Workstation Desktop PC/895C, BIOS U50 Ver. 02.04.02 11/06/2023
[ 778.308234] Call Trace:
[ 778.308383] <IRQ>
[ 778.308533] dump_stack_lvl+0x48/0x70
[ 778.308685] dump_stack+0x10/0x20
[ 778.308832] __report_bad_irq+0x30/0xd0
[ 778.308979] note_interrupt+0x2e1/0x320
[ 778.309126] handle_irq_event+0x79/0x80
[ 778.309271] handle_fasteoi_irq+0x7d/0x200
[ 778.309418] __common_interrupt+0x43/0xd0
[ 778.309565] common_interrupt+0x9f/0xb0
[ 778.309707] </IRQ>
[ 778.309846] <TASK>
[ 778.309983] asm_common_interrupt+0x27/0x40
[ 778.310119] RIP: 0010:cpuidle_enter_state+0xce/0x470
[ 778.310255] Code: 28 10 ff e8 64 f6 ff ff 8b 53 04 49 89 c6 0f 1f 44 00 00 31 ff e8 22 25 0f ff 80 7d d7 00 0f 85 e7 01 00 00 fb 0f 1f 44 00 00 <45> 85 ff 0f 88 83 01 00 00 49 63 d7 4c 89 f1 48 8d 04 52 48 8d 04
[ 778.310660] RSP: 0018:ffffc18a8018fe50 EFLAGS: 00000246
[ 778.310801] RAX: 0000000000000000 RBX: ffffe18a7fcc0930 RCX: 0000000000000000
[ 778.310947] RDX: 000000000000000b RSI: 0000000000000000 RDI: 0000000000000000
[ 778.311088] RBP: ffffc18a8018fe88 R08: 0000000000000000 R09: 0000000000000000
[ 778.311225] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004
[ 778.311379] R13: ffffffff99e690e0 R14: 000000b536bec4f4 R15: 0000000000000004
[ 778.311528] cpuidle_enter+0x2e/0x50
[ 778.311658] call_cpuidle+0x23/0x60
[ 778.311784] do_idle+0x202/0x260
[ 778.311905] cpu_startup_entry+0x2a/0x30
[ 778.312021] start_secondary+0x119/0x140
[ 778.312133] secondary_startup_64_no_verify+0x17e/0x18b
[ 778.312243] </TASK>
[ 778.312349] handlers:
[ 778.312458] [<00000000a84d1531>] vfio_intx_handler [vfio_pci_core]
[ 778.312580] Disabling IRQ #16
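As a side note on the "irq 16: nobody cared" splat: it suggests something kept asserting that interrupt line after VFIO's INTx handler masked it. A quick (hypothetical) way to see which host devices sit on IRQ 16 would be:

```shell
# Diagnostic sketch: list what shares IRQ 16 on the host.
# Output is hardware-dependent; run on the affected machine.
grep -E '^ *16:' /proc/interrupts
for d in /sys/bus/pci/devices/*; do
  [ "$(cat "$d/irq" 2>/dev/null)" = "16" ] && echo "$d"
done
```

If several devices share that line, that could explain the spurious-interrupt storm independently of the PAT issue.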
Now, I checked /proc/iomem and didn't see any conflicts:
Code:
00000000-00000fff : Reserved
00001000-0009efff : System RAM
0009f000-000fffff : Reserved
000a0000-000bffff : PCI Bus 0000:00
000f0000-000fffff : System ROM
00100000-4783a017 : System RAM
4783a018-4784e257 : System RAM
4784e258-56fc7fff : System RAM
56fc8000-5703dfff : Reserved
5703e000-621dafff : System RAM
621db000-65d5dfff : Reserved
65d5e000-65f5dfff : ACPI Non-volatile Storage
65f5e000-65ffefff : ACPI Tables
65fff000-65ffffff : System RAM
66000000-69ffffff : Reserved
6b200000-6b3fffff : Reserved
6bc00000-807fffff : Reserved
80800000-bfffffff : PCI Bus 0000:00
80a00000-80cfffff : PCI Bus 0000:01
80a00000-80bfffff : PCI Bus 0000:02
80a00000-80bfffff : PCI Bus 0000:03
80a00000-80afffff : 0000:03:00.0
80b00000-80b03fff : 0000:03:00.1
80b00000-80b03fff : ICH HD audio
80b20000-80b3ffff : 0000:03:00.0
80c00000-80c03fff : 0000:01:00.0
80d00000-80d00fff : 0000:00:1f.5
80e00000-80e1ffff : 0000:00:1f.6
80e00000-80e1ffff : vfio-pci
80e20000-80e21fff : 0000:00:17.0
80e20000-80e21fff : vfio-pci
80e23000-80e237ff : 0000:00:17.0
80e23000-80e237ff : vfio-pci
80e23800-80e23fff : vfio sub-page reserved
80e24000-80e240ff : 0000:00:17.0
80e24000-80e240ff : vfio-pci
80e24100-80e24fff : vfio sub-page reserved
80f00000-810fffff : PCI Bus 0000:05
80f00000-80ffffff : 0000:05:00.0
81000000-810fffff : 0000:05:00.1
81100000-817fffff : PCI Bus 0000:04
81100000-813fffff : 0000:04:00.0
81400000-816fffff : 0000:04:00.0
81700000-81703fff : 0000:04:00.0
81700000-81703fff : nvme
81704000-8170ffff : 0000:04:00.0
81800000-81efffff : PCI Bus 0000:06
81800000-81afffff : 0000:06:00.0
81b00000-81dfffff : 0000:06:00.0
81e00000-81e03fff : 0000:06:00.0
81e00000-81e03fff : nvme
81e04000-81e0ffff : 0000:06:00.0
c0000000-cfffffff : PCI MMCONFIG 0000 [bus 00-ff]
fec00000-fec003ff : IOAPIC 0
fed00000-fed003ff : HPET 0
fed00000-fed003ff : PNP0103:00
fed20000-fed7ffff : Reserved
fed40000-fed44fff : IFX1521:00
fed40000-fed44fff : IFX1521:00 IFX1521:00
fed90000-fed90fff : dmar0
fed91000-fed91fff : dmar1
feda0000-feda0fff : pnp 00:05
feda1000-feda1fff : pnp 00:05
fedb0000-fedbffff : pnp 00:04
fedc0000-fedc7fff : pnp 00:05
fee00000-feefffff : pnp 00:05
fee00000-fee00fff : Local APIC
100000000-87f7fffff : System RAM
3fe600000-3ff9fffff : Kernel code
3ffa00000-400674fff : Kernel rodata
400800000-400b7febf : Kernel data
40102f000-4021fffff : Kernel bss
87f800000-87fffffff : RAM buffer
4000000000-7fffffffff : PCI Bus 0000:00
4000000000-400fffffff : 0000:00:02.0
4010000000-4016ffffff : 0000:00:02.0
4018000000-401dffffff : PCI Bus 0000:05
4018000000-4019ffffff : 0000:05:00.0
4018000000-4019ffffff : mlx5_core
401a000000-401bffffff : 0000:05:00.1
401a000000-401bffffff : mlx5_core
401c000000-401c7fffff : 0000:05:00.0
401c800000-401cffffff : 0000:05:00.1
401c800000-401c8fffff : 0000:05:01.2
401c900000-401c9fffff : 0000:05:01.3
401ca00000-401cafffff : 0000:05:01.4
401ca00000-401cafffff : mlx5_core
401cb00000-401cbfffff : 0000:05:01.5
401cb00000-401cbfffff : mlx5_core
401cc00000-401ccfffff : 0000:05:01.6
401cc00000-401ccfffff : mlx5_core
401cd00000-401cdfffff : 0000:05:01.7
401cd00000-401cdfffff : mlx5_core
401ce00000-401cefffff : 0000:05:02.0
401ce00000-401cefffff : mlx5_core
401cf00000-401cffffff : 0000:05:02.1
401cf00000-401cffffff : mlx5_core
4020000000-40ffffffff : 0000:00:02.0
6000000000-620fffffff : PCI Bus 0000:01
6000000000-620fffffff : PCI Bus 0000:02
6000000000-620fffffff : PCI Bus 0000:03
6000000000-61ffffffff : 0000:03:00.0
6200000000-620fffffff : 0000:03:00.0
6214000000-6214ffffff : 0000:00:02.0
6215000000-621500ffff : 0000:00:14.0
6215000000-621500ffff : xhci-hcd
6215010000-6215013fff : 0000:00:14.2
6215014000-62150140ff : 0000:00:1f.4
6215016000-6215016fff : 0000:00:14.2
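One thing I did notice: the range in the PAT message lines up exactly with the 0000:05:01.3 entry above (the VF I'm passing through). The PAT message prints an exclusive end address while /proc/iomem prints an inclusive one, which is easy to confirm with plain shell arithmetic:

```shell
# The x86/PAT message reports 401c900000-401ca00000 (end exclusive);
# /proc/iomem lists 401c900000-401c9fffff for 0000:05:01.3 (end inclusive).
pat_start=$((0x401c900000)); pat_end=$((0x401ca00000))
bar_start=$((0x401c900000)); bar_end=$((0x401c9fffff))
if [ "$pat_start" -eq "$bar_start" ] && [ "$pat_end" -eq $((bar_end + 1)) ]; then
  echo "PAT conflict range == 0000:05:01.3 VF BAR"
fi
```

So the "conflict" isn't between two iomem regions; it's two mappings of the *same* VF BAR requesting different memory types.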
Additionally, I am getting Error 43 in the Windows 11 VM where the GPU is passed through. There are no other errors whatsoever in dmesg when that VM is initialized.
I can get the dmesg flood to stop by blacklisting the amdgpu driver, even though it is supposedly fine to leave it enabled for passthrough, at least according to some posts I saw here on the forums (e.g. https://forum.proxmox.com/threads/problem-with-gpu-passthrough.55918/post-483749).
Now, I found in some other sources (e.g. https://forum.level1techs.com/t/am5...x-cpu-2-kvm-conflicting-memory-types/201555/4) that it is recommended to disable Mellanox's VF autoprobing in case of memory type conflicts, which I did, but the result is that the VMs no longer start at all, unable to bind to the PCI node. When issuing the kvm command directly:
/usr/bin/kvm -id 127 -name 'TrueNAS,debug-threads=on' -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/127.qmp,server=on,wait=off' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/127.pid -daemonize -smbios 'type=1,uuid=744c0128-708d-4c5c-b6a1-93afe96aaf05' -smp '16,sockets=1,cores=16,maxcpus=16' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc 'unix:/var/run/qemu-server/127.vnc,password=on' -cpu host,+kvm_pv_eoi,+kvm_pv_unhalt -m 6144 -object 'iothread,id=iothread-virtio0' -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'vmgenid,guid=079b0904-e321-49aa-9c55-c338a4881fb0' -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=0000:00:17.0,id=hostpci0,bus=pci.0,addr=0x10' -device 'vfio-pci,host=0000:05:01.3,id=hostpci1,bus=pci.0,addr=0x11' -device 'usb-host,vendorid=0x152d,productid=0x1561,id=usb0' -device 'usb-host,vendorid=0x7825,productid=0xa2a4,id=usb1' -device 'VGA,id=vga,bus=pcie.0,addr=0x1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:5c218c373c6d' -drive 'file=/var/lib/vz/images/127/vm-127-disk-0.qcow2,if=none,id=drive-virtio0,discard=on,format=qcow2,cache=none,aio=io_uring,detect-zeroes=unmap' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,iothread=iothread-virtio0,bootindex=100' -machine 'type=q35+pve0'
I am seeing:
kvm: -device vfio-pci,host=0000:05:01.3,id=hostpci1,bus=pci.0,addr=0x11: vfio 0000:05:01.3: failed to open /dev/vfio/18: No such file or directory
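To my understanding, the /dev/vfio/18 char device only appears once a device in that IOMMU group is actually bound to vfio-pci, so with autoprobing off and nothing bound, the node is simply never created. A quick diagnostic sketch (assuming the VF address from the command above):

```shell
# Diagnostic sketch: find the VF's IOMMU group number and current driver.
VF=0000:05:01.3
basename "$(readlink /sys/bus/pci/devices/$VF/iommu_group)"   # group number, matching /dev/vfio/<n>
readlink /sys/bus/pci/devices/$VF/driver || echo "no driver bound"
ls /dev/vfio/                                                 # group node absent until something binds
```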
Nowhere did I see anyone mention a requirement to additionally configure vfio-pci before passing a VF to a VM when autoprobing is disabled; it's supposed to just work. Regardless, after comparing the dmesg output between autoprobing enabled and disabled, I can see that in the former case, enabling those VFs for vfio-pci is exactly what triggers the conflicting memory type issues:
[ 20.199492] vfio-pci 0000:05:01.2: enabling device (0000 -> 0002)
[ 20.315487] x86/PAT: kvm:1981 conflicting memory types 401c800000-401c900000 uncached-minus<->write-combining
[ 20.315504] x86/PAT: memtype_reserve failed [mem 0x401c8
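For the record, when autoprobing is disabled, the VF apparently has to be bound to vfio-pci by hand before the /dev/vfio group node appears. A minimal sketch of that step (using the VF address from my setup; I have not confirmed this avoids the PAT conflict):

```shell
# Sketch: manually bind one VF to vfio-pci when sriov_drivers_autoprobe=0.
# Run as root on the host.
VF=0000:05:01.3
echo vfio-pci > /sys/bus/pci/devices/$VF/driver_override
echo "$VF" > /sys/bus/pci/drivers_probe
ls -l /dev/vfio/   # the group char device should now exist
```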
This makes me conclude that disabling autoprobing wouldn't really help here, since it would fail just the same as soon as I initialized those VFs for VFIO by hand (via its module parameter). Let me remind you that this all works just fine without the amdgpu module, and the latter seems to be the actual source of the conflict (although I still don't understand why disabling autoprobing stops the VMs from using the VFs).
I would appreciate any help here; I am pulling my hair out.
Some extra info:
SysFS:
class/net/ens4f1np1/device/sriov_drivers_autoprobe = 0
class/net/ens4f1np1/device/sriov_numvfs = 8
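For reference, the sysfs state above was produced along these lines (order matters, as far as I know: autoprobe has to be turned off before the VFs are created, otherwise mlx5_core grabs them anyway):

```shell
# Sketch of how the sysfs values above were set (run as root; not persistent across reboots)
echo 0 > /sys/class/net/ens4f1np1/device/sriov_drivers_autoprobe
echo 8 > /sys/class/net/ens4f1np1/device/sriov_numvfs
```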
/etc/kernel/cmdline:
root=ZFS=rpool/ROOT/pve-1 boot=zfs intel_iommu=on iommu=pt pci=assign-busses vfio-pci.ids=1002:1478,1002:1479 initcall_blacklist=sysfb_init disable_vga=1 module_blacklist=amdgpu
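For completeness: since this is a ZFS-booted Proxmox host using /etc/kernel/cmdline, changes there only take effect after refreshing the boot entries and rebooting, i.e.:

```shell
# Regenerate boot entries after editing /etc/kernel/cmdline, then reboot
proxmox-boot-tool refresh
```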