Proxmox 9.0 Beta - kernel issues with vfio-pci on Mellanox 100G.

dominiaz

Sep 16, 2016
The kernel in Proxmox 9.0 Beta is broken for Mellanox 100G ConnectX-5 VFs. The card works fine on the host as long as no VF is passed through, so I think vfio-pci is broken in this release.

kvm: -device vfio-pci,host=0000:81:00.1,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0: vfio 0000:81:00.1: error getting device from group 89: Permission denied
Verify all devices in group 89 are bound to vfio-<bus> or pci-stub and not already in use
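A minimal sketch of the check the error message asks for: list every device in an IOMMU group together with the driver it is currently bound to, using the standard sysfs layout. The function name `group_drivers` and the `SYSFS_ROOT` override are hypothetical, added only for illustration and testing.

```shell
#!/bin/sh
# Print "<pci-address> <driver>" for each device in the given IOMMU group.
# SYSFS_ROOT is a hypothetical override (defaults to /sys).
group_drivers() {
    group="$1"
    root="${SYSFS_ROOT:-/sys}"
    for dev in "$root/kernel/iommu_groups/$group/devices/"*; do
        # Skip the literal pattern if the group does not exist or is empty.
        [ -e "$dev" ] || continue
        if [ -L "$dev/driver" ]; then
            # The driver symlink points at .../drivers/<name>.
            drv=$(basename "$(readlink "$dev/driver")")
        else
            drv="(none)"
        fi
        echo "$(basename "$dev") $drv"
    done
}
```

Usage, e.g. `group_drivers 89`. Per the error text, each device should be on vfio-pci (or pci-stub); seeing mlx5_core instead would mean the host driver still owns the VF.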

Code:
agent: 1
balloon: 0
boot: order=virtio0;ide2;net0
cores: 20
cpu: host
hostpci0: 0000:81:00.1,pcie=1
ide2: none,media=cdrom
machine: q35
memory: 32768
meta: creation-qemu=9.2.0,ctime=1752404131
name: debian13rc2
net0: virtio=BC:24:11:80:0C:69,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsihw: virtio-scsi-single
smbios1: uuid=cbfa1f66-f75a-48f8-83ed-f014ba8b1089
sockets: 1
virtio0: local-zfs:vm-20100-disk-0,cache=directsync,discard=on,iothread=1,size=200G
virtio1: xiraid2:20100/vm-20100-disk-0.raw,aio=native,cache=directsync,iothread=1,size=32G
vmgenid: e0daa015-4aa1-49e4-81ac-f4217fb4d28e

Creating 8 VFs on the Mellanox ConnectX-5 100G:
Code:
echo 8 | sudo tee /sys/class/net/ens3np0/device/sriov_numvfs
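A small sketch, assuming it runs as root, that checks `sriov_totalvfs` before writing and resets the count to 0 first, because the kernel rejects (EBUSY) a change from one non-zero VF count directly to another. `set_numvfs` and `SYSFS_ROOT` are hypothetical names for illustration.

```shell
#!/bin/sh
# Set the VF count on a PF network interface (run as root).
# SYSFS_ROOT is a hypothetical override (defaults to /sys).
set_numvfs() {
    iface="$1"; count="$2"
    dev="${SYSFS_ROOT:-/sys}/class/net/$iface/device"
    # Maximum number of VFs the PF supports.
    total=$(cat "$dev/sriov_totalvfs") || return 1
    if [ "$count" -gt "$total" ]; then
        echo "requested $count VFs, but $iface supports at most $total" >&2
        return 1
    fi
    # Disable existing VFs first; non-zero to non-zero changes are refused.
    echo 0 > "$dev/sriov_numvfs" || return 1
    echo "$count" > "$dev/sriov_numvfs"
}
```

Usage, e.g. `set_numvfs ens3np0 8`. Note that resetting to 0 removes any VFs that are currently in use.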

Code:
lspci | grep Mellanox
81:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
81:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
81:00.2 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
81:00.3 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
81:00.4 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
81:00.5 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
81:00.6 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
81:00.7 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
81:01.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]

Code:
# journalctl -b 0 | grep -i iommu
Jul 20 22:51:12 s2 kernel: iommu: Default domain type: Translated
Jul 20 22:51:12 s2 kernel: iommu: DMA domain TLB invalidation policy: lazy mode
Jul 20 22:51:12 s2 kernel: pci 0000:c0:00.2: AMD-Vi: IOMMU performance counters supported
Jul 20 22:51:12 s2 kernel: pci 0000:c0:01.0: Adding to iommu group 0
Jul 20 22:51:12 s2 kernel: pci 0000:c0:01.1: Adding to iommu group 1
Jul 20 22:51:12 s2 kernel: pci 0000:c0:01.2: Adding to iommu group 2
Jul 20 22:51:12 s2 kernel: pci 0000:c0:01.3: Adding to iommu group 3
Jul 20 22:51:12 s2 kernel: pci 0000:c0:01.4: Adding to iommu group 4
Jul 20 22:51:12 s2 kernel: pci 0000:c0:02.0: Adding to iommu group 5
Jul 20 22:51:12 s2 kernel: pci 0000:c0:03.0: Adding to iommu group 6
Jul 20 22:51:12 s2 kernel: pci 0000:c0:04.0: Adding to iommu group 7
Jul 20 22:51:12 s2 kernel: pci 0000:c0:05.0: Adding to iommu group 8
Jul 20 22:51:12 s2 kernel: pci 0000:c0:05.2: Adding to iommu group 8
Jul 20 22:51:12 s2 kernel: pci 0000:c0:07.0: Adding to iommu group 9
Jul 20 22:51:12 s2 kernel: pci 0000:c0:07.1: Adding to iommu group 10
Jul 20 22:51:12 s2 kernel: pci 0000:c0:08.0: Adding to iommu group 11
Jul 20 22:51:12 s2 kernel: pci 0000:c0:08.1: Adding to iommu group 12
Jul 20 22:51:12 s2 kernel: pci 0000:c1:00.0: Adding to iommu group 13
Jul 20 22:51:12 s2 kernel: pci 0000:c2:00.0: Adding to iommu group 14
Jul 20 22:51:12 s2 kernel: pci 0000:c3:00.0: Adding to iommu group 15
Jul 20 22:51:12 s2 kernel: pci 0000:c4:00.0: Adding to iommu group 16
Jul 20 22:51:12 s2 kernel: pci 0000:c5:00.0: Adding to iommu group 8
Jul 20 22:51:12 s2 kernel: pci 0000:c6:00.0: Adding to iommu group 8
Jul 20 22:51:12 s2 kernel: pci 0000:c7:00.0: Adding to iommu group 17
Jul 20 22:51:12 s2 kernel: pci 0000:c7:00.2: Adding to iommu group 18
Jul 20 22:51:12 s2 kernel: pci 0000:c8:00.0: Adding to iommu group 19
Jul 20 22:51:12 s2 kernel: pci 0000:c8:00.2: Adding to iommu group 20
Jul 20 22:51:12 s2 kernel: pci 0000:80:00.2: AMD-Vi: IOMMU performance counters supported
Jul 20 22:51:12 s2 kernel: pci 0000:80:01.0: Adding to iommu group 21
Jul 20 22:51:12 s2 kernel: pci 0000:80:01.1: Adding to iommu group 22
Jul 20 22:51:12 s2 kernel: pci 0000:80:02.0: Adding to iommu group 23
Jul 20 22:51:12 s2 kernel: pci 0000:80:03.0: Adding to iommu group 24
Jul 20 22:51:12 s2 kernel: pci 0000:80:03.1: Adding to iommu group 24
Jul 20 22:51:12 s2 kernel: pci 0000:80:03.2: Adding to iommu group 24
Jul 20 22:51:12 s2 kernel: pci 0000:80:03.3: Adding to iommu group 25
Jul 20 22:51:12 s2 kernel: pci 0000:80:03.4: Adding to iommu group 26
Jul 20 22:51:12 s2 kernel: pci 0000:80:04.0: Adding to iommu group 27
Jul 20 22:51:12 s2 kernel: pci 0000:80:05.0: Adding to iommu group 28
Jul 20 22:51:12 s2 kernel: pci 0000:80:07.0: Adding to iommu group 29
Jul 20 22:51:12 s2 kernel: pci 0000:80:07.1: Adding to iommu group 30
Jul 20 22:51:12 s2 kernel: pci 0000:80:08.0: Adding to iommu group 31
Jul 20 22:51:12 s2 kernel: pci 0000:80:08.1: Adding to iommu group 32
Jul 20 22:51:12 s2 kernel: pci 0000:81:00.0: Adding to iommu group 33
Jul 20 22:51:12 s2 kernel: pci 0000:84:00.0: Adding to iommu group 34
Jul 20 22:51:12 s2 kernel: pci 0000:85:00.0: Adding to iommu group 35
Jul 20 22:51:12 s2 kernel: pci 0000:86:00.0: Adding to iommu group 36
Jul 20 22:51:12 s2 kernel: pci 0000:86:00.2: Adding to iommu group 37
Jul 20 22:51:12 s2 kernel: pci 0000:87:00.0: Adding to iommu group 38
Jul 20 22:51:12 s2 kernel: pci 0000:87:00.2: Adding to iommu group 39
Jul 20 22:51:12 s2 kernel: pci 0000:40:00.2: AMD-Vi: IOMMU performance counters supported
Jul 20 22:51:12 s2 kernel: pci 0000:40:01.0: Adding to iommu group 40
Jul 20 22:51:12 s2 kernel: pci 0000:40:01.3: Adding to iommu group 41
Jul 20 22:51:12 s2 kernel: pci 0000:40:01.4: Adding to iommu group 40
Jul 20 22:51:12 s2 kernel: pci 0000:40:02.0: Adding to iommu group 42
Jul 20 22:51:12 s2 kernel: pci 0000:40:03.0: Adding to iommu group 43
Jul 20 22:51:12 s2 kernel: pci 0000:40:03.1: Adding to iommu group 44
Jul 20 22:51:12 s2 kernel: pci 0000:40:04.0: Adding to iommu group 45
Jul 20 22:51:12 s2 kernel: pci 0000:40:05.0: Adding to iommu group 46
Jul 20 22:51:12 s2 kernel: pci 0000:40:07.0: Adding to iommu group 47
Jul 20 22:51:12 s2 kernel: pci 0000:40:07.1: Adding to iommu group 48
Jul 20 22:51:12 s2 kernel: pci 0000:40:08.0: Adding to iommu group 49
Jul 20 22:51:12 s2 kernel: pci 0000:40:08.1: Adding to iommu group 50
Jul 20 22:51:12 s2 kernel: pci 0000:40:08.2: Adding to iommu group 51
Jul 20 22:51:12 s2 kernel: pci 0000:40:08.3: Adding to iommu group 52
Jul 20 22:51:12 s2 kernel: pci 0000:41:00.0: Adding to iommu group 53
Jul 20 22:51:12 s2 kernel: pci 0000:43:00.0: Adding to iommu group 54
Jul 20 22:51:12 s2 kernel: pci 0000:44:00.0: Adding to iommu group 55
Jul 20 22:51:12 s2 kernel: pci 0000:44:00.2: Adding to iommu group 56
Jul 20 22:51:12 s2 kernel: pci 0000:45:00.0: Adding to iommu group 57
Jul 20 22:51:12 s2 kernel: pci 0000:45:00.1: Adding to iommu group 58
Jul 20 22:51:12 s2 kernel: pci 0000:45:00.2: Adding to iommu group 59
Jul 20 22:51:12 s2 kernel: pci 0000:45:00.3: Adding to iommu group 60
Jul 20 22:51:12 s2 kernel: pci 0000:46:00.0: Adding to iommu group 61
Jul 20 22:51:12 s2 kernel: pci 0000:47:00.0: Adding to iommu group 62
Jul 20 22:51:12 s2 kernel: pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
Jul 20 22:51:12 s2 kernel: pci 0000:00:00.0: Adding to iommu group 63
Jul 20 22:51:12 s2 kernel: pci 0000:00:01.0: Adding to iommu group 64
Jul 20 22:51:12 s2 kernel: pci 0000:00:01.1: Adding to iommu group 65
Jul 20 22:51:12 s2 kernel: pci 0000:00:02.0: Adding to iommu group 66
Jul 20 22:51:12 s2 kernel: pci 0000:00:03.0: Adding to iommu group 67
Jul 20 22:51:12 s2 kernel: pci 0000:00:03.1: Adding to iommu group 67
Jul 20 22:51:12 s2 kernel: pci 0000:00:03.2: Adding to iommu group 67
Jul 20 22:51:12 s2 kernel: pci 0000:00:03.3: Adding to iommu group 68
Jul 20 22:51:12 s2 kernel: pci 0000:00:03.4: Adding to iommu group 69
Jul 20 22:51:12 s2 kernel: pci 0000:00:04.0: Adding to iommu group 70
Jul 20 22:51:12 s2 kernel: pci 0000:00:05.0: Adding to iommu group 71
Jul 20 22:51:12 s2 kernel: pci 0000:00:07.0: Adding to iommu group 72
Jul 20 22:51:12 s2 kernel: pci 0000:00:07.1: Adding to iommu group 73
Jul 20 22:51:12 s2 kernel: pci 0000:00:08.0: Adding to iommu group 74
Jul 20 22:51:12 s2 kernel: pci 0000:00:08.1: Adding to iommu group 75
Jul 20 22:51:12 s2 kernel: pci 0000:00:14.0: Adding to iommu group 76
Jul 20 22:51:12 s2 kernel: pci 0000:00:14.3: Adding to iommu group 76
Jul 20 22:51:12 s2 kernel: pci 0000:00:18.0: Adding to iommu group 77
Jul 20 22:51:12 s2 kernel: pci 0000:00:18.1: Adding to iommu group 77
Jul 20 22:51:12 s2 kernel: pci 0000:00:18.2: Adding to iommu group 77
Jul 20 22:51:12 s2 kernel: pci 0000:00:18.3: Adding to iommu group 77
Jul 20 22:51:12 s2 kernel: pci 0000:00:18.4: Adding to iommu group 77
Jul 20 22:51:12 s2 kernel: pci 0000:00:18.5: Adding to iommu group 77
Jul 20 22:51:12 s2 kernel: pci 0000:00:18.6: Adding to iommu group 77
Jul 20 22:51:12 s2 kernel: pci 0000:00:18.7: Adding to iommu group 77
Jul 20 22:51:12 s2 kernel: pci 0000:01:00.0: Adding to iommu group 78
Jul 20 22:51:12 s2 kernel: pci 0000:01:00.1: Adding to iommu group 79
Jul 20 22:51:12 s2 kernel: pci 0000:01:00.2: Adding to iommu group 80
Jul 20 22:51:12 s2 kernel: pci 0000:01:00.3: Adding to iommu group 81
Jul 20 22:51:12 s2 kernel: pci 0000:05:00.0: Adding to iommu group 82
Jul 20 22:51:12 s2 kernel: pci 0000:06:00.0: Adding to iommu group 83
Jul 20 22:51:12 s2 kernel: pci 0000:07:00.0: Adding to iommu group 84
Jul 20 22:51:12 s2 kernel: pci 0000:07:00.2: Adding to iommu group 85
Jul 20 22:51:12 s2 kernel: pci 0000:08:00.0: Adding to iommu group 86
Jul 20 22:51:12 s2 kernel: pci 0000:08:00.2: Adding to iommu group 87
Jul 20 22:51:12 s2 kernel: pci 0000:08:00.3: Adding to iommu group 88
Jul 20 22:51:12 s2 kernel: perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
Jul 20 22:51:12 s2 kernel: perf/amd_iommu: Detected AMD IOMMU #1 (2 banks, 4 counters/bank).
Jul 20 22:51:12 s2 kernel: perf/amd_iommu: Detected AMD IOMMU #2 (2 banks, 4 counters/bank).
Jul 20 22:51:12 s2 kernel: perf/amd_iommu: Detected AMD IOMMU #3 (2 banks, 4 counters/bank).
Jul 20 22:51:17 s2 kernel: pci 0000:81:00.1: Adding to iommu group 89
Jul 20 22:51:17 s2 kernel: pci 0000:81:00.2: Adding to iommu group 90
Jul 20 22:51:17 s2 kernel: pci 0000:81:00.3: Adding to iommu group 91
Jul 20 22:51:18 s2 kernel: pci 0000:81:00.4: Adding to iommu group 92
Jul 20 22:51:18 s2 kernel: pci 0000:81:00.5: Adding to iommu group 93
Jul 20 22:51:18 s2 kernel: pci 0000:81:00.6: Adding to iommu group 94
Jul 20 22:51:19 s2 kernel: pci 0000:81:00.7: Adding to iommu group 95
Jul 20 22:51:19 s2 kernel: pci 0000:81:01.0: Adding to iommu group 96
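The log shows each VF landing in its own IOMMU group (89 to 96). A minimal sketch to map VFs to groups directly, assuming the usual `virtfnN` symlinks under the PF's device directory and each device's `iommu_group` link. `vf_groups` and `SYSFS_ROOT` are hypothetical names; the glob sorts `virtfn10` before `virtfn2`, which does not matter for 8 VFs.

```shell
#!/bin/sh
# Print "virtfnN <pci-address> group <N>" for every VF of a PF interface.
# SYSFS_ROOT is a hypothetical override (defaults to /sys).
vf_groups() {
    pf="${SYSFS_ROOT:-/sys}/class/net/$1/device"
    for vf in "$pf"/virtfn*; do
        # virtfnN entries are symlinks; skip the literal pattern otherwise.
        [ -L "$vf" ] || continue
        addr=$(basename "$(readlink "$vf")")
        group=$(basename "$(readlink "$vf/iommu_group")")
        echo "$(basename "$vf") $addr group $group"
    done
}
```

Usage, e.g. `vf_groups ens3np0`.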

Only the stock kernel of Proxmox 9.0 Beta (6.14.8-1-pve) is broken: the error above appears when I try to pass through a ConnectX-5 VF.
Passing through the whole ConnectX-5 (the entire PCIe device) works fine.

Everything works fine on Proxmox 9.0 Beta with kernel 6.16.0-6-pve (proxmox-kernel-6.16.0-6-pve_6.16.0-6_amd64.deb from https://github.com/KrzysztofHajdamowicz/pve-kernel/releases).
Everything works fine on Proxmox 8.4 (kernel 6.8.12-12-pve)

Please help @dcsapak
 
I have the same problem. I hope someone has an answer.
MST modules:
------------
MST PCI module is not loaded
MST PCI configuration module is not loaded
PCI devices:
------------
DEVICE_TYPE MST PCI RDMA NET NUMA
BlueField2(rev:1) NA 01:00.0 mlx5_0 net-enp1s0f0np0 3

It only works normally when I pin the kernel to version 6.14.5-1-bpo12-pve. I don't know what to do now.
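A hedged sketch of pinning an older kernel on Proxmox with `proxmox-boot-tool kernel pin`, with a guard that the kernel image exists first. The wrapper name `pin_kernel` and the `BOOT_DIR` override are hypothetical; the `/boot/vmlinuz-<version>` path is the usual Debian location.

```shell
#!/bin/sh
# Pin a kernel version so it stays the default boot entry.
# BOOT_DIR is a hypothetical override (defaults to /boot).
pin_kernel() {
    kver="$1"
    boot="${BOOT_DIR:-/boot}"
    if [ ! -e "$boot/vmlinuz-$kver" ]; then
        echo "kernel $kver is not installed under $boot" >&2
        return 1
    fi
    proxmox-boot-tool kernel pin "$kver"
}
```

Usage, e.g. `pin_kernel 6.14.5-1-bpo12-pve`; `proxmox-boot-tool kernel unpin` reverts to the newest kernel once a fixed one is available.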


 
I have seen the reported issues, but for now the problem seems to have no fix: we must either wait for an updated kernel or downgrade the current one.
I don't know if the next update will fix it.
T.T
 
Oh, by the way, this problem only affects KVM. LXC containers using phys passthrough are not affected and work normally.
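For reference, a minimal sketch of what a phys passthrough entry can look like in a Proxmox container config (`/etc/pve/lxc/<vmid>.conf`). The raw `lxc.net.*` keys move a host interface into the container's network namespace. The interface names `enp1s0f0v0` and `eth1` are hypothetical, and index 1 assumes only `net0` (mapped to `lxc.net.0`) is already defined.

```
# Hypothetical VF interface on the host, moved into the container.
lxc.net.1.type: phys
lxc.net.1.link: enp1s0f0v0
lxc.net.1.name: eth1
lxc.net.1.flags: up
```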
 
It seems so: downgrade the kernel or wait for an update.
I spent half a day trying to solve this problem. It was very strange.
Physical passthrough in LXC worked, but in KVM it did not. In the meantime I changed the hardware configuration, because I thought my BIOS was the cause, and I spent half a day moving the server around to verify it. This is really a sad story.
T.T...