Today I updated my homelab cluster to PVE 9, and all seems to work fine, with one major exception. I have a VM which runs my NAS and has PCIe passthrough configured for the SATA controller and the Intel iGPU (via https://github.com/strongtz/i915-sriov-dkms).
Because of i915-sriov-dkms I had my kernel pinned to 6.8.12-13-pve, which still worked fine after the update. Once I realized this, I unpinned the kernel and rebooted with the "real" PVE 9 kernel 6.14.8-2-pve.
Booted into 6.14, my NAS VM no longer starts as long as either of the PCIe devices (SATA controller or iGPU) is attached. The start command seems to work, but then the VM just hangs: the console doesn't work (timeout), trying to run any monitor command also times out, and the VM never reaches a state where it is reachable over the network. If I remove all PCIe passthrough from the VM, it boots again.
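(For context, the pinning/unpinning is done with proxmox-boot-tool, roughly like this, using the kernel versions from the pveversion output below, followed by a reboot of the node.)
Code:
# pin the 6.8 kernel (my current workaround)
proxmox-boot-tool kernel pin 6.8.12-13-pve
# remove the pin again to boot the newest installed kernel (6.14)
proxmox-boot-tool kernel unpin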
I don't see any obvious errors in the logs of the node, other than the timeouts like this when I try to do anything with the "starting" VM:
Code:
VM 200 qmp command 'human-monitor-command' failed - unable to connect to VM 200 qmp socket - timeout after 242 retries
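(If it helps, while the VM hangs in that state I can run checks on the node along these lines, with VMID 200 as above:)
Code:
# state of the VM as Proxmox sees it
qm status 200
# is there a kvm process for the VM at all, and is it stuck in D state?
ps aux | grep "[k]vm -id 200"
# watch kernel messages (vfio / DMAR / IOMMU errors) live while the VM hangs
dmesg -w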
I then re-pinned the kernel to 6.8.12 and the VM worked again after a reboot of the node, so that's a workaround for now. But I would still like to find out what the problem is when running on 6.14.
At first I suspected the i915-sriov driver, but it also doesn't work with just the SATA controller passed through. And according to dmesg / the journal, the i915-sriov driver loads just fine on 6.14 as well.
Attached are the journalctl logs for both kernels, maybe someone sees something in there.
The hardware is an Odroid H4 Ultra with 64 GB RAM and a dual-NVMe adapter with Samsung 980 1 TB SSDs.
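(In case the details matter: driver binding and the SR-IOV VFs on 6.14 can be checked like this; the PCI addresses are taken from the device mapping further down.)
Code:
# which driver is bound to the iGPU and its first virtual function
lspci -nnk -s 00:02.0
lspci -nnk -s 00:02.1
# number of VFs created by i915-sriov-dkms
cat /sys/bus/pci/devices/0000:00:02.0/sriov_numvfs
# i915 messages in the kernel log
dmesg | grep -i i915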
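(Collected per boot roughly like this; the file names are just examples.)
Code:
# current boot (6.14)
journalctl -b 0 > journal-6.14.log
# previous boot (6.8)
journalctl -b -1 > journal-6.8.log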
Here is the pveversion -v output of the node:
Code:
proxmox-ve: 9.0.0 (running kernel: 6.8.12-13-pve)
pve-manager: 9.0.3 (running version: 9.0.3/025864202ebb6109)
proxmox-kernel-helper: 9.0.3
proxmox-kernel-6.14.8-2-pve-signed: 6.14.8-2
proxmox-kernel-6.14: 6.14.8-2
proxmox-kernel-6.8.12-13-pve-signed: 6.8.12-13
proxmox-kernel-6.8: 6.8.12-13
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
ceph-fuse: 19.2.3-pve1
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.3.1-1+pve4
ifupdown2: 3.3.0-1+pmx9
intel-microcode: 3.20250512.1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.3
libpve-apiclient-perl: 3.4.0
libpve-cluster-api-perl: 9.0.6
libpve-cluster-perl: 9.0.6
libpve-common-perl: 9.0.9
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.4
libpve-network-perl: 1.1.6
libpve-rs-perl: 0.10.7
libpve-storage-perl: 9.0.13
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2
lxc-pve: 6.0.4-2
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.0.11-1
proxmox-backup-file-restore: 4.0.11-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.1.1
proxmox-kernel-helper: 9.0.3
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.0
proxmox-widget-toolkit: 5.0.5
pve-cluster: 9.0.6
pve-container: 6.0.9
pve-docs: 9.0.8
pve-edk2-firmware: 4.2025.02-4
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.3
pve-firmware: 3.16-3
pve-ha-manager: 5.0.4
pve-i18n: 3.5.2
pve-qemu-kvm: 10.0.2-4
pve-xtermjs: 5.5.0-2
qemu-server: 9.0.16
smartmontools: 7.4-pve1
spiceterm: 3.4.0
swtpm: 0.8.0+pve2
vncterm: 1.9.0
zfsutils-linux: 2.3.3-pve1
Also the config of the affected VM:
Code:
agent: 1
balloon: 16384
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 6
cpu: host
efidisk0: vmdisks:vm-200-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: mapping=sata-controller,pcie=1
hostpci1: mapping=intel-igpu-1,pcie=1
ide2: none,media=cdrom
machine: q35
memory: 24576
meta: creation-qemu=9.2.0,ctime=1752392553
name: nas
net0: virtio=BC:24:11:XX:XX:XX,bridge=lan
net1: virtio=BC:24:11:XX:XX:XX,bridge=vmbr0,tag=14
numa: 0
onboot: 1
ostype: l26
protection: 1
scsi0: vmdisks:vm-200-disk-1,discard=on,iothread=1,size=32G,ssd=1
scsi1: vmdisks:vm-200-disk-2,discard=on,iothread=1,size=64G,ssd=1
scsihw: virtio-scsi-single
serial0: socket
smbios1: uuid=07df3dbf-d245-4c58-8a24-3a9d087acb51
sockets: 1
startup: order=1
usb0: mapping=tvcard
usb1: mapping=audiocard
vga: std
vmgenid: c9e5fb0f-369c-4ed1-9a0d-35cce3ed800a
And corresponding device mappings:
Code:
pvesh get /cluster/mapping/pci
┌─────────────┬─────────────────┬──────────────────────────────────────────────────────────────────────────────────────┬────────┐
│ description │ id │ map │ checks │
╞═════════════╪═════════════════╪══════════════════════════════════════════════════════════════════════════════════════╪════════╡
│ │ intel-igpu-6 │ ["id=8086:46d0,iommugroup=23,node=pve01-c,path=0000:00:02.6,subsystem-id=8086:2212"] │ │
├─────────────┼─────────────────┼──────────────────────────────────────────────────────────────────────────────────────┼────────┤
│ │ intel-igpu-4 │ ["id=8086:46d0,iommugroup=21,node=pve01-c,path=0000:00:02.4,subsystem-id=8086:2212"] │ │
├─────────────┼─────────────────┼──────────────────────────────────────────────────────────────────────────────────────┼────────┤
│ │ intel-igpu-2 │ ["id=8086:46d0,iommugroup=19,node=pve01-c,path=0000:00:02.2,subsystem-id=8086:2212"] │ │
├─────────────┼─────────────────┼──────────────────────────────────────────────────────────────────────────────────────┼────────┤
│ │ intel-igpu-5 │ ["id=8086:46d0,iommugroup=22,node=pve01-c,path=0000:00:02.5,subsystem-id=8086:2212"] │ │
├─────────────┼─────────────────┼──────────────────────────────────────────────────────────────────────────────────────┼────────┤
│ │ intel-igpu-7 │ ["id=8086:46d0,iommugroup=24,node=pve01-c,path=0000:00:02.7,subsystem-id=8086:2212"] │ │
├─────────────┼─────────────────┼──────────────────────────────────────────────────────────────────────────────────────┼────────┤
│ │ intel-igpu-1 │ ["id=8086:46d0,iommugroup=18,node=pve01-c,path=0000:00:02.1,subsystem-id=8086:2212"] │ │
├─────────────┼─────────────────┼──────────────────────────────────────────────────────────────────────────────────────┼────────┤
│ │ sata-controller │ ["id=1b21:1064,iommugroup=15,node=pve01-c,path=0000:03:00.0,subsystem-id=1b21:2116"] │ │
├─────────────┼─────────────────┼──────────────────────────────────────────────────────────────────────────────────────┼────────┤
│ │ intel-igpu-3 │ ["id=8086:46d0,iommugroup=20,node=pve01-c,path=0000:00:02.3,subsystem-id=8086:2212"] │ │
└─────────────┴─────────────────┴──────────────────────────────────────────────────────────────────────────────────────┴────────┘
If anyone has an idea what might be the problem, I'd appreciate any hints. Also, if you need additional logs or info, or know of ways to get more debug output from the VM start, let me know.
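One thing I might still try myself: dump the generated QEMU command line and run it in the foreground to see where it hangs, roughly like this (VMID 200 as above):
Code:
# print the full kvm command line Proxmox would use for the VM
qm showcmd 200 --pretty
# the printed command can then be run manually in a shell to watch its output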