Hi,
After some troubleshooting, it seems that kernel 6.17.9-1-pve is causing problems with PCIe passthrough. I had installed a riser for my GPU to free up slot space for a USB card; both of these, plus an NVMe drive, are assigned to a VM. After I booted the server back up, the VM gets the yellow triangle when it starts, and the status page shows "internal error". This is the error from syslog:
Code:
Feb 19 12:18:30 epyc QEMU[53865]: KVM internal error. Suberror: 1
Feb 19 12:18:30 epyc QEMU[53865]: extra data[0]: 0x0000000000000000
Feb 19 12:18:30 epyc QEMU[53865]: extra data[1]: 0x0000000000000400
Feb 19 12:18:30 epyc QEMU[53865]: extra data[2]: 0x0000000100000014
Feb 19 12:18:30 epyc QEMU[53865]: extra data[3]: 0x0000000000030000
Feb 19 12:18:30 epyc QEMU[53865]: extra data[4]: 0x0000000000000000
Feb 19 12:18:30 epyc QEMU[53865]: extra data[5]: 0x0000000000000000
Feb 19 12:18:30 epyc QEMU[53865]: emulation failure
Feb 19 12:18:30 epyc QEMU[53865]: RAX=0000000000000000 RBX=000000007c2386e0 RCX=0000000000000000 RDX=000000007ee286b0
Feb 19 12:18:30 epyc QEMU[53865]: RSI=0000000000000000 RDI=0000000000000000 RBP=0000000000000000 RSP=000000007ee28668
Feb 19 12:18:30 epyc QEMU[53865]: R8 =000000007ee286b4 R9 =000000007ee286b8 R10=0000000000000000 R11=000000007ee28480
Feb 19 12:18:30 epyc QEMU[53865]: R12=0000000000000000 R13=000000007c23a800 R14=0000000000000002 R15=000000007c242812
Feb 19 12:18:30 epyc QEMU[53865]: RIP=0000000000030000 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
Feb 19 12:18:30 epyc QEMU[53865]: ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
Feb 19 12:18:30 epyc QEMU[53865]: CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
Feb 19 12:18:30 epyc QEMU[53865]: SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
Feb 19 12:18:30 epyc QEMU[53865]: DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
Feb 19 12:18:30 epyc QEMU[53865]: FS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
Feb 19 12:18:30 epyc QEMU[53865]: GS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
Feb 19 12:18:30 epyc QEMU[53865]: LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
Feb 19 12:18:30 epyc QEMU[53865]: TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
Feb 19 12:18:30 epyc QEMU[53865]: GDT= 000000007e9e0000 00000047
Feb 19 12:18:30 epyc QEMU[53865]: IDT= 000000007e305018 00000fff
Feb 19 12:18:30 epyc QEMU[53865]: CR0=80010033 CR2=0000000000000000 CR3=000000007ec01000 CR4=00000668
Feb 19 12:18:30 epyc QEMU[53865]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
Feb 19 12:18:30 epyc QEMU[53865]: DR6=00000000ffff0ff0 DR7=0000000000000400
Feb 19 12:18:30 epyc QEMU[53865]: EFER=0000000000000d00
Feb 19 12:18:30 epyc QEMU[53865]: Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <ff> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
I thought it was the riser, the cables, or the new USB card, but removing all of the risers and testing with and without the USB card made no difference. Changing the machine type to i440fx fixed it, and downgrading the kernel to 6.17.4-2-pve fixed it for q35 too. I guess I forgot that I had a pending kernel update when I started work on the server.
Code:
root@epyc:~# pveversion -v
proxmox-ve: 9.1.0 (running kernel: 6.17.4-2-pve)
pve-manager: 9.1.5 (running version: 9.1.5/80cf92a64bef6889)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.17.9-1-pve-signed: 6.17.9-1
proxmox-kernel-6.17: 6.17.9-1
proxmox-kernel-6.17.4-2-pve-signed: 6.17.4-2
proxmox-kernel-6.17.4-1-pve-signed: 6.17.4-1
proxmox-kernel-6.14.11-5-pve-signed: 6.14.11-5
proxmox-kernel-6.14: 6.14.11-5
proxmox-kernel-6.14.11-4-pve-signed: 6.14.11-4
proxmox-kernel-6.8: 6.8.12-18
proxmox-kernel-6.8.12-18-pve-signed: 6.8.12-18
proxmox-kernel-6.8.12-17-pve-signed: 6.8.12-17
proxmox-kernel-6.8.12-15-pve-signed: 6.8.12-15
proxmox-kernel-6.8.12-14-pve-signed: 6.8.12-14
proxmox-kernel-6.8.12-13-pve-signed: 6.8.12-13
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
proxmox-kernel-6.8.12-8-pve-signed: 6.8.12-8
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
ceph-fuse: 19.2.3-pve1
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.4.1-1+pve1
ifupdown2: 3.3.0-1+pmx12
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: not correctly installed
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.2
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.5
libpve-apiclient-perl: 3.4.2
libpve-cluster-api-perl: 9.0.7
libpve-cluster-perl: 9.0.7
libpve-common-perl: 9.1.7
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.5
libpve-network-perl: 1.2.5
libpve-rs-perl: 0.11.4
libpve-storage-perl: 9.1.0
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-4
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.1.2-1
proxmox-backup-file-restore: 4.1.2-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.1
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.3
proxmox-widget-toolkit: 5.1.5
pve-cluster: 9.0.7
pve-container: 6.1.1
pve-docs: 9.1.2
pve-edk2-firmware: 4.2025.05-2
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.4
pve-firmware: 3.17-2
pve-ha-manager: 5.1.0
pve-i18n: 3.6.6
pve-qemu-kvm: 10.1.2-6
pve-xtermjs: 5.5.0-3
qemu-server: 9.1.4
smartmontools: 7.4-pve1
spiceterm: 3.4.1
swtpm: 0.8.0+pve3
vncterm: 1.9.1
zfsutils-linux: 2.3.4-pve1
This is on an ASRock Rome ROME2D16-2T with IOMMU fully enabled, etc., and I had no problems with PCIe passthrough until now. I was also passing through an Intel P4600 NVMe to the VM, but it made no difference whether it was attached or not. It's not a big issue for me, since either downgrading the kernel or just using i440fx works fine, but it's a bug nonetheless.
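For anyone hitting the same thing, here is a rough sketch of checking whether the host is running the affected kernel and printing a suggested pin command. The two version strings come from this report. The `check_kernel` helper is just an illustrative name, and I'm assuming the host manages its boot entries with `proxmox-boot-tool`, so treat the printed command as a suggestion rather than a verified fix.

```shell
#!/bin/sh
# Versions taken from this report; adjust them for your own system.
BAD_KERNEL="6.17.9-1-pve"
GOOD_KERNEL="6.17.4-2-pve"

# check_kernel is a hypothetical helper: it only prints a hint for the
# kernel version string passed as $1 and changes nothing on the host.
check_kernel() {
    if [ "$1" = "$BAD_KERNEL" ]; then
        # Assumption: boot entries are managed by proxmox-boot-tool.
        echo "affected: consider 'proxmox-boot-tool kernel pin $GOOD_KERNEL', then reboot"
    else
        echo "ok: $1 is not the kernel reported as affected here"
    fi
}

# Check the kernel the host is currently running.
check_kernel "$(uname -r)"
```

The script only prints a hint, so the pin itself remains a deliberate manual step. The pin can be dropped again later with `proxmox-boot-tool kernel unpin` once a fixed kernel is available.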