VM Internal Error

Tahsin

I have a Windows 11 VM that, after a while, goes into an internal-error state in Proxmox. The whole VM freezes and the only way to recover is to stop the VM and start it again. I tried a fresh install and still have the same issue. The only thing that changed was adding an Intel Arc Pro B50 as SR-IOV passthrough; previously it was an Nvidia card. My other VMs using the B50 do not have this issue.

In this VM I am running the Blue Iris security camera application, which uses both Quick Sync and DirectML. Neither shows any errors in the Windows or Proxmox logs. The VMs are running the latest VirtIO drivers.

Configs below:

acpi: 1
agent: 1,fstrim_cloned_disks=1
bios: ovmf
boot: order=scsi0;ide0
cores: 6
cpu: host
efidisk0: int_nvme:base-100-disk-0/vm-114-disk-0,efitype=4m,ms-cert=2023,pre-enrolled-keys=1,size=1M
hostpci0: mapping=B50,pcie=1
ide0: none,media=cdrom
kvm: 1
machine: pc-q35-10.1,viommu=intel
memory: 12288
meta: creation-qemu=10.1.2,ctime=1769300300
name: BlueIris
net0: virtio=02:0C:DE:B9:80:91,bridge=vmbr1,firewall=1,queues=2
net1: virtio=02:0C:DE:71:59:e0,bridge=vmbr2,firewall=1,queues=2
numa: 0
onboot: 1
ostype: win11
scsi0: int_nvme:base-100-disk-1/vm-114-disk-1,backup=0,discard=on,iothread=1,size=128G,ssd=1
scsi1: /dev/disk/by-id/ata-WDC_WD40PURZ-85TTDY0_WD-WCC7K3DCRCVK,backup=0,iothread=1,replicate=0,size=3907018584K
scsihw: virtio-scsi-single
smbios1: uuid=676b0c87-cebc-46a5-9597-1ba533409816
sockets: 1
tags: no_transfer
tpmstate0: int_nvme:vm-114-disk-2,size=4M,version=v2.0
vga: std
vmgenid: 9886c036-7394-4cd0-abd9-f270499f1c7
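
For reference, the hostpci0 line uses a datacenter resource mapping named B50 that points at one of the card's SR-IOV virtual functions. Checking on the host that the VF handed to this VM is actually bound to vfio-pci looks roughly like this (the exact PCI addresses depend on the slot, so not reproduced here):

Code:
# list the Arc card and its virtual functions plus the driver each is bound to
lspci -nnk | grep -i -A3 'vga\|display'
# the VF used by this VM should report: Kernel driver in use: vfio-pci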

root=ZFS=rpool/ROOT/pve-1 boot=zfs intel_iommu=on iommu=pt nmi_watchdog=0
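
Since root is on ZFS here, that line lives in /etc/kernel/cmdline and is applied with proxmox-boot-tool (assuming a proxmox-boot-tool managed boot rather than plain GRUB):

Code:
# after editing /etc/kernel/cmdline, re-sync the boot entries
proxmox-boot-tool refresh
# after a reboot, confirm the parameters are active
cat /proc/cmdline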

vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
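
Those are the entries from /etc/modules. After changing them the initramfs needs a rebuild so the modules load early; as far as I understand, on recent kernels (6.2 and later, so also the 6.17 kernel here) vfio_virqfd has been merged into the core vfio module, so a "module not found" message for it should be harmless:

Code:
# rebuild the initramfs after editing /etc/modules
update-initramfs -u -k all
# check that the vfio modules actually loaded
lsmod | grep vfio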

proxmox-ve: 9.1.0 (running kernel: 6.17.4-2-pve)
pve-manager: 9.1.4 (running version: 9.1.4/5ac30304265fbd8e)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.17.4-2-pve-signed: 6.17.4-2
proxmox-kernel-6.17: 6.17.4-2
proxmox-kernel-6.17.2-2-pve-signed: 6.17.2-2
proxmox-kernel-6.14.11-4-pve-signed: 6.14.11-4
proxmox-kernel-6.14: 6.14.11-4
proxmox-kernel-6.11.11-2-pve-signed: 6.11.11-2
proxmox-kernel-6.11: 6.11.11-2
proxmox-kernel-6.8.12-13-pve-signed: 6.8.12-13
proxmox-kernel-6.8: 6.8.12-13
ceph-fuse: 19.2.3-pve1
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.3.1-1+pve4
ifupdown2: 3.3.0-1+pmx11
intel-microcode: 3.20251111.1~deb13u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.5
libpve-apiclient-perl: 3.4.2
libpve-cluster-api-perl: 9.0.7
libpve-cluster-perl: 9.0.7
libpve-common-perl: 9.1.4
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.5
libpve-network-perl: 1.2.4
libpve-rs-perl: 0.11.4
libpve-storage-perl: 9.1.0
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-3
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.1.1-1
proxmox-backup-file-restore: 4.1.1-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.1
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.3
proxmox-widget-toolkit: 5.1.5
pve-cluster: 9.0.7
pve-container: 6.0.18
pve-docs: 9.1.2
pve-edk2-firmware: 4.2025.05-2
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.4
pve-firmware: 3.17-2
pve-ha-manager: 5.1.0
pve-i18n: 3.6.6
pve-qemu-kvm: 10.1.2-5
pve-xtermjs: 5.5.0-3
qemu-server: 9.1.3
smartmontools: 7.4-pve1
spiceterm: 3.4.1
swtpm: 0.8.0+pve3
vncterm: 1.9.1
zfsutils-linux: 2.3.4-pve1

If anything else is needed, let me know.
 

This is what it shows after the VM becomes unresponsive.

Code:
Feb 01 12:31:22 pve QEMU[1683917]: error: kvm run failed Bad address
Feb 01 12:31:22 pve QEMU[1683917]: RAX=ffff82f14c229500 RBX=ffffb3867e10c000 RCX=ffffb3867e10c000 RDX=0000000000000000
Feb 01 12:31:22 pve QEMU[1683917]: RSI=0000000000000000 RDI=0000000000000001 RBP=ffffb3867e10c000 RSP=ffffd781396bbc50
Feb 01 12:31:22 pve QEMU[1683917]: R8 =0000000000000000 R9 =ffffd781396bbd30 R10=fffff806c8c0f800 R11=0000000000000001
Feb 01 12:31:22 pve QEMU[1683917]: R12=0000000000000001 R13=00000f7b426d4c62 R14=0000000000000000 R15=fffff8065ebf0000
Feb 01 12:31:22 pve QEMU[1683917]: RIP=fffff8065eecc05f RFL=00010286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
Feb 01 12:31:22 pve QEMU[1683917]: ES =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
Feb 01 12:31:22 pve QEMU[1683917]: CS =0010 0000000000000000 00000000 00209b00 DPL=0 CS64 [-RA]
Feb 01 12:31:22 pve QEMU[1683917]: SS =0018 0000000000000000 00000000 00409300 DPL=0 DS   [-WA]
Feb 01 12:31:22 pve QEMU[1683917]: DS =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
Feb 01 12:31:22 pve QEMU[1683917]: FS =0053 0000000000000000 0000bc00 0040f300 DPL=3 DS   [-WA]
Feb 01 12:31:22 pve QEMU[1683917]: GS =002b ffffd7813969f000 ffffffff 00c0f300 DPL=3 DS   [-WA]
Feb 01 12:31:22 pve QEMU[1683917]: LDT=0000 0000000000000000 00000000 00000000
Feb 01 12:31:22 pve QEMU[1683917]: TR =0040 ffffd781396af000 00000067 00008b00 DPL=0 TSS64-busy
Feb 01 12:31:22 pve QEMU[1683917]: GDT=     ffffd781396b0fb0 00000057
Feb 01 12:31:22 pve QEMU[1683917]: IDT=     ffffd781396ae000 00000fff
Feb 01 12:31:22 pve QEMU[1683917]: CR0=80050033 CR2=ffffe786d040f000 CR3=00000000001ae002 CR4=00370e78
Feb 01 12:31:22 pve QEMU[1683917]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
Feb 01 12:31:22 pve QEMU[1683917]: DR6=00000000ffff0ff0 DR7=0000000000000400
Feb 01 12:31:22 pve QEMU[1683917]: EFER=0000000000000d01
Feb 01 12:31:22 pve QEMU[1683917]: Code=0f b6 d3 49 8b 84 00 78 11 00 00 48 03 c2 0f 84 9f 02 00 00 <80> 38 ff 0f 85 96 02 00 00 c6 00 00 48 8b 85 58 42 00 00 48 c1 e2 04 c7 44 24 30 00 00 00
 
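
Next time it freezes I plan to grab the host-side kernel messages from the same timestamp, in case the Xe/i915 driver or VFIO logs anything that does not make it into the VM's task log; roughly:

Code:
# kernel messages around GPU / VFIO / DMAR errors
dmesg -T | grep -i -E 'xe|i915|vfio|dmar|gpu hang'
# full journal for the window around the freeze
journalctl --since "1 hour ago" | grep -i -E 'qemu|vfio|xe'
# confirm the VF is still present on the PCI bus afterwards
lspci -nn | grep -i -E 'vga|display'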
This VM runs Blue Iris, which keeps the GPU under heavy load around the clock (video decoding, encoding, and AI processing). The other VMs using the same Intel GPU are fine because they do not use it as heavily or as continuously.

What appears to be happening is:
  • The Intel GPU virtual function sometimes stops responding under heavy, constant load
  • When that happens, QEMU hits a bad memory access ("kvm run failed Bad address"), the VM freezes, and Proxmox reports it as internal-error
  • No clear error shows up in the host logs because it is the GPU virtual function itself that hangs
This did not happen with NVIDIA, likely because its virtualization support is more mature.

Possible workarounds:
  • Assign the entire GPU to this VM instead of using SR-IOV (rough qm set sketch at the end of this post)
  • Disable AI (DirectML) in Blue Iris and use only Intel QuickSync
  • Reduce GPU usage in Blue Iris
This looks like a current limitation or stability issue with Intel Arc SR-IOV, especially for 24/7 camera and AI workloads, rather than a Proxmox configuration problem.
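
If you try the first workaround, switching from the SR-IOV mapping to the whole card is roughly this (114 is the VMID from your config; 0000:03:00.0 is only a placeholder for the B50's real PCI address, and your other VMs would of course lose their VFs while the full card is assigned):

Code:
qm stop 114
qm set 114 -hostpci0 0000:03:00.0,pcie=1
qm start 114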