Host crash every day on i915 [drm] GPU HANG: ecode 9:1:85dffffa (gpu passthrough)

migo

Oct 31, 2021
Hi,

I followed a few forum posts and wiki entries to pass the Intel integrated graphics card from my host through to a Windows 10 guest, to be used as a Quick Sync encoder. Since then, the host crashes almost every day, so badly that it is not even reachable over the network.

Most of the time, the last entries in the log before the crash are:
Code:
Oct 29 03:15:34 pve kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
Oct 29 03:15:34 pve kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Oct 29 03:15:34 pve kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:85dffffa

Grub:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt i915.enable_gvt=1 i915.alpha_support=1 drm.debug=0 i915.enable_guc=0 video=efifb:off,vesafb:off"

modules:
Code:
root@pve:~# cat /etc/modules

vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
kvmgt
exngt
vfio-mdev

Code:
root@pve:~# cat /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1

Host version:
Code:
root@pve:~# uname -a
Linux pve 5.11.22-4-pve #1 SMP PVE 5.11.22-8 (Fri, 27 Aug 2021 11:51:34 +0200) x86_64 GNU/Linux

The Windows 10 guest has the latest Intel drivers, and HW acceleration works in the guest. Has anyone had similar issues and resolved them?
 
hi,

so you are passing through your intel card to your windows guest?

but why is the i915 not blacklisted on your host?

if you want to passthrough the GPU to your VM, then ideally your host shouldn't be messing with it.

you can try adding it to the /etc/modprobe.d/pve-blacklist.conf file and then rebooting (note that your host won't load the graphics driver after reboot)
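
a minimal sketch of how that could be done, assuming the default blacklist file path mentioned above. the helper name `blacklist_module` is made up for illustration, and it only appends the entry if it is not already there:

```shell
#!/bin/sh
# Hypothetical helper: add "blacklist <module>" to a modprobe config file,
# but only if that exact line is not already present.
# $1 = module name, $2 = config file (defaults to the Proxmox blacklist file).
blacklist_module() {
    module="$1"
    conf="${2:-/etc/modprobe.d/pve-blacklist.conf}"
    # -x matches whole lines only, -F treats the pattern as a fixed string
    grep -qxF "blacklist $module" "$conf" 2>/dev/null \
        || printf 'blacklist %s\n' "$module" >> "$conf"
}
```

running `blacklist_module i915` as root, followed by `update-initramfs -u -k all` and a reboot, should then keep the driver from loading at boot, unless another module pulls it in as a dependency.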
 
It is blacklisted, but it's still loaded as a module by kvmgt. I don't know how to pass the i915 device through for exclusive use by the guest instead of using the kvmgt platform.

Code:
root@pve:~# lsmod | grep i915
i915                 2306048  2 kvmgt
drm_kms_helper        245760  1 i915
cec                    53248  2 drm_kms_helper,i915
i2c_algo_bit           16384  1 i915
drm                   548864  4 drm_kms_helper,kvmgt,i915
video                  53248  2 asus_wmi,i915
 
oh i see...

can you also show your VM config and /etc/modprobe.d/vfio.conf files?
 
OK, so your suggestion @oguz about exposing the Intel graphics directly to the guest was correct. I just removed the kvmgt and exngt modules and removed the new Intel drivers from the Windows guest, and it works: the i915 module is no longer loaded on the host. I hope that will solve the crashes (or at least will not crash the host box anymore).
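
For anyone checking the same thing, here is a small sketch of how to test whether a module still shows up in `lsmod` output. `module_loaded` is a hypothetical helper name, not something Proxmox ships. It only looks at the first column, so a module that is merely listed as a user of another module does not count:

```shell
#!/bin/sh
# Hypothetical helper: read lsmod-style output on stdin and succeed (exit 0)
# only if the given module name appears in the first column.
# $1 = module name
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}
```

Usage would be along the lines of `lsmod | module_loaded i915 && echo "i915 is still loaded"`.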
 
do let us know if it solves the issue.
the reason i asked for the VM config and vfio config is that there are some custom args which can help as well (it could work even with i915 still loaded on the host). if you get a crash again, please post those
 
@oguz this is my current VM config:

Code:
root@pve:~# cat /etc/pve/nodes/pve/qemu-server/101.conf
agent: 1,fstrim_cloned_disks=1
audio0: device=ich9-intel-hda,driver=none
bios: ovmf
boot: order=scsi0;sata3
cores: 6
cpu: host
efidisk0: local-lvm:vm-101-disk-1,size=4M
hostpci0: 0000:00:02,pcie=1,x-vga=1
machine: pc-q35-6.0
memory: 16384
name: windows-blueiris
net0: virtio=5A:58:BB:76:20:C8,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: win10
sata3: none,media=cdrom
scsi0: local-lvm:vm-101-disk-0,cache=writeback,discard=on,size=170G
scsi1: local-storage-lvm:vm-101-disk-0,cache=writeback,discard=on,size=3700G
scsihw: virtio-scsi-pci
smbios1: uuid=efb1d1fe-f507-4a9f-aefd-9645c2d9cfba
sockets: 1
vmgenid: 7d57320d-b414-409d-b425-3681fa104a55

and vfio related settings:

Code:
root@pve:~# lspci -nn | grep 530
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 530 [8086:1912] (rev 06)
root@pve:~# cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:2504,10de:228e,1ac1:089a,8086:1912 disable_vga=1

should I tune it more with some additional parameters?
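
Before tuning anything, it may be worth confirming which driver the host has actually bound to the device. A rough sketch, assuming the standard sysfs layout where each PCI device directory has a `driver` symlink; `pci_driver` is a made-up helper name, and the second argument only exists so the sysfs root can be swapped out:

```shell
#!/bin/sh
# Hypothetical helper: print the name of the kernel driver bound to a PCI
# device by resolving its "driver" symlink in sysfs.
# $1 = full PCI address (e.g. 0000:00:02.0), $2 = sysfs root (default /sys)
# Returns non-zero if no driver is bound.
pci_driver() {
    dev="$1"
    root="${2:-/sys}"
    link="$root/bus/pci/devices/$dev/driver"
    [ -L "$link" ] || return 1
    basename "$(readlink "$link")"
}
```

With the setup above, `pci_driver 0000:00:02.0` would be expected to print `vfio-pci` if the `ids=` option took effect. `lspci -nnk -s 00:02.0` reports the same information on its "Kernel driver in use" line.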
 
