"Irq 17: nobody cared" problem with PCI passthrough

jic5760

Active Member
Nov 10, 2020
Version:
Code:
proxmox-ve: 6.3-1 (running kernel: 5.4.73-1-pve)
pve-manager: 6.3-2 (running version: 6.3-2/22f57405)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-libc-dev: 5.4.106-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-6
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
openvswitch-switch: 2.10.7+ds1-0+deb10u1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-1
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-1
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1

Before starting the VM:
Code:
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981 (prog-if 02 [NVM Express])
        Interrupt: pin A routed to IRQ 17
        NUMA node: 0
        Kernel driver in use: nvme

07:00.0 VGA compatible controller: NVIDIA Corporation GF119 [GeForce GT 520] (rev a1) (prog-if 00 [VGA controller])
        Interrupt: pin A routed to IRQ 11
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

07:00.1 Audio device: NVIDIA Corporation GF119 HDMI Audio Controller (rev a1)
        Interrupt: pin B routed to IRQ 10
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel



At this point the 02:00.0 device (IRQ 17) and the 07:00 functions (IRQ 11 and IRQ 10) do not share an IRQ.


However, when the VM starts, it dies and the host logs the following error:

Code:
[  918.137315] irq 17: nobody cared (try booting with the "irqpoll" option)
[  918.137339] CPU: 2 PID: 1979 Comm: kvm Tainted: P           O      5.4.73-1-pve #1
[  918.137340] Hardware name: Supermicro X10SLL+-F/X10SLL+-F, BIOS 3.3 03/28/2020
[  918.137340] Call Trace:
[  918.137342]  <IRQ>
[  918.137347]  dump_stack+0x6d/0x9a
[  918.137350]  __report_bad_irq+0x3c/0xb6
[  918.137352]  note_interrupt.cold.10+0xb/0x5d
[  918.137353]  handle_irq_event_percpu+0x6f/0x80
[  918.137354]  handle_irq_ev


At this point, the IRQs of the passthrough device have changed:
Code:
07:00.0 VGA compatible controller: NVIDIA Corporation GF119 [GeForce GT 520] (rev a1) (prog-if 00 [VGA controller])
        Interrupt: pin A routed to IRQ 16
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

07:00.1 Audio device: NVIDIA Corporation GF119 HDMI Audio Controller (rev a1)
        Interrupt: pin B routed to IRQ 17
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel

The IRQ of the 07:00.1 device has changed to 17, which collides with the existing 02:00.0 device.
Before the VM was started, each device had a unique IRQ.
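
To make this comparison less error-prone, here is a rough Python sketch that reads saved `lspci -v` style text and reports any IRQ routed to more than one device. The function names `parse_irqs` and `shared_irqs` are made up for illustration, and the regular expressions only assume the line layout shown in the listings above. The `AFTER` sample also assumes that the NVMe device at 02:00.0 stays on IRQ 17 after the VM starts, which matches the kernel message but was not captured in the second listing.

```python
import re
from collections import defaultdict

# Start of a device stanza, e.g. "07:00.1 Audio device: ..."
# (an optional PCI domain prefix such as "0000:" is also accepted).
DEVICE_RE = re.compile(
    r"^((?:[0-9a-fA-F]{4}:)?[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s"
)
# Interrupt line, e.g. "Interrupt: pin B routed to IRQ 17"
IRQ_RE = re.compile(r"Interrupt: pin \w routed to IRQ (\d+)")


def parse_irqs(lspci_text):
    """Return {device_address: irq} for every stanza that lists an IRQ."""
    irqs = {}
    current = None
    for line in lspci_text.splitlines():
        dev = DEVICE_RE.match(line)
        if dev:
            current = dev.group(1)
            continue
        irq = IRQ_RE.search(line)
        if irq and current is not None:
            irqs[current] = int(irq.group(1))
    return irqs


def shared_irqs(irqs):
    """Return {irq: [devices]} for IRQs used by more than one device."""
    by_irq = defaultdict(list)
    for dev, irq in irqs.items():
        by_irq[irq].append(dev)
    return {irq: sorted(devs) for irq, devs in by_irq.items() if len(devs) > 1}


# Trimmed copies of the listings from this post.
BEFORE = """\
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981
        Interrupt: pin A routed to IRQ 17
07:00.0 VGA compatible controller: NVIDIA Corporation GF119 [GeForce GT 520] (rev a1)
        Interrupt: pin A routed to IRQ 11
07:00.1 Audio device: NVIDIA Corporation GF119 HDMI Audio Controller (rev a1)
        Interrupt: pin B routed to IRQ 10
"""

# Assumption: 02:00.0 remains on IRQ 17 once the VM is running.
AFTER = """\
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981
        Interrupt: pin A routed to IRQ 17
07:00.0 VGA compatible controller: NVIDIA Corporation GF119 [GeForce GT 520] (rev a1)
        Interrupt: pin A routed to IRQ 16
07:00.1 Audio device: NVIDIA Corporation GF119 HDMI Audio Controller (rev a1)
        Interrupt: pin B routed to IRQ 17
"""

print(shared_irqs(parse_irqs(BEFORE)))  # {}
print(shared_irqs(parse_irqs(AFTER)))   # {17: ['02:00.0', '07:00.1']}
```

On a live host the same functions could be fed the output of `lspci -v` captured before and after starting the VM. This only detects the collision; it does not explain why the IRQ is rerouted.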
 