[SOLVED] kvm: vfio: Cannot reset device 0000:0d:00.0, depends on group 30 which is not owned.

Bookmaster

Member
May 25, 2022
Hello,
I am running Proxmox 7.2.3 on a slightly modified HP ProLiant ML350e Gen8 server. I have a couple of VMs, most of them Windows VMs. One of them is a Windows 10 machine with PCI GPU passthrough (NVIDIA Quadro M2000) that worked fine until this morning; now I am not able to boot Windows 10 anymore. I think the problem started a week ago when I created a cron job that starts that Windows 10 VM. The first morning I started to receive this message:

kvm: vfio: Cannot reset device 0000:0d:00.0, depends on group 30 which is not owned.

I ignored it. Then I received the message in the attached file, which I also ignored; I ignored those messages because the Windows 10 VM worked normally. After that message I decided to remove the VM start from cron. Every morning I started the Windows 10 VM manually, and it worked fine.
This morning I encountered a problem: the VM won't boot at all. It just goes into repair mode, and it is not possible to repair it. I am sure that the problem is with the GPU passthrough, but I don't know how to fix it.

I have:
nano /etc/default/grub

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset vide>
GRUB_CMDLINE_LINUX=""

Also:
nano /etc/modules

vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
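For reference, guides typically follow an edit of /etc/modules with an initramfs rebuild so the modules are loaded at boot (standard Debian/Proxmox commands, assumed rather than taken from this thread):

update-initramfs -u -k all    # rebuild the initramfs for all installed kernels
lsmod | grep vfio             # after a reboot, verify the vfio modules are loaded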

Also:
lspci -v

0d:00.0 VGA compatible controller: NVIDIA Corporation GM206GL [Quadro M2000] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Dell GM206GL [Quadro M2000]
Physical Slot: 2
Flags: bus master, fast devsel, latency 0, IRQ 110, NUMA node 0, IOMMU group 29
Memory at f7000000 (32-bit, non-prefetchable) [size=16M]
Memory at e0000000 (64-bit, prefetchable) [size=256M]
Memory at de000000 (64-bit, prefetchable) [size=32M]
I/O ports at 8000
Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] Secondary PCI Express
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau

0d:00.1 Audio device: NVIDIA Corporation GM206 High Definition Audio Controller (rev a1)
Subsystem: Dell GM206 High Definition Audio Controller
Physical Slot: 2
Flags: bus master, fast devsel, latency 0, IRQ 93, NUMA node 0, IOMMU group 30
Memory at f6ff0000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel


What else can I provide to get help, please?

Thanks in advance


BM
 

kvm: vfio: Cannot reset device 0000:0d:00.0, depends on group 30 which is not owned.
I'm getting a similar message for an on-board audio device, and it also works fine.
I ignored it. Then I received the message in the attached file, which I also ignored; I ignored those messages because the Windows 10 VM worked normally. After that message I decided to remove the VM start from cron. Every morning I started the Windows 10 VM manually, and it worked fine.
This morning I encountered a problem: the VM won't boot at all. It just goes into repair mode, and it is not possible to repair it. I am sure that the problem is with the GPU passthrough, but I don't know how to fix it.
What changed? Did you update your Proxmox or your VM? This is not something that just happens with PCI(e) passthrough, but Proxmox kernel updates and updates inside the VM can break a working passthrough.
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset vide>
Unfortunately, the line is not complete. Please use cat /proc/cmdline to show us the currently active kernel parameters.

I'm guessing here that you are also using something like video=efifb:off and that you updated to kernel version 5.15, where those work-arounds no longer work. And also that you are doing passthrough of the GPU that is used during boot of the Proxmox system. Replace nofb nomodeset video=... with this work-around.

Please don't use pcie_acs_override (because that makes the system lie about the IOMMU groups) and show us the IOMMU groups using this command: for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done.
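For readability, here is the same command written as a small script; it is functionally identical to the one-liner above:

for d in /sys/kernel/iommu_groups/*/devices/*; do
    n=${d#*/iommu_groups/*}    # strip the path up to and including iommu_groups/
    n=${n%%/*}                 # keep only the group number
    printf 'IOMMU group %s ' "$n"
    lspci -nns "${d##*/}"      # show the device with its vendor:device IDs
done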
What else can I provide to get help, please?
The VM configuration file via qm config VMID (I don't know the VMID number of your VM).
And information about what devices you are passing through and whether the GPU is used during POST/boot of the Proxmox system.
 
I'm getting a similar message for an on-board audio device, and it also works fine.
I didn't receive that kind of message before.
What changed? Did you update your Proxmox or your VM? This is not something that just happens with PCI(e) passthrough, but Proxmox kernel updates and updates inside the VM can break a working passthrough.
I didn't update Proxmox. My colleague who uses that VM ran the regular Windows update and then shut it down. I am not sure if that caused the problem.
Unfortunately, the line is not complete. Please use cat /proc/cmdline to show us the currently active kernel parameters.
Here it is:

BOOT_IMAGE=/boot/vmlinuz-5.15.30-2-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset video=vesafb:off,efifb:off

I'm guessing here that you are also using something like video=efifb:off and that you updated to kernel version 5.15, where those work-arounds no longer work. And also that you are doing passthrough of the GPU that is used during boot of the Proxmox system. Replace nofb nomodeset video=... with this work-around.
Yes, I am using a "tutorial" from reddit: GPU passthrough
Please don't use pcie_acs_override (because that makes the system lie about the IOMMU groups) and show us the IOMMU groups using this command: for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done.
I will try to edit it. I am not that familiar with Proxmox, but I will try.
The VM configuration file via qm config VMID (I don't know the VMID number of your VM).
And information about what devices you are passing through and whether the GPU is used during POST/boot of the Proxmox system.
The VMID of my VM is 950:

root@pve:~# qm config 950
agent: 1
bios: ovmf
boot: order=scsi0
cores: 4
cpu: host
description: Ma%C5%A1ina za grafiku.%0AZa sada se startuje samo po potrebi.%0AU perspektivi namestiti da se startuje automatski tokom radnog vremena.
efidisk0: vmdata1:vm-950-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:0d:00.0,pcie=1
ide0: usb:iso/virtio-win-0.1.225.iso,media=cdrom,size=519590K
machine: pc-q35-6.2
memory: 8192
meta: creation-qemu=6.2.0,ctime=1661333113
name: Win10-Grafika
net0: e1000=DA:CA:1E:27:9B:41,bridge=vmbr1,firewall=1
numa: 1
ostype: win10
scsi0: vmdata1:vm-950-disk-1,size=170G
scsihw: virtio-scsi-single
smbios1: uuid=4596bc1a-2f8d-42fe-99e8-77d2c285dedc
sockets: 2
vmgenid: 2eeeade0-4fe3-4300-a07c-713744b85eaa
root@pve:~#
 
BOOT_IMAGE=/boot/vmlinuz-5.15.30-2-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset video=vesafb:off,efifb:off
You are indeed using kernel parameters that no longer work since kernel version 5.15, and some are even invalid nowadays.
The pcie_acs_override will invalidate the IOMMU group information. Either remove it or don't bother looking at the IOMMU groups.
My advice is to replace nofb nomodeset video=vesafb:off,efifb:off with initcall_blacklist=sysfb_init (a sketch of the resulting line follows below). Your guide is out of date.
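For illustration, the resulting line in /etc/default/grub might then look like this (a sketch based on the parameters quoted above, with pcie_acs_override dropped as suggested):

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt initcall_blacklist=sysfb_init"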
root@pve:~# qm config 950
agent: 1
bios: ovmf
boot: order=scsi0
cores: 4
cpu: host
description: Ma%C5%A1ina za grafiku.%0AZa sada se startuje samo po potrebi.%0AU perspektivi namestiti da se startuje automatski tokom radnog vremena.
efidisk0: vmdata1:vm-950-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:0d:00.0,pcie=1
Looks like you are passing through only the VGA function of the GPU. Usually you pass through all functions of the device. This can be done by enabling All Functions in the Proxmox GUI for that PCIe device; a sketch of the resulting config line follows below.
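For illustration, the equivalent change in the VM configuration file is to omit the function number, which passes through all functions of the device (so the audio function at 0d:00.1 from your lspci output would be included). A sketch of the changed line, not a confirmed fix:

hostpci0: 0000:0d:00,pcie=1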
ide0: usb:iso/virtio-win-0.1.225.iso,media=cdrom,size=519590K
machine: pc-q35-6.2
memory: 8192
meta: creation-qemu=6.2.0,ctime=1661333113
name: Win10-Grafika
net0: e1000=DA:CA:1E:27:9B:41,bridge=vmbr1,firewall=1
numa: 1
ostype: win10
scsi0: vmdata1:vm-950-disk-1,size=170G
scsihw: virtio-scsi-single
smbios1: uuid=4596bc1a-2f8d-42fe-99e8-77d2c285dedc
sockets: 2
vmgenid: 2eeeade0-4fe3-4300-a07c-713744b85eaa
It's not clear to me what GPU you are passing through and whether this GPU is used during boot of the system. But for the work-around I advised above, it does not matter.

I'm almost sure that just using the new work-around (initcall_blacklist=sysfb_init) for kernel version 5.15 (and newer) will fix your passthrough issues.
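If you want to double-check after rebooting (my suggestion; not discussed further in this thread): with sysfb_init blacklisted, the kernel should no longer register a BOOTFB simple-framebuffer region over the GPU's memory, so this should print nothing:

grep -i bootfb /proc/iomem    # no output means no boot framebuffer was claimed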
 
Thank you very much for your support and time.
I think I have some ghosts in my Proxmox. My two other VMs, which are still started and shut down by cron, were shut down, and suddenly my Windows 10 VM booted. I made a backup, just in case, and then applied initcall_blacklist=sysfb_init as you suggested. It seems that everything works nicely now.
Thank you very much once again.
Just one question at the end: do I have to reboot Proxmox after I make a change like initcall_blacklist=sysfb_init, please?

Regards

BM
 
Just one question at the end: do I have to reboot Proxmox after I make a change like initcall_blacklist=sysfb_init, please?
Changes to the kernel parameters always need a Proxmox reboot to become active (and you need to apply them first with proxmox-boot-tool or update-grub). You can check the current kernel parameters with cat /proc/cmdline.
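For reference, the whole apply-and-verify sequence could look like this (a sketch; on systemd-boot installations use proxmox-boot-tool refresh instead of update-grub):

nano /etc/default/grub    # edit GRUB_CMDLINE_LINUX_DEFAULT
update-grub               # or: proxmox-boot-tool refresh
reboot
cat /proc/cmdline         # confirm the new parameters are active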
 
Changes to the kernel parameters always need a Proxmox reboot to become active (and you need to apply them first with proxmox-boot-tool or update-grub). You can check the current kernel parameters with cat /proc/cmdline.

Thanks.

I will mark this thread as solved.
 
