Can you share the output of
pveversion -v
and
qm config <ID>
with the IDs of the affected VMs? What host CPU do you have?
When a VM gets stuck again, can you run
strace -c -p $(cat /var/run/qemu-server/<ID>.pid)
with the ID of the stuck VM? Press Ctrl+C after a few seconds to get the output.
Did you also check the logs on the host?
Is this the only device you pass through, or are there others? Are the IOMMU groups isolated enough for your use case:
https://pve.proxmox.com/wiki/PCI_Passthrough#Verify_IOMMU_Isolation ?
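If it helps, here is a rough way to list the IOMMU groups on the host. It is only a sketch: the function name list_iommu_groups and its optional directory argument are made up for illustration, and it assumes the usual sysfs layout under /sys/kernel/iommu_groups.

```shell
# Print every device together with the IOMMU group it belongs to.
# The optional first argument (a hypothetical override, handy for testing)
# replaces the default sysfs directory.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        # Skip the unexpanded glob when no groups exist (IOMMU disabled).
        [ -e "$dev" ] || continue
        group="${dev#"$root"/}"   # strip the leading directory
        group="${group%%/*}"      # keep only the group number
        printf 'IOMMU group %s: %s\n' "$group" "${dev##*/}"
    done
}
```

Run list_iommu_groups without arguments on the host. Devices that are printed with the same group number cannot be separated from each other for PCI passthrough. If nothing is printed, the IOMMU is probably not enabled.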
Sharing output ----------
pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.104-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.4-1
pve-kernel-5.15.104-1-pve: 5.15.104-1
pve-kernel-5.15.102-1-pve: 5.15.102-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-4
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-1
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.1-1
proxmox-backup-file-restore: 2.4.1-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.5
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
qm config 103
agent: 1
balloon: 1024
boot: order=scsi0;ide2;net0
cores: 2
ide2: local:iso/debian-11.6.0-amd64-netinst.iso,media=cdrom,size=388M
memory: 2048
meta: creation-qemu=7.1.0,ctime=1676657118
name: dock-first
net0: virtio=46:5D:0D:F2:FB:FE,bridge=vmbr0,firewall=1,tag=50
numa: 0
onboot: 1
ostype: l26
parent: fifo_only_and_carlos_hudba
scsi0: local-zfs:vm-103-disk-0,iothread=1,size=10G
scsi1: local-zfs:vm-103-disk-1,backup=0,iothread=1,size=1G
scsihw: virtio-scsi-single
smbios1: uuid=b9742f71-7049-494c-9a05-1246536245b7
sockets: 1
usb0: host=0a5c:2101
usb1: host=0bda:8771
usb2: host=0d8c:0014
vmgenid: d4aeaf0b-6c4c-47db-a5e4-7c5443fad2a1
qm config 107
agent: 1
balloon: 1024
boot: order=scsi0;ide2;net0
cores: 2
description: scsi1%3A local-zfs%3Avm-107-disk-1,backup=0,iothread=1,size=1G%0Ascsi1%3A local-zfs%3Avm-107-disk-1,backup=0,iothread=1,size=1G%0Ascsi1%3A local-zfs%3Avm-107-disk-1,backup=0,iothread=1,size=1G
ide2: local:iso/debian-11.6.0-amd64-netinst.iso,media=cdrom,size=388M
memory: 2048
meta: creation-qemu=7.1.0,ctime=1676657118
name: audio-server
net0: virtio=72:C4:1D:3D:E4:16,bridge=vmbr0,firewall=1,tag=50
numa: 0
onboot: 1
ostype: l26
scsi0: local-zfs:vm-107-disk-0,iothread=1,size=10G
scsi1: local-zfs:vm-107-disk-1,backup=0,iothread=1,size=1G
scsihw: virtio-scsi-single
smbios1: uuid=50a2ccf2-7d3c-44ec-ae3e-0d4339b09423
sockets: 1
tablet: 0
usb0: host=0d8c:0014
usb1: host=0bda:8771
usb2: host=0a5c:2101
vmgenid: f57c0d43-93dc-4ace-8395-2800298e3fe8
qm config 108
agent: 1
balloon: 1024
boot: order=scsi0;ide2;net0
cores: 2
description: Bacha - je tu disk pro swap, kter%C3%BD se nez%C3%A1lohuje... a p%C5%99i restoru je pot%C5%99eba jej p%C5%99ipojit JE%C5%A0T%C4%9A P%C5%98EDT%C3%8DM, ne%C5%BE ma%C5%A1inu spust%C3%AD%C5%A1, jinak bys to musel zase ru%C4%8Dn%C4%9B nastavovat ten swap...
ide2: local:iso/debian-11.6.0-amd64-netinst.iso,media=cdrom,size=388M
memory: 4096
meta: creation-qemu=7.1.0,ctime=1676657118
name: dock-home
net0: virtio=1A:E9:F5:C9:0A:E8,bridge=vmbr0,firewall=1,tag=50
numa: 0
onboot: 1
ostype: l26
parent: po_snapcastu_a_verce
scsi0: local-zfs:vm-108-disk-0,iothread=1,size=16G
scsi1: local-zfs:vm-108-disk-1,backup=0,iothread=1,size=1G
scsihw: virtio-scsi-single
smbios1: uuid=119a3435-c2a6-47be-b0c8-bf8557b2b1f5
sockets: 1
vmgenid: b613dc23-ca3e-4da2-9a7c-39cc41a933a4
Host processor ----------
Intel Celeron Elkhart Lake J6412, 2GHz, max 2.6GHz, quad core, AES-NI
Host log -----------
I forgot to check the host, so here it is:
Around the time of the failure (I don't know the exact time :-/ ) I found hundreds of repeated messages per second like:
Apr 11 00:27:57 hyper-home kernel: [ 6212.812606] usb 1-6: usbfs: process 3718 (CPU 1/KVM) did not claim interface 1 before use
Apr 11 00:27:57 hyper-home kernel: [ 6212.981158] usb 1-6: usbfs: process 3717 (CPU 0/KVM) did not claim interface 1 before use
Apr 11 00:27:57 hyper-home QEMU[3615]: kvm: libusb_set_interface_alt_setting: -99 [OTHER]
I don't know whether it's the cause or the consequence, though. Those two processes (3717 and 3718) alternate randomly in the log. It repeated once after 1 minute and then twice after 5 minutes. Somewhere among those messages I found:
Apr 11 00:34:46 hyper-home kernel: [ 6621.202944] perf: interrupt took too long (3131 > 3130), lowering kernel.perf_event_max_sample_rate to 63750
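To get a feel for how many of these usbfs messages each KVM thread produced, the kernel log can be summarized per process. This is a sketch only: count_usbfs_claims is a made-up helper name, and it assumes the log lines look exactly like the ones quoted above.

```shell
# Count "did not claim interface" messages per reporting process.
# Reads kernel log lines on stdin and prints "<count> <pid> (<thread name>)".
count_usbfs_claims() {
    awk '/did not claim interface/ {
        # Extract e.g. "process 3718 (CPU 1/KVM)" from the line.
        if (match($0, /process [0-9]+ \([^)]*\)/)) {
            # Drop the leading "process " (8 characters).
            key = substr($0, RSTART + 8, RLENGTH - 8)
            count[key]++
        }
    }
    END { for (k in count) print count[k], k }' | sort -k2
}
```

For example: journalctl -k | count_usbfs_claims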
Rest ----------
I will send the info when it happens again.
I have two USB devices plugged in: a Bluetooth adapter and an external sound card. Both are passed to a single VM (107).
About IOMMU: I believe I used the wrong term, "passthrough", since all I did was "Add HW -> usb device -> use usb vendor ID". So I can see them with "lsusb" on both the host and in VM 107. Do I understand correctly that this is not a "real passthrough" of the device?
But for what it's worth, here is the "lspci" output for USB. The controller "shares" its functions with RAM, but as I said, I believe I did not set up a real passthrough?
00:14.0 USB controller: Intel Corporation Device 4b7d (rev 11)
00:14.2 RAM memory: Intel Corporation Device 4b7f (rev 11)
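One way to tell the two kinds of passthrough apart is to look at the keys in the VM config: usbN entries are USB devices that QEMU hands to the guest (which fits the libusb message in the host log), while hostpciN entries are whole PCI(e) devices, and only those depend on IOMMU groups. A small sketch, where classify_passthrough is a made-up helper name and the input is assumed to be "qm config" output:

```shell
# Label the passthrough-related entries in "qm config <ID>" output (stdin).
classify_passthrough() {
    awk -F': ' '
        /^usb[0-9]+: /     { print $1 ": USB device passthrough (" $2 ")" }
        /^hostpci[0-9]+: / { print $1 ": PCI(e) passthrough (" $2 ")" }
    '
}
```

For example: qm config 107 | classify_passthrough
With the configs posted above, this should print only usb lines, since none of the VMs has a hostpci entry.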