PROXMOX 8.0.4 Host-side problems with Win10 Pro guest stalls, also mlx4 VF compat


PVE 8.0.4 with 2x E5-2697v4.

I have 2 (or more?) issues to fix; the biggest problem is #2:

1) Is there an updated mlx4_core/en/ib for PVE 8.0.4/kernel 6.2.16-6-pve available somewhere?
- Current modinfo mlx4_core => 4.0-0
- Windows Error 43 on "VPI" adapter (ConnectX3 ethernet mode VF pcie-passthrough)
- Not sure how to apply patch to fix problems with windows guests
- I need to finish transitioning all VMs to ConnectX3 SR-IOV VFs, and there is a known issue with Windows guests caused by the host-side ConnectX driver implementation.
- At least, that is my understanding. If I'm headed in the wrong direction for a fix, please correct me.

Referenced threads:

2) Windows 10 Pro "whole" guest stalls/momentary freeze
- The whole guest system stalls for ~200-500ms repeatedly, but with no periodic pattern.
- The interval between stalls is anywhere from a few seconds to a few minutes.
- The issue is most evident when playing audio through PCIe-passed-through hardware: the audio buzzes for the duration of the stall, i.e. the hardware repeatedly replays its ~50ms buffer because the OS supplies no new data. But even when nothing other than the guest OS is running, everything freezes, including the mouse pointer.
- Stall does not appear to be directly CPU load dependent since stall occurs even when system is idle, but does appear to occur more frequently when under load.
- The stall is not obviously correlated with any virtual or physical IO: it occurs with or without significant IO load, and significant IO load does not automatically trigger a stall.
- Resource monitor shows a SIMULTANEOUS spike in CPU across ALL cores AFTER the stall ends and monitor telemetry resumes. That is, there is no significant CPU usage before the stall, but the first data point obtained during/immediately after the stall shows a single momentary spike on every core, regardless of the previous or subsequent load. EVERY stall has some amount of simultaneous CPU spike across all cores.
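
In case it helps anyone trying to reproduce or measure this, here is a rough sketch of how the stall length and spacing described above could be logged from inside the guest. It is not something from my setup: the function names (`sample`, `find_stalls`) and the 100ms threshold are made up for illustration, and it assumes the guest's monotonic clock keeps advancing across a stall so that the freeze shows up as a gap between samples.

```python
import time

def find_stalls(timestamps, threshold_s=0.1):
    """Return (start_time, gap_seconds) for every gap between consecutive
    timestamps that exceeds threshold_s (hypothetical default: 100 ms)."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > threshold_s:
            stalls.append((prev, gap))
    return stalls

def sample(duration_s=60.0, interval_s=0.005):
    """Collect monotonic timestamps roughly every interval_s seconds.
    The actual sleep granularity depends on the OS timer resolution."""
    stamps = []
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        stamps.append(time.monotonic())
        time.sleep(interval_s)
    return stamps
```

Calling `find_stalls(sample())` in the guest should list each gap with its start time, which could then be lined up against host-side logs. The reported gap includes the normal sampling interval, so it slightly overstates the stall itself.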


Any ideas on this one?


agent: 1
bios: ovmf
boot: order=scsi0;ide2;net0;ide0
cores: 18
cpu: host
efidisk0: local-dir:666/vm-666-disk-0.qcow2,efitype=4m,pre-enrolled-keys=1,size=528K
hostpci0: 0000:81:00,pcie=1,x-vga=1
hostpci2: 0000:03:00,pcie=1
ide0: local-dir:iso/virtio-win-0.1.229.iso,media=cdrom,size=522284K
machine: q35
memory: 16000
meta: creation-qemu=8.0.2,ctime=1692051742
name: Win10
net0: virtio=42:AF:31:71:92:B8,bridge=vmbr0040
numa: 0
ostype: l26
scsi0: local-zfs:vm-666-disk-0,iothread=1,size=256G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=5d39371a-8298-42f4-9d79-e16ab78e411f
sockets: 1
tpmstate0: local-dir:666/vm-666-disk-2.raw,size=4M,version=v2.0

NOTE: "hostpci1" is usually the ConnectX3 SRIOV VF, but I removed it for testing the Guest Stall problem (makes no difference to stall).
"hostpci0:" = RTX4090
"hostpci2:" = PCIe USB controller card
"CPUs:" = Intel Xeon 2x E5-2697v4

The monitor and USB keyboard/mouse are connected to the PCIe passed-through guest hardware.
