PVE 8.0.4 with 2x E5-2697v4.
I have 2 (or more?) issues to fix; the biggest problem is #2:
1) Is there an updated mlx4_core/en/ib for PVE 8.0.4/kernel 6.2.16-6-pve available somewhere?
- Current modinfo mlx4_core => 4.0-0
- Windows Error 43 on the "VPI" adapter (ConnectX3 in ethernet mode, VF passed through via PCIe)
- Not sure how to apply the patch that fixes the problems with Windows guests
- I need to finish transitioning all VMs to ConnectX3 SRIOV VFs, and there is a known issue with Windows guests caused by the host-side ConnectX driver implementation.
- At least, that is my understanding. If I'm headed in the wrong direction for a fix, please correct me.
Referenced threads:
https://forums.servethehome.com/ind...x-host-and-windows-guest-via-kvm.28956/page-3
https://forum.proxmox.com/threads/h...ox-connectx-3-cards-for-sriov-and-vfs.121927/
https://forum.proxmox.com/threads/pve-kernel-6-2-16-6-pve-build-issue.131779/
2) Windows 10 Pro "whole" guest stalls/momentary freeze
- The whole guest system stalls for ~200-500ms at a time. The stalls recur constantly but without a periodic pattern.
- Timing of stall is anywhere from every few seconds to every few minutes.
- Issue is most evident when playing audio through PCIe-passed-through hardware: the audio buzzes for the duration of the guest stall (i.e. the hardware repeatedly replays a ~50ms buffer because it gets no new data from the OS). But even when nothing other than the guest OS is running, everything freezes, including the mouse pointer.
- Stall does not appear to be directly CPU load dependent since stall occurs even when system is idle, but does appear to occur more frequently when under load.
- Stall is not obviously correlated with any virtual or physical IO: it occurs with or without significant IO load, and significant IO load does not automatically trigger a stall.
- Resource Monitor shows a SIMULTANEOUS CPU spike across ALL cores right AFTER the stall ends and telemetry resumes. That is, there is no significant CPU usage just before the stall, but the first data point obtained during/immediately after it shows a single momentary spike on every core, regardless of the previous or subsequent load. EVERY stall comes with some amount of simultaneous CPU spike across all cores.
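To put numbers on the stall length and spacing described above, something like the following could be run inside the guest. This is only a minimal sketch, assuming Python 3 is installed in the Windows guest and that the guest's performance counter keeps advancing while the guest is stalled. The function names (`sample`, `find_stalls`) and the 10 ms interval / 100 ms threshold are my own illustrative choices, not from any tool.

```python
import time


def sample(duration, interval=0.01):
    """Sleep `interval` seconds in a loop for about `duration` seconds,
    recording a high-resolution timestamp after each wake-up."""
    stamps = [time.perf_counter()]
    end = stamps[0] + duration
    while stamps[-1] < end:
        time.sleep(interval)
        stamps.append(time.perf_counter())
    return stamps


def find_stalls(timestamps, interval, threshold):
    """Return (time_before_gap, extra_delay) for every pair of consecutive
    timestamps whose gap exceeds the expected `interval` by more than
    `threshold` seconds."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        extra = (cur - prev) - interval
        if extra > threshold:
            stalls.append((prev, extra))
    return stalls
```

Calling `find_stalls(sample(60.0), interval=0.01, threshold=0.1)` would list every wake-up that arrived more than ~100 ms late during a one-minute run. The threshold is deliberately far above the default Windows timer granularity so that ordinary scheduling jitter is not reported as a stall.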
Any ideas on this one?
Thanks!