GPU PCIe Passthrough: Lag spikes with Kernel 6.2

adofou

Member
Mar 14, 2020
Hello,

I've been investigating a problem for several days, ever since I upgraded to Proxmox 8.
Each of my machines has an NVIDIA Quadro P2200 graphics card that is passed through to a Windows VM and used for video encoding.
I also use OBS to manage some live events.

Since upgrading to Proxmox 8, I've had numerous image freezes during encoding.
I can see them in the OBS stats, where the counters of missed/skipped frames, whether due to rendering lag or encoding lag, keep rising.
After much testing, the problem seems most prevalent when the graphics card is also in use for a remote desktop session or the VM console, although it is not limited to those cases. The lags/freezes affect everything, both the remote desktop and the encoding.

The Windows Task Manager doesn't show any CPU or GPU saturation, but the freezes are clearly visible during simple actions such as opening a folder or just connecting to the VM via remote desktop.
The average render time per frame spikes for a moment (from 2.5 ms to sometimes 50 or even 100 ms), frames are dropped, and then it falls back to about 2 ms.
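
To put numbers on that spike pattern, here is a minimal Python sketch that flags frames whose render time exceeds the per-frame budget and roughly estimates how many output frames each one costs. The 60 fps target, the sample values, and the function names are assumptions for illustration only; OBS does not export its stats in this form, and its own missed-frame accounting may differ.

```python
import math
from typing import List, Tuple


def frame_budget_ms(fps: float) -> float:
    """Time available to render one frame at the given output rate."""
    return 1000.0 / fps


def find_spikes(render_ms: List[float], fps: float = 60.0) -> List[Tuple[int, float, int]]:
    """Return (index, render time in ms, estimated frames missed) for frames over budget."""
    budget = frame_budget_ms(fps)
    spikes = []
    for i, t in enumerate(render_ms):
        if t > budget:
            # A frame that takes t ms occupies ceil(t * fps / 1000) frame slots;
            # every slot after the first gets no new image (rough estimate).
            slots = math.ceil(t * fps / 1000.0)
            spikes.append((i, t, slots - 1))
    return spikes


# Hypothetical samples shaped like the behaviour described above (values in ms).
samples = [2.5, 2.4, 50.0, 100.0, 2.0]
spikes = find_spikes(samples)
for index, ms, missed in spikes:
    print(f"frame {index}: {ms} ms, about {missed} frame(s) missed")
```

With a 60 fps target the budget is about 16.7 ms per frame, so a single 50 ms or 100 ms render is enough to drop several frames even though the average load stays low.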

I've tested a whole range of encoder configurations, updated Windows and the software, and updated the driver, with no improvement.
After several nights, I finally decided to reboot the machine into the older kernel 5.15.108, and as if by magic, all the problems (remote desktop lag, image download lag, dropped frames during encoding) disappeared.

I ran the test on two machines with the same hardware and software configuration, each with the same model of graphics card and an identical VM.
Both machines show the problem, and both work perfectly when rebooted into kernel 5.15.108.

My conclusion is that the problem seems to be linked to kernel 6.2. I tested 6.2.16-3-pve and 6.2.16-15-pve, and both have the issue.
I've searched the internet and the kernel.org bug tracker without finding anything relevant.
It looks like a problem with PCIe passthrough, but I can't find any information that would help me dig further.

Any help is welcome!

Output of pveversion -v:

proxmox-ve: 8.0.2 (running kernel: 5.15.108-1-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
pve-kernel-6.2: 8.0.5
proxmox-kernel-helper: 8.0.3
pve-kernel-5.15: 7.4-4
pve-kernel-5.13: 7.1-9
pve-kernel-5.11: 7.0-10
proxmox-kernel-6.2.16-15-pve: 6.2.16-15
proxmox-kernel-6.2: 6.2.16-15
pve-kernel-6.2.16-3-pve: 6.2.16-3
pve-kernel-5.15.108-1-pve: 5.15.108-1
pve-kernel-5.15.102-1-pve: 5.15.102-1
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.39-4-pve: 5.15.39-4
pve-kernel-5.15.39-1-pve: 5.15.39-1
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-4-pve: 5.13.19-9
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-4-pve: 5.11.22-9
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx5
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.26-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.9
libpve-guest-common-perl: 5.0.5
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.3-1
proxmox-backup-file-restore: 3.0.3-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.2
proxmox-widget-toolkit: 4.0.9
pve-cluster: 8.0.4
pve-container: 5.0.4
pve-docs: 8.0.5
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.8-2
pve-ha-manager: 4.0.2
pve-i18n: 3.0.7
pve-qemu-kvm: 8.0.2-6
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.13-pve1
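
When comparing the two hosts, it can help to pull the running kernel and the installed per-version kernel packages out of this listing automatically. The following is a minimal Python sketch, assuming the text has the `name: version` layout shown above; the function names are hypothetical and the sample is a shortened excerpt of the listing.

```python
import re
from typing import Dict, List, Optional


def parse_pveversion(text: str) -> Dict[str, str]:
    """Map each package name to the version string on its line."""
    packages = {}
    for line in text.splitlines():
        name, sep, version = line.partition(": ")
        if sep:
            packages[name.strip()] = version.strip()
    return packages


def running_kernel(packages: Dict[str, str]) -> Optional[str]:
    """Extract the running kernel from the proxmox-ve line, if present."""
    match = re.search(r"running kernel: ([^)]+)\)", packages.get("proxmox-ve", ""))
    return match.group(1) if match else None


def installed_kernels(packages: Dict[str, str]) -> List[str]:
    """List full kernel versions that have a per-version package installed."""
    pattern = re.compile(r"^(?:pve|proxmox)-kernel-(\d+\.\d+\.\d+-\d+-pve)$")
    found = []
    for name in packages:
        match = pattern.match(name)
        if match:
            found.append(match.group(1))
    return found


# Shortened excerpt of the listing above.
sample = """proxmox-ve: 8.0.2 (running kernel: 5.15.108-1-pve)
pve-kernel-6.2: 8.0.5
proxmox-kernel-6.2.16-15-pve: 6.2.16-15
pve-kernel-6.2.16-3-pve: 6.2.16-3
pve-kernel-5.15.108-1-pve: 5.15.108-1
"""
packages = parse_pveversion(sample)
print(running_kernel(packages))
print(installed_kernels(packages))
```

Note that the listing was captured while booted into 5.15.108-1-pve, the kernel on which the problem does not occur.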

VM configuration:

agent: 1
audio0: device=ich9-intel-hda,driver=none
bios: ovmf
boot: order=virtio0;net0;sata2
cores: 16
efidisk0: local-zfs:vm-201-disk-0,size=1M
hostpci0: 0000:84:00,pcie=1
machine: pc-q35-6.0
memory: 8192
name: VM-XXX-XXXX
net0: virtio=EA:7F:C0:7F:A6:C3,bridge=vmbr0
numa: 0
onboot: 1
ostype: win10
sata2: none,media=cdrom
scsihw: virtio-scsi-pci
smbios1: uuid=9ffbfb6e-4faa-44d7-9938-702c24e01360
sockets: 1
tags: replicate;th3
virtio0: local-zfs:vm-201-disk-1,discard=on,size=50G
virtio1: local-zfs:vm-201-disk-2,discard=on,size=500G
vmgenid: b2d333a1-6cba-4ecd-8683-96d8fbb4edbc

lspci -nnk output for the GPU:

84:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106GL [Quadro P2200] [10de:1c31] (rev a1)
Subsystem: NVIDIA Corporation GP106GL [Quadro P2200] [10de:131b]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
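
To confirm on both hosts and under both kernels that the card is still bound to vfio-pci, the same check can be scripted. This is a minimal Python sketch that reads text in the `lspci -nnk` layout shown above; the function name is hypothetical, and the sample is the output pasted in this post.

```python
import re
from typing import Dict

# A device line starts with a PCI address such as "84:00.0" (optionally "0000:84:00.0").
DEVICE_RE = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")


def drivers_in_use(lspci_text: str) -> Dict[str, str]:
    """Map each PCI address to the kernel driver currently bound to it."""
    drivers = {}
    current = None
    for raw in lspci_text.splitlines():
        match = DEVICE_RE.match(raw)
        if match:
            current = match.group(1)
            continue
        line = raw.strip()
        if current and line.startswith("Kernel driver in use:"):
            drivers[current] = line.split(":", 1)[1].strip()
    return drivers


sample = """84:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106GL [Quadro P2200] [10de:1c31] (rev a1)
Subsystem: NVIDIA Corporation GP106GL [Quadro P2200] [10de:131b]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
"""
drivers = drivers_in_use(sample)
print(drivers)
```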

Many thanks!
 

Attachments

  • dmesg.txt
    152.8 KB · Views: 1