[SOLVED] Windows 10 VM - memory errors when passing through GPU, after upgrading to Proxmox 8.2

marcosscriven · May 17, 2024

I'm trying to run a Windows 10 VM cloned from a template I've been using without any problems. When I pass through an Nvidia GPU, I get these errors in the logs:

Code:

[Fri May 17 08:31:12 2024] x86/PAT: memtype_reserve failed [mem 0xf800000000-0xf801ffffff], track uncached-minus, req uncached-minus
[Fri May 17 08:31:12 2024] ioremap memtype_reserve failed -16
[Fri May 17 08:31:34 2024] x86/PAT: CPU 7/KVM:2231 conflicting memory types f800000000-f802000000 uncached-minus<->write-combining
[Fri May 17 08:31:34 2024] x86/PAT: memtype_reserve failed [mem 0xf800000000-0xf801ffffff], track uncached-minus, req uncached-minus
[Fri May 17 08:31:34 2024] ioremap memtype_reserve failed -16
[Fri May 17 08:31:34 2024] x86/PAT: CPU 7/KVM:2231 conflicting memory types f800000000-f802000000 uncached-minus<->write-combining
[Fri May 17 08:31:34 2024] x86/PAT: memtype_reserve failed [mem 0xf800000000-0xf801ffffff], track uncached-minus, req uncached-minus
[Fri May 17 08:31:34 2024] ioremap memtype_reserve failed -16
[Fri May 17 08:31:34 2024] x86/PAT: CPU 7/KVM:2231 conflicting memory types f800000000-f802000000 uncached-minus<->write-combining
[Fri May 17 08:31:34 2024] x86/PAT: memtype_reserve failed [mem 0xf800000000-0xf801ffffff], track uncached-minus, req uncached-minus
[Fri May 17 08:31:34 2024] ioremap memtype_reserve failed -16
[Fri May 17 08:31:34 2024] x86/PAT: CPU 7/KVM:2231 conflicting memory types f800000000-f802000000 uncached-minus<->write-combining
[Fri May 17 08:31:34 2024] x86/PAT: memtype_reserve failed [mem 0xf800000000-0xf801ffffff], track uncached-minus, req uncached-minus
[Fri May 17 08:31:34 2024] ioremap memtype_reserve failed -16

I see no output, and the whole machine eventually crashes.
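To see at a glance which address range and memory types keep conflicting in a log like the one above, a small filter can help. This is only a sketch: `pat_conflict_ranges` is a made-up helper name, and it assumes the kernel messages keep the exact wording shown in this post.

```shell
# Hypothetical helper (not part of Proxmox or the kernel tooling).
# Reads kernel log text on stdin and prints each distinct
# "range memory-types" pair from the "conflicting memory types"
# lines, prefixed by how many times it occurred.
pat_conflict_ranges() {
    sed -n 's/.*conflicting memory types \([0-9a-f]*-[0-9a-f]*\) \(.*\)$/\1 \2/p' \
        | sort \
        | uniq -c
}
```

On the host it would be used as `dmesg | pat_conflict_ranges`.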

I also tried turning "rombar" on in the config. With that I do get output for a while, but the Nvidia GPU driver reports error 43. As I said, I've previously passed this GPU through without problems, in the same machine, with the same Windows 10 VM clones from a fresh template.

The VM config is:

Code:

agent: 1
balloon: 0
bios: ovmf
boot: order=scsi0;net0
cores: 16
cpu: host
efidisk0: local-lvm:vm-107-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:15:00.0,pcie=1,rombar=0   <-- The GPU. I tried with rombar on as well.
hostpci1: 0000:01:00,pcie=1,x-vga=1,rombar=0   <-- USB.
machine: pc-q35-8.1
memory: 8192
meta: creation-qemu=8.1.5,ctime=1711703824
name: q10-test-1
net0: virtio=BC:24:11:BD:89:EF,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsi0: local-lvm:vm-107-disk-1,cache=writeback,discard=on,iothread=1,size=128G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=e388d747-8e02-4b3c-a44d-2057510e27bb
sockets: 1
vmgenid: 8495e022-7cd7-4791-8f3f-517a928ed362

uname:

Code:

uname -a
Linux pve-maxi 6.8.4-3-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.4-3 (2024-05-02T11:55Z) x86_64 GNU/Linux

If I don't pass through the GPU at all, the VM works fine.

dcsapak · May 17, 2024

what kernel do you boot currently? (can you post the output of 'dmesg'?)

does it work when booting an older kernel?

marcosscriven · May 17, 2024

dcsapak said:
what kernel do you boot currently? (can you post the output of 'dmesg'?)

does it work when booting an older kernel?

Thanks @dcsapak

I did a complete fresh install straight to 8.2, so I don't have the older kernels. Whatever was latest in 8.1 definitely worked.

Right now I have this:

Code:

Linux pve-maxi 6.8.4-3-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.4-3 (2024-05-02T11:55Z) x86_64 GNU/Linux

I also tried the only earlier kernel I have:

Code:

Linux pve-maxi 6.8.4-2-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.4-2 (2024-04-10T17:36Z) x86_64 GNU/Linux

With the same outcome.

I've attached two dmesg files: one from a clean boot without any VMs/LXCs starting, and one containing just the section logged after I start the VM in question.

dcsapak · May 17, 2024

ok, from the dmesg i can see that nouveau is loaded on boot. a comment in an older reddit thread (https://www.reddit.com/r/VFIO/comments/pwpm2h/comment/hym10e7/) describes a workaround: do not bind the gpu to a real driver before passing it through (that was on an older kernel, but the symptoms were similar)
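One way to check which host driver currently holds the GPU is to look at the "Kernel driver in use" line that `lspci -k` prints. A minimal sketch, where `driver_in_use` is a made-up helper name and `15:00.0` is the GPU address taken from the VM config earlier in the thread:

```shell
# Hypothetical helper: reads `lspci -k` output on stdin and prints
# the name of the kernel driver bound to the device, if any.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}
```

Usage on the host would be `lspci -k -s 15:00.0 | driver_in_use`. If that prints `nouveau`, the host driver has claimed the card.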

could you try to blacklist the nouveau driver to test if that fixes the problem?

marcosscriven · May 17, 2024

dcsapak said:
could you try to blacklist the nouveau driver to test if that fixes the problem?

Thanks @dcsapak! Blacklisting worked. Not sure how I missed that Google result - but I see it was from two years ago.
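For anyone finding this later, blacklisting a module on a Debian-based host like Proxmox usually means dropping a `blacklist` line into a file under `/etc/modprobe.d/` and rebuilding the initramfs. A rough sketch of that, not necessarily the exact commands used here: `write_blacklist` and `MODPROBE_DIR` are illustrative names, and the file name `blacklist-nouveau.conf` is just a choice.

```shell
# Sketch only: MODPROBE_DIR and write_blacklist are illustrative
# names, not part of any Proxmox tooling.
MODPROBE_DIR="${MODPROBE_DIR:-/etc/modprobe.d}"

# Write a one-line modprobe config that blacklists the given module.
write_blacklist() {
    module="$1"
    printf 'blacklist %s\n' "$module" > "$MODPROBE_DIR/blacklist-$module.conf"
}

# On the host, as root, one would then run:
#   write_blacklist nouveau
#   update-initramfs -u -k all   # rebuild initramfs so the blacklist applies at early boot
#   reboot
```

After the reboot, `lsmod | grep nouveau` should print nothing if the blacklist took effect.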

I wonder what changed that blacklisting is now necessary, when it wasn't (or didn't appear to be) before?

leesteken · May 17, 2024

marcosscriven said:
I wonder what changed that blacklisting is now necessary, when it wasn't (or didn't appear to be) before?

I needed the same workaround with an RX570 (because amdgpu would crash the GPU). I suspect there is a change or regression in the GPU drivers, affecting some GPUs, in Linux kernel 6.8 compared to 6.5 and earlier.

dcsapak · May 17, 2024

not completely sure, it could also be a problem in the nouveau driver.
if i have time i could try to reproduce/bisect it, but that is very time-intensive (build a kernel + reboot for every step)
