[SOLVED] Windows 10 VM - memory errors when passing through GPU, after upgrading to Proxmox 8.2

marcosscriven

I'm trying to run a Windows 10 VM cloned from a template I've been using without any problems. When I pass through an Nvidia GPU, I get these errors in the host's kernel log:


Code:
[Fri May 17 08:31:12 2024] x86/PAT: memtype_reserve failed [mem 0xf800000000-0xf801ffffff], track uncached-minus, req uncached-minus
[Fri May 17 08:31:12 2024] ioremap memtype_reserve failed -16
[Fri May 17 08:31:34 2024] x86/PAT: CPU 7/KVM:2231 conflicting memory types f800000000-f802000000 uncached-minus<->write-combining
[Fri May 17 08:31:34 2024] x86/PAT: memtype_reserve failed [mem 0xf800000000-0xf801ffffff], track uncached-minus, req uncached-minus
[Fri May 17 08:31:34 2024] ioremap memtype_reserve failed -16
[Fri May 17 08:31:34 2024] x86/PAT: CPU 7/KVM:2231 conflicting memory types f800000000-f802000000 uncached-minus<->write-combining
[Fri May 17 08:31:34 2024] x86/PAT: memtype_reserve failed [mem 0xf800000000-0xf801ffffff], track uncached-minus, req uncached-minus
[Fri May 17 08:31:34 2024] ioremap memtype_reserve failed -16
[Fri May 17 08:31:34 2024] x86/PAT: CPU 7/KVM:2231 conflicting memory types f800000000-f802000000 uncached-minus<->write-combining
[Fri May 17 08:31:34 2024] x86/PAT: memtype_reserve failed [mem 0xf800000000-0xf801ffffff], track uncached-minus, req uncached-minus
[Fri May 17 08:31:34 2024] ioremap memtype_reserve failed -16
[Fri May 17 08:31:34 2024] x86/PAT: CPU 7/KVM:2231 conflicting memory types f800000000-f802000000 uncached-minus<->write-combining
[Fri May 17 08:31:34 2024] x86/PAT: memtype_reserve failed [mem 0xf800000000-0xf801ffffff], track uncached-minus, req uncached-minus
[Fri May 17 08:31:34 2024] ioremap memtype_reserve failed -16

I see no output, and the whole machine eventually crashes.
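Because the kernel log above repeats the same conflict many times, it can help to condense it before posting or comparing runs. The following is a minimal sketch and not part of the original report: summarize_pat_conflicts is a hypothetical helper name, and it assumes the log text arrives on standard input (for example, piped from 'dmesg').

```shell
#!/bin/sh
# Hypothetical helper: condense repeated x86/PAT conflict lines.
# Reads kernel log text on stdin and prints one line per unique
# "range types" pair, followed by how often it was seen.
summarize_pat_conflicts() {
    grep 'conflicting memory types' |
        sed 's/.*conflicting memory types //' |
        sort | uniq -c |
        awk '{ printf "%s %s (seen %d times)\n", $2, $3, $1 }'
}

# Possible use on the host, as root:
#   dmesg | summarize_pat_conflicts
# The start address of the range can then be looked up to see which
# device claims it (real addresses are only shown to root):
#   grep -i 'f800000000' /proc/iomem
```

If the range turns out to belong to the passed-through GPU, that suggests something on the host has already mapped the card's memory before the VM starts.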

I also tried turning "rombar" on in the config. With that I do get output for a while, but the Nvidia GPU driver then reports error 43. As I said, I've previously passed this GPU through without problems, in the same machine, with the same Windows 10 VM clones from a fresh template.

The VM config is:

Code:
agent: 1
balloon: 0
bios: ovmf
boot: order=scsi0;net0
cores: 16
cpu: host
efidisk0: local-lvm:vm-107-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:15:00.0,pcie=1,rombar=0 <-- The GPU. I tried with rombar on as well
hostpci1: 0000:01:00,pcie=1,x-vga=1,,rombar=0 <--- USB.
machine: pc-q35-8.1
memory: 8192
meta: creation-qemu=8.1.5,ctime=1711703824
name: q10-test-1
net0: virtio=BC:24:11:BD:89:EF,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsi0: local-lvm:vm-107-disk-1,cache=writeback,discard=on,iothread=1,size=128G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=e388d747-8e02-4b3c-a44d-2057510e27bb
sockets: 1
vmgenid: 8495e022-7cd7-4791-8f3f-517a928ed362

uname:

Code:
uname -a
Linux pve-maxi 6.8.4-3-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.4-3 (2024-05-02T11:55Z) x86_64 GNU/Linux

If I don't pass through the GPU at all, it works fine.
 
Which kernel are you currently booting? Can you post the output of 'dmesg'?

Does it work when booting an older kernel?
 
Thanks @dcsapak

I did a complete fresh install straight to 8.2, so I don't have the older kernels. Whatever was latest in 8.1 definitely worked.

Right now I have this:

Code:
Linux pve-maxi 6.8.4-3-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.4-3 (2024-05-02T11:55Z) x86_64 GNU/Linux

I also tried the only other kernel I have installed, which is older:
Code:
Linux pve-maxi 6.8.4-2-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.4-2 (2024-04-10T17:36Z) x86_64 GNU/Linux

With the same outcome.

I've attached two dmesg files: one from a clean boot without any VMs/LXCs starting, and one with just the section logged after I start the VM in question.
 

OK, from the dmesg I can see that nouveau is loaded on boot. A comment in an older Reddit thread (https://www.reddit.com/r/VFIO/comments/pwpm2h/comment/hym10e7/) describes similar symptoms on an older kernel; the workaround there was to not bind the GPU to a real driver before passing it through.

Could you try blacklisting the nouveau driver to test whether that fixes the problem?
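For readers who want to try the same thing, blacklisting usually means adding a modprobe rule and rebuilding the initramfs. This is a minimal sketch under some assumptions: write_nouveau_blacklist is a hypothetical helper name, the file name blacklist-nouveau.conf is an arbitrary choice, and /etc/modprobe.d is assumed to be the rule directory on the host.

```shell
#!/bin/sh
# Sketch: write a modprobe rule that stops nouveau from being loaded
# automatically. The optional first argument is the target directory;
# it defaults to /etc/modprobe.d (assumed location on the host).
write_nouveau_blacklist() {
    conf_dir="${1:-/etc/modprobe.d}"
    printf '%s\n' 'blacklist nouveau' > "$conf_dir/blacklist-nouveau.conf"
}

# On the Proxmox host, as root (shown as comments, not executed here):
#   write_nouveau_blacklist
#   update-initramfs -u -k all   # rebuild so the rule applies at early boot
#   reboot
```

After the reboot, nouveau should no longer appear in the output of 'lsmod' if the rule was picked up.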
 
Thanks @dcsapak! Blacklisting worked. Not sure how I missed that Google result - but I see it was from two years ago.
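One way to confirm that the host no longer claims the card is to look at the driver symlink in sysfs. This is only a sketch: bound_driver is a hypothetical helper name, and the second argument exists purely so the function can be pointed at a fake sysfs tree for testing. The PCI address is the GPU from the VM config earlier in the thread.

```shell
#!/bin/sh
# Sketch: print the kernel driver currently bound to a PCI device,
# or "none" if no driver is bound.
# $1 = full PCI address, $2 = sysfs root (optional, defaults to /sys).
bound_driver() {
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

# Example on the host:
#   bound_driver 0000:15:00.0
```

With nouveau blacklisted, this should no longer print nouveau for the GPU; seeing vfio-pci while the VM is running would be the expected result, assuming the usual Proxmox passthrough setup.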

I wonder what changed so that blacklisting is now necessary, when it wasn't (or didn't appear to be) before?
 
Not completely sure; it could also be a problem in the nouveau driver itself.
If I have time I could try to reproduce and bisect it, but that is very time-intensive (a kernel build plus a reboot for every step).
 