iGPU passthrough causes memory failures in Linux VM

zhtengw

Nov 14, 2022
Hi all, I need some help: I get memory failures in Linux VMs when I pass the iGPU through to them. I first noticed the issue when compiling OpenWRT in a Debian VM with the iGPU passed through failed with an "internal compiler error". I then tested the memory with memtester. The VM with iGPU passthrough shows:
Code:
root@debian-server:# /sbin/memtester 1G 1
memtester version 4.5.0 (64-bit)
Copyright (C) 2001-2020 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 1024MB (1073741824 bytes)
got  1024MB (1073741824 bytes), trying mlock ...locked.
Loop 1/1:
  Stuck Address       : testing   2FAILURE: possible bad address line at offset 0x007ac800.
Skipping to next test...
  Random Value        : FAILURE: 0x00000000 != 0x5e7bc1737fea0094 at offset 0x007ac800.
FAILURE: 0x00000000 != 0xff7c654f66f7f7b at offset 0x007ac808.
FAILURE: 0x00000000 != 0xf75d7a7cd5bbc38e at offset 0x007ac810.
FAILURE: 0x00000000 != 0x779a31dd2bd796e2 at offset 0x007ac818.
FAILURE: 0x00000000 != 0x5a7bdceea9ff91ac at offset 0x007ac820.
...

Without iGPU passthrough, memtester passes all tests.
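In the log above, the failures shown start at offset 0x007ac800 and continue in consecutive 8-byte words, with the Random Value test reporting 0x00000000 against a non-zero random value. When comparing runs with and without passthrough, it can help to reduce a long log to a count and an offset range. This is a minimal POSIX sh sketch; `summarize_memtester_failures` is a hypothetical helper, not part of memtester, and it assumes the `FAILURE: ... at offset 0x....` line format shown above.

```shell
# Hypothetical helper (not part of memtester): summarize the
# "FAILURE: ... at offset 0x..." lines in memtester output read on stdin.
# Uses only POSIX sh and sed.
summarize_memtester_failures() {
    # Keep only the hex offset from every failure line
    sed -n 's/.*FAILURE:.* at offset \(0x[0-9a-fA-F][0-9a-fA-F]*\)\..*/\1/p' | {
        count=0 min='' max=''
        while IFS= read -r hex; do
            off=$(($hex))             # arithmetic expansion accepts 0x... constants
            count=$((count + 1))
            if [ -z "$min" ] || [ "$off" -lt "$min" ]; then min=$off; fi
            if [ -z "$max" ] || [ "$off" -gt "$max" ]; then max=$off; fi
        done
        if [ "$count" -eq 0 ]; then
            echo "no failures"
        else
            printf '%d failures, offsets 0x%08x-0x%08x\n' "$count" "$min" "$max"
        fi
    }
}
```

Usage: `/sbin/memtester 1G 1 2>&1 | summarize_memtester_failures`. The `2>&1` is there in case the FAILURE lines are written to stderr rather than stdout.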

My setup is as follows:
CPU: Alder Lake i3-12100
GPU: Intel UHD 730
PVE Version: 7.2-11
PVE Host Kernel: 5.19.7-2-pve
Test VM: Debian 11 with 6.0.7-edge kernel

VM config:
Code:
qm config 104 
agent: 1
bios: ovmf
boot: order=ide2;scsi0;net0
cores: 8
cpu: host
efidisk0: local-lvm:vm-104-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:00:02,pcie=1
ide2: none,media=cdrom
machine: q35
memory: 2048
meta: creation-qemu=7.0.0,ctime=1667973047
name: Debian-Server
net0: virtio=D2:C6:3C:E7:4C:4E,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: local-lvm:vm-104-disk-1,size=100G
scsihw: virtio-scsi-pci
smbios1: uuid=4d93c187-e815-49ec-8269-edf019d61e6d
sockets: 1
vga: vmware
vmgenid: b5cb2e0a-9e3a-4455-a26b-6babb88c0bc1

Best regards!
 
This [1] might also apply here to some extent, except with the iGPU instead of the SATA controller(s).
Anyway, please post the full output of the command I mentioned there, in code tags, here in your thread.

[1] https://forum.proxmox.com/threads/c...n-a-certain-vm-is-running.117947/#post-510580
Hi, Neobin:
Code:
# for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done

IOMMU group 0 00:00.0 Host bridge [0600]: Intel Corporation Device [8086:4630] (rev 05)
IOMMU group 10 00:1e.0 Communication controller [0780]: Intel Corporation Device [8086:7aa8] (rev 11)
IOMMU group 10 00:1e.3 Serial bus controller [0c80]: Intel Corporation Device [8086:7aab] (rev 11)
IOMMU group 11 00:1f.0 ISA bridge [0601]: Intel Corporation Device [8086:7a87] (rev 11)
IOMMU group 11 00:1f.3 Audio device [0403]: Intel Corporation Device [8086:7ad0] (rev 11)
IOMMU group 11 00:1f.4 SMBus [0c05]: Intel Corporation Device [8086:7aa3] (rev 11)
IOMMU group 11 00:1f.5 Serial bus controller [0c80]: Intel Corporation Device [8086:7aa4] (rev 11)
IOMMU group 12 01:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 04)
IOMMU group 13 02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 05)
IOMMU group 14 03:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
IOMMU group 1 00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:4692] (rev 0c)
IOMMU group 2 00:08.0 System peripheral [0880]: Intel Corporation Device [8086:464f] (rev 05)
IOMMU group 3 00:0a.0 Signal processing controller [1180]: Intel Corporation Device [8086:467d] (rev 01)
IOMMU group 4 00:14.0 USB controller [0c03]: Intel Corporation Device [8086:7ae0] (rev 11)
IOMMU group 4 00:14.2 RAM memory [0500]: Intel Corporation Device [8086:7aa7] (rev 11)
IOMMU group 5 00:16.0 Communication controller [0780]: Intel Corporation Device [8086:7ae8] (rev 11)
IOMMU group 6 00:17.0 SATA controller [0106]: Intel Corporation Device [8086:7ae2] (rev 11)
IOMMU group 7 00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:7ab8] (rev 11)
IOMMU group 8 00:1c.2 PCI bridge [0604]: Intel Corporation Device [8086:7aba] (rev 11)
IOMMU group 9 00:1c.4 PCI bridge [0604]: Intel Corporation Device [8086:7abc] (rev 11)
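To see at a glance which devices share an IOMMU group with a given device (here the iGPU at 00:02.0), the listing can be filtered. This is a minimal sketch, assuming the `IOMMU group N <address> <description>` line format produced by the loop above; `iommu_group_peers` is a hypothetical name. It prints the other devices in the same group as the given address and exits with status 1 if the address is not in the listing.

```shell
# Hypothetical helper: read "IOMMU group N <address> <description>" lines on
# stdin and print every OTHER device that shares a group with address $1.
# Exits with status 1 if $1 does not appear in the listing.
iommu_group_peers() {
    awk -v t="$1" '
        $1 == "IOMMU" && $2 == "group" {
            # remember group number, address and full line of every device
            n++; grp[n] = $3; addr[n] = $4; line[n] = $0
            if ($4 == t) { g = $3; found = 1 }
        }
        END {
            if (!found) exit 1
            for (i = 1; i <= n; i++)
                if (grp[i] == g && addr[i] != t) print line[i]
        }'
}
```

Piping the loop's output into `iommu_group_peers 00:02.0` prints nothing for the listing posted here, because the iGPU is alone in group 1; `iommu_group_peers 00:14.0` (USB) would list 00:14.2 (RAM memory) from group 4.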
 
The integrated graphics has no dedicated graphics memory, so it uses normal system memory and might be what keeps changing parts of that memory. I would hope that the memory used for graphics lies inside the VM's memory (so as not to break VM isolation). Maybe memtest does not take that into account (as it does not expect to run inside a VM), and that explains the (false) memory errors inside the VM?
 
Hi, Neobin:
Unfortunately, the answer to both of your questions is "no". I have not modified the PVE kernel, and I only pass intel_iommu=on on the kernel command line.

Code:
root@home:~# cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-5.19.7-2-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on
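To double-check which IOMMU-related options the host actually booted with, the command line can be queried one parameter at a time. This is a minimal POSIX sh sketch; `kernel_param` is a hypothetical helper, and it assumes whitespace-separated parameters without quoted values that contain spaces.

```shell
# Hypothetical helper: print the value of kernel parameter $1 from a command
# line read on stdin, e.g. kernel_param intel_iommu < /proc/cmdline.
# Prints "(set)" for bare flags such as "quiet"; exits 1 if $1 is absent.
# The body runs in a subshell, so "set -f" does not leak to the caller.
kernel_param() (
    set -f                          # stop words like "a*" from globbing
    read -r cmdline || true         # /proc/cmdline is a single line
    for word in $cmdline; do        # unquoted on purpose: split on whitespace
        case $word in
            "$1"=*) printf '%s\n' "${word#*=}"; exit 0 ;;
            "$1")   echo "(set)"; exit 0 ;;
        esac
    done
    exit 1
)
```

For the command line above, `kernel_param intel_iommu < /proc/cmdline` prints `on`, while `kernel_param iommu` exits with status 1 because no `iommu=` option (such as `iommu=pt`) is set.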
 
Hi leesteken:
The memtest you mean may be memtest86+, a standalone memory tester that boots in place of an operating system. But the "memtester" I mentioned is a userspace utility for testing the memory subsystem (https://github.com/jnavila/memtester), which allocates memory with malloc() and then locks it with mlock(), as the output above shows. So it can genuinely run inside a VM.
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!