Hello.
I have been using GPU passthrough for a while, but today my host crashed.
There were just two lines logged on the host before the crash:
Oct 23 11:52:02 pve kernel: [1034431.681561] vfio-pci 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 address=0xebf0c35000 flags=0x0030]
Oct 23 11:52:02 pve kernel: [1034431.681573] vfio-pci 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 address=0xebf0c34800 flags=0x0030]
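In case it helps anyone compare against their own logs, here is a rough Python sketch that pulls the device, domain, address and flags out of lines shaped like the two above. The regular expression is only an assumption based on those two lines, and `parse_amd_vi_events` is a helper name made up for this post, not part of any tool.

```python
import re

# Pattern inferred from the two kernel log lines in this post (an assumption,
# not an official format description).
EVENT_RE = re.compile(
    r"vfio-pci (?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"AMD-Vi: Event logged \[(?P<event>\w+) "
    r"domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) "
    r"flags=(?P<flags>0x[0-9a-f]+)\]"
)


def parse_amd_vi_events(log_text):
    """Return one dict per matching log line; the address is converted to int."""
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            event = match.groupdict()
            event["address"] = int(event["address"], 16)
            events.append(event)
    return events
```

Feeding it the syslog text gives a list that can be filtered by device, for example to count the faults reported for 0000:07:00.0.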
I'm using an R7 240 with a Ryzen 3600 and ECC memory.
VM config:
bios: ovmf
hostpci0: 0000:07:00,pcie=1,x-vga=1
machine: q35
vga: none
# cat /proc/cmdline
initrd=\EFI\proxmox\5.11.22-5-pve\initrd.img-5.11.22-5-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs iommu=pt video=efifbff mitigations=off
# find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/7/devices/0000:00:08.0
/sys/kernel/iommu_groups/5/devices/0000:00:07.0
/sys/kernel/iommu_groups/13/devices/0000:09:00.1
/sys/kernel/iommu_groups/3/devices/0000:00:04.0
/sys/kernel/iommu_groups/11/devices/0000:08:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:02.0
/sys/kernel/iommu_groups/8/devices/0000:00:08.1
/sys/kernel/iommu_groups/6/devices/0000:00:07.1
/sys/kernel/iommu_groups/14/devices/0000:09:00.3
/sys/kernel/iommu_groups/4/devices/0000:00:05.0
/sys/kernel/iommu_groups/12/devices/0000:09:00.0
/sys/kernel/iommu_groups/2/devices/0000:00:03.1
/sys/kernel/iommu_groups/2/devices/0000:07:00.0
/sys/kernel/iommu_groups/2/devices/0000:00:03.0
/sys/kernel/iommu_groups/2/devices/0000:07:00.1
/sys/kernel/iommu_groups/10/devices/0000:00:18.3
/sys/kernel/iommu_groups/10/devices/0000:00:18.1
/sys/kernel/iommu_groups/10/devices/0000:00:18.6
/sys/kernel/iommu_groups/10/devices/0000:00:18.4
/sys/kernel/iommu_groups/10/devices/0000:00:18.2
/sys/kernel/iommu_groups/10/devices/0000:00:18.0
/sys/kernel/iommu_groups/10/devices/0000:00:18.7
/sys/kernel/iommu_groups/10/devices/0000:00:18.5
/sys/kernel/iommu_groups/0/devices/0000:03:00.0
/sys/kernel/iommu_groups/0/devices/0000:02:00.2
/sys/kernel/iommu_groups/0/devices/0000:02:00.0
/sys/kernel/iommu_groups/0/devices/0000:00:01.0
/sys/kernel/iommu_groups/0/devices/0000:01:00.0
/sys/kernel/iommu_groups/0/devices/0000:00:01.3
/sys/kernel/iommu_groups/0/devices/0000:02:00.1
/sys/kernel/iommu_groups/0/devices/0000:00:01.1
/sys/kernel/iommu_groups/0/devices/0000:05:00.0
/sys/kernel/iommu_groups/0/devices/0000:03:01.0
/sys/kernel/iommu_groups/0/devices/0000:03:04.0
/sys/kernel/iommu_groups/9/devices/0000:00:14.3
/sys/kernel/iommu_groups/9/devices/0000:00:14.0
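The flat listing above is hard to read, so here is a small Python sketch that groups the same paths by IOMMU group number. It only assumes the `/sys/kernel/iommu_groups/<group>/devices/<address>` layout visible in the output; `group_iommu_devices` is a made-up helper name.

```python
from collections import defaultdict


def group_iommu_devices(find_output):
    """Map IOMMU group number -> sorted list of PCI addresses."""
    groups = defaultdict(list)
    for line in find_output.splitlines():
        parts = line.strip().split("/")
        # Expected: ['', 'sys', 'kernel', 'iommu_groups', '<group>', 'devices', '<address>']
        if len(parts) == 7 and parts[3] == "iommu_groups" and parts[5] == "devices":
            groups[int(parts[4])].append(parts[6])
    return {group: sorted(devices) for group, devices in sorted(groups.items())}
```

For the listing above, group 2 comes out as 0000:00:03.0, 0000:00:03.1, 0000:07:00.0 and 0000:07:00.1, so the GPU and its second function share a group with the 00:03.x devices.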
07:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland PRO [Radeon R7 240/340] (rev 87) (prog-if 00 [VGA controller])
Subsystem: Micro-Star International Co., Ltd. [MSI] Oland PRO [Radeon R7 240/340]
Flags: bus master, fast devsel, latency 0, IRQ 72, IOMMU group 2
Memory at e0000000 (64-bit, prefetchable) [size=256M]
Memory at fce00000 (64-bit, non-prefetchable) [size=256K]
I/O ports at e000
Expansion ROM at fce40000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [58] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [200] Physical Resizable BAR
Capabilities: [270] Secondary PCI Express
Kernel driver in use: vfio-pci
Kernel modules: radeon, amdgpu
VM syslog: https://pastebin.com/Dtumhyst
What's wrong? This is a production environment, so what can I do to confine a crash like this to the VM instead of taking down the host?