["SOLVED"] VM crash 5-10 minutes after starting a game.

AngryAdm

Member
Sep 5, 2020
-The host has run GPU passthrough with Proxmox for almost a year.
-This issue started Monday.
-The GPU has been replaced with the exact same model. No change, still crashing.
-No hardware changes happened before this issue started appearing. I suspect it came with an update.
-No configuration changes were made prior to this error appearing.
-Guest is Windows 10.
-Proxmox is fully patched from the no-subscription repository.
-The host becomes unresponsive: pings rise to 2500 ms, then time out. Eventually it responds to ping again. Since it's a single-GPU system, I cannot do anything at that point.

Today I was lucky enough to get a syslog snippet:

May 08 14:29:24 pve02 QEMU[1797]: kvm: vfio_err_notifier_handler(0000:11:00.0) Unrecoverable error detected. Please collect any data possible and then kill the guest
May 08 14:29:24 pve02 kernel: pcieport 0000:00:03.1: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:03.1
May 08 14:29:24 pve02 kernel: pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
May 08 14:29:24 pve02 kernel: pcieport 0000:00:03.1: AER: device [1022:1483] error status/mask=00100000/04400000
May 08 14:29:24 pve02 kernel: pcieport 0000:00:03.1: AER: [20] UnsupReq (First)
May 08 14:29:24 pve02 kernel: pcieport 0000:00:03.1: AER: TLP Header: 60000802 11001884 00004323 4a95f820
May 08 14:29:24 pve02 kernel: pcieport 0000:00:03.1: AER: Device recovery successful
May 08 14:29:24 pve02 QEMU[1797]: kvm: vfio_err_notifier_handler(0000:11:00.1) Unrecoverable error detected. Please collect any data possible and then kill the guest
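
For anyone reading along, the `error status/mask=00100000/04400000` line in the snippet above can be decoded bit by bit. What follows is only a minimal sketch: the function names `parse_status_mask` and `decode_uncorrectable` are made up for illustration, and the bit table is my reading of the PCIe Uncorrectable Error Status register using the short labels the kernel prints, so check it against the kernel source or the PCIe spec before relying on it.

```python
import re

# Assumed bit layout of the PCIe AER Uncorrectable Error Status register,
# labelled with the short names the Linux AER driver prints (e.g. "UnsupReq").
UNCORRECTABLE_BITS = {
    4: "DLP",
    5: "SDES",
    12: "TLP",
    13: "FCP",
    14: "CmpltTO",
    15: "CmpltAbrt",
    16: "UnxCmplt",
    17: "RxOF",
    18: "MalfTLP",
    19: "ECRC",
    20: "UnsupReq",
    21: "ACSViol",
    22: "UncorrIntErr",
    23: "BlockedTLP",
    24: "AtomicOpBlocked",
    25: "TLPBlockedErr",
    26: "PoisonTLPBlocked",
}

STATUS_MASK_RE = re.compile(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")


def parse_status_mask(line):
    """Return (status, mask) as integers from an AER 'error status/mask' log line."""
    match = STATUS_MASK_RE.search(line)
    if match is None:
        raise ValueError("no 'error status/mask=' field in line")
    return int(match.group(1), 16), int(match.group(2), 16)


def decode_uncorrectable(value):
    """List the names of all known bits set in a status or mask value, lowest bit first."""
    return [name for bit, name in sorted(UNCORRECTABLE_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    log_line = (
        "May 08 14:29:24 pve02 kernel: pcieport 0000:00:03.1: AER: "
        "device [1022:1483] error status/mask=00100000/04400000"
    )
    status, mask = parse_status_mask(log_line)
    print("status bits:", decode_uncorrectable(status))
    print("masked bits:", decode_uncorrectable(mask))
```

The status value 00100000 has only bit 20 set, which agrees with the `[20] UnsupReq (First)` line the kernel logged for the root port at 00:03.1.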




lspci output:
00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex [1022:1480]
Subsystem: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex [1022:1480]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse IOMMU [1022:1481]
Subsystem: Advanced Micro Devices, Inc. [AMD] Starship/Matisse IOMMU [1022:1481]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
Kernel driver in use: pcieport
00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
Kernel driver in use: pcieport
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
Kernel driver in use: pcieport
00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:05.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
Kernel driver in use: pcieport
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
Kernel driver in use: pcieport
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 61)
Subsystem: ASRock Incorporation FCH SMBus Controller [1849:ffff]
Kernel driver in use: piix4_smbus
Kernel modules: i2c_piix4, sp5100_tco
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
Subsystem: ASRock Incorporation FCH LPC Bridge [1849:ffff]
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 0 [1022:1440]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 1 [1022:1441]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 2 [1022:1442]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 3 [1022:1443]
Kernel driver in use: k10temp
Kernel modules: k10temp
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 4 [1022:1444]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 5 [1022:1445]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 6 [1022:1446]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 7 [1022:1447]
01:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:57ad]
Kernel driver in use: pcieport
02:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:57a3]
Kernel driver in use: pcieport
02:03.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:57a3]
Kernel driver in use: pcieport
02:05.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:57a3]
Kernel driver in use: pcieport
02:06.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:57a3]
Kernel driver in use: pcieport
02:08.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:57a4]
Kernel driver in use: pcieport
02:09.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:57a4]
Kernel driver in use: pcieport
02:0a.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:57a4]
Kernel driver in use: pcieport
03:00.0 Non-Volatile memory controller [0108]: Silicon Motion, Inc. Device [126f:2262] (rev 03)
Subsystem: Silicon Motion, Inc. Device [126f:2262]
Kernel driver in use: nvme
04:00.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1184e PCIe Switch Port [1b21:1184]
Kernel driver in use: pcieport
05:01.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1184e PCIe Switch Port [1b21:1184]
Kernel driver in use: pcieport
05:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1184e PCIe Switch Port [1b21:1184]
Kernel driver in use: pcieport
05:05.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1184e PCIe Switch Port [1b21:1184]
Kernel driver in use: pcieport
05:07.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1184e PCIe Switch Port [1b21:1184]
Kernel driver in use: pcieport
06:00.0 Network controller [0280]: Intel Corporation Device [8086:2723] (rev 1a)
Subsystem: Intel Corporation Device [8086:0084]
Kernel driver in use: iwlwifi
Kernel modules: iwlwifi
08:00.0 Ethernet controller [0200]: Intel Corporation I211 Gigabit Network Connection [8086:1539] (rev 03)
Subsystem: ASRock Incorporation I211 Gigabit Network Connection [1849:1539]
Kernel driver in use: igb
Kernel modules: igb
0a:00.0 PCI bridge [0604]: Tundra Semiconductor Corp. Tsi381 PCIe to PCI Bridge [10e3:8111] (rev 02)
0b:00.0 Multimedia audio controller [0401]: Creative Labs CA0108/CA10300 [Sound Blaster Audigy Series] [1102:0008]
Subsystem: Creative Labs SB1550 Audigy 5/Rx [1102:1024]
Kernel driver in use: vfio-pci
Kernel modules: snd_emu10k1
0c:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. Device [10ec:8125] (rev 01)
Subsystem: ASRock Incorporation Device [1849:8125]
Kernel driver in use: r8169
Kernel modules: r8169
0d:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
Subsystem: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
Kernel driver in use: vfio-pci
0d:00.1 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]
Subsystem: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:1486]
Kernel driver in use: vfio-pci
Kernel modules: xhci_pci
0d:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]
Subsystem: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:148c]
Kernel driver in use: vfio-pci
Kernel modules: xhci_pci
0e:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
Subsystem: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901]
Kernel driver in use: ahci
Kernel modules: ahci
0f:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
Subsystem: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901]
Kernel driver in use: ahci
Kernel modules: ahci
10:00.0 Non-Volatile memory controller [0108]: Silicon Motion, Inc. Device [126f:2262] (rev 03)
Subsystem: Silicon Motion, Inc. Device [126f:2262]
Kernel driver in use: nvme
11:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii PRO [Radeon R9 290/390] [1002:67b1]
Subsystem: Micro-Star International Co., Ltd. [MSI] Hawaii PRO [Radeon R9 290/390] [1462:3081]
Kernel driver in use: vfio-pci
Kernel modules: radeon, amdgpu
11:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii HDMI Audio [Radeon R9 290/290X / 390/390X] [1002:aac8]
Subsystem: Micro-Star International Co., Ltd. [MSI] Hawaii HDMI Audio [Radeon R9 290/290X / 390/390X] [1462:aac8]
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
12:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function [1022:148a]
Subsystem: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function [1022:148a]
13:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
Subsystem: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
Kernel driver in use: vfio-pci
13:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
Subsystem: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
Kernel driver in use: ccp
Kernel modules: ccp
13:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]
Subsystem: ASRock Incorporation Matisse USB 3.0 Host Controller [1849:ffff]
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
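
To cross-check which devices from the listing above are actually handed to the guest, the `lspci` text can be grouped by "Kernel driver in use". This is a rough sketch, assuming the output format shown in this post (a line starting with the bus address, followed by detail lines); `parse_lspci` and `bound_to` are hypothetical helper names, not part of any tool.

```python
import re

# A device line starts with bus:device.function, e.g. "11:00.0 VGA compatible ..."
BDF_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(.*)$")
DRIVER_PREFIX = "Kernel driver in use:"


def parse_lspci(text):
    """Map each bus address to its description and bound driver (None if unbound)."""
    devices = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = BDF_RE.match(line)
        if match:
            current = match.group(1)
            devices[current] = {"desc": match.group(2), "driver": None}
        elif current is not None and line.startswith(DRIVER_PREFIX):
            devices[current]["driver"] = line[len(DRIVER_PREFIX):].strip()
    return devices


def bound_to(devices, driver):
    """Return the bus addresses whose bound driver equals `driver`, in listing order."""
    return [bdf for bdf, info in devices.items() if info["driver"] == driver]


if __name__ == "__main__":
    sample = """\
11:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii PRO [Radeon R9 290/390] [1002:67b1]
Kernel driver in use: vfio-pci
Kernel modules: radeon, amdgpu
11:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii HDMI Audio [Radeon R9 290/290X / 390/390X] [1002:aac8]
Kernel driver in use: vfio-pci
13:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]
Kernel driver in use: xhci_hcd
"""
    print(bound_to(parse_lspci(sample), "vfio-pci"))
```

Run against the full listing, this should pick out the GPU functions at 11:00.0 and 11:00.1 named in the `vfio_err_notifier_handler` messages, along with the other passed-through devices.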
 