[SOLVED] Issues with AMD on 9.0.6

tkffaul

Member
Apr 8, 2021
I have upgraded several of our Intel servers with ZERO issues... but I have a small backup server that is using a Ryzen 5 5600G, and it ran like a top prior to the upgrade. After the upgrade it won't start anything: no VMs, no LXCs. It boots fine and everything seems to be there, but as soon as you start a VM it just hangs up and acts wonky. I try very hard not to waste everyone's time... so I've tried changing the CPU type (multiple types, including host), I've pulled everything custom out, and I've tried a fresh install. Nothing changes the outcome. Worst part is there are basically no logs... dmesg looks great. Syslog shows:

Code:
Aug 26 09:06:11 pve-backup pvedaemon[2194]: start VM 700: UPID:pve-backup:00000892:00002230:68ADB143:qmstart:700:root@pam:
Aug 26 09:06:11 pve-backup kernel: sd 6:0:0:0: [sdb] Synchronizing SCSI cache
Aug 26 09:06:11 pve-backup kernel: ata7.00: Entering standby power mode
Aug 26 09:06:12 pve-backup kernel: sd 7:0:0:0: [sdc] Synchronizing SCSI cache
Aug 26 09:06:12 pve-backup kernel: ata8.00: Entering standby power mode
Aug 26 09:06:12 pve-backup kernel: sd 8:0:0:0: [sdd] Synchronizing SCSI cache
Aug 26 09:06:12 pve-backup kernel: ata9.00: Entering standby power mode
Aug 26 09:06:13 pve-backup kernel: sd 9:0:0:0: [sde] Synchronizing SCSI cache
Aug 26 09:06:13 pve-backup kernel: ata10.00: Entering standby power mode
Aug 26 09:06:13 pve-backup kernel: sd 10:0:0:0: [sdf] Synchronizing SCSI cache
Aug 26 09:06:13 pve-backup kernel: ata11.00: Entering standby power mode
Aug 26 09:06:15 pve-backup kernel: vfio-pci 0000:05:00.0: resetting
Aug 26 09:06:15 pve-backup kernel: vfio-pci 0000:05:00.0: reset done
Aug 26 09:06:15 pve-backup systemd[1]: Created slice qemu.slice - Slice /qemu.
Aug 26 09:06:15 pve-backup systemd[1]: Started 700.scope.
Aug 26 09:06:15 pve-backup kernel: kauditd_printk_skb: 115 callbacks suppressed
Aug 26 09:06:15 pve-backup kernel: audit: type=1400 audit(1756213575.585:127): apparmor="DENIED" operation="capable" class="cap" profile="swtpm" pid=2311 comm="swtpm" capability=21  capname="sys_admin"
Aug 26 09:06:16 pve-backup kernel: tap700i0: entered promiscuous mode
Aug 26 09:06:16 pve-backup kernel: vmbr1: port 2(tap700i0) entered blocking state
Aug 26 09:06:16 pve-backup kernel: vmbr1: port 2(tap700i0) entered disabled state
Aug 26 09:06:16 pve-backup kernel: tap700i0: entered allmulticast mode
Aug 26 09:06:16 pve-backup kernel: vmbr1: port 2(tap700i0) entered blocking state
Aug 26 09:06:16 pve-backup kernel: vmbr1: port 2(tap700i0) entered forwarding state
Aug 26 09:06:16 pve-backup kernel: tap700i1: entered promiscuous mode
Aug 26 09:06:16 pve-backup kernel: vmbr0: port 1(tap700i1) entered blocking state
Aug 26 09:06:16 pve-backup kernel: vmbr0: port 1(tap700i1) entered disabled state
Aug 26 09:06:16 pve-backup kernel: tap700i1: entered allmulticast mode
Aug 26 09:06:16 pve-backup kernel: vmbr0: port 1(tap700i1) entered blocking state
Aug 26 09:06:16 pve-backup kernel: vmbr0: port 1(tap700i1) entered forwarding state
Aug 26 09:06:18 pve-backup kernel: vfio-pci 0000:05:00.0: resetting
Aug 26 09:06:18 pve-backup kernel: vfio-pci 0000:05:00.0: reset done
Aug 26 09:06:18 pve-backup kernel: vfio-pci 0000:05:00.0: resetting
Aug 26 09:06:19 pve-backup kernel: vfio-pci 0000:05:00.0: reset done
Aug 26 09:06:19 pve-backup pvedaemon[2194]: VM 700 started with PID 2316.
Aug 26 09:06:19 pve-backup pvedaemon[1805]: <root@pam> end task UPID:pve-backup:00000892:00002230:68ADB143:qmstart:700:root@pam: OK
Aug 26 09:06:32 pve-backup pvestatd[1779]: VM 700 qmp command failed - VM 700 qmp command 'query-proxmox-support' failed - got timeout
Aug 26 09:06:37 pve-backup pvestatd[1779]: status update time (13.119 seconds)
Aug 26 09:06:45 pve-backup pvestatd[1779]: VM 700 qmp command failed - VM 700 qmp command 'query-proxmox-support' failed - unable to connect to VM 700 qmp socket - timeout after 48 retries
Aug 26 09:06:49 pve-backup pvestatd[1779]: status update time (12.710 seconds)
Aug 26 09:06:57 pve-backup pvestatd[1779]: VM 700 qmp command failed - VM 700 qmp command 'query-proxmox-support' failed - unable to connect to VM 700 qmp socket - timeout after 49 retries
Aug 26 09:07:00 pve-backup pvestatd[1779]: status update time (11.093 seconds)

I can't find anything on what's going on here. If I shut the VM down it goes down, but it won't restart, saying that it's waiting on systemd. Any help would be appreciated, as I'm tired of banging my head against the wall here.
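
For anyone looking at this later, a few commands that can help confirm whether the KVM process is actually wedged (just a sketch; VM ID 700 is taken from the task log above, adjust as needed):
Bash:
# Is the QEMU/KVM process for the VM stuck in D (uninterruptible sleep) state?
ps -o pid,stat,wchan:32,cmd -C kvm
# Ask the node for the VM's status directly
qm status 700 --verbose
# Any blocked-task or hung-task warnings from the kernel?
dmesg | grep -iE "hung|blocked"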
 
Please edit your post to use code blocks like below and share
Bash:
grep -sR "hostpci" /etc
lspci
 
Please edit your post to use code blocks like below and share
Bash:
grep -sR "hostpci" /etc
lspci
Here is the output:
Code:
root@pve-backup:~# grep -sR "hostpci" /etc
/etc/pve/local/qemu-server/700.conf:hostpci0: 0000:05:00,pcie=1
/etc/pve/qemu-server/700.conf:hostpci0: 0000:05:00,pcie=1
/etc/pve/nodes/pve-backup/qemu-server/700.conf:hostpci0: 0000:05:00,pcie=1
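
For what it's worth, those three matches are all the same file: /etc/pve/local and /etc/pve/qemu-server are symlinks into the node directory. The full config can also be dumped with:
Code:
qm config 700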

Code:
root@pve-backup:~# lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 51)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 7
01:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
02:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] 500 Series Chipset USB 3.1 XHCI Controller
02:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] 500 Series Chipset SATA Controller
02:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] 500 Series Chipset Switch Upstream Port
03:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 500 Series Chipset Switch Downstream Port
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller (rev 15)
05:00.0 SATA controller: ASMedia Technology Inc. ASM1166 Serial ATA Controller (rev 02)
06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c9)
06:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller
06:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
06:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
06:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
06:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h/19h/1ah HD Audio Controller
 
So I didn't change my PCI passthrough of the SATA controller; I had this same setup on the Intel version of the same type of system. I did notice that PVE is loading the controller, even though I have it blocked in both GRUB and modprobe.

lspci -n:
Code:
05:00.0 0106: 1b21:1166 (rev 02)

GRUB:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on amd_pstate=active pcie_acs_override=downstream,multifunction vfio-pci.ids=1b21:1166"

/etc/modprobe.d/vfio.conf
Code:
options vfio-pci ids=1b21:1166
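
(Worth double-checking that the vfio-pci.ids argument actually reached the running kernel; on hosts that boot via proxmox-boot-tool instead of GRUB, the cmdline lives in /etc/kernel/cmdline rather than /etc/default/grub.)
Bash:
cat /proc/cmdline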

But if I run lspci -v:
Code:
05:00.0 SATA controller: ASMedia Technology Inc. ASM1166 Serial ATA Controller (rev 02) (prog-if 01 [AHCI 1.0])
        Subsystem: ZyDAS Technology Corp. Device 2116
        Flags: bus master, fast devsel, latency 0, IRQ 38, IOMMU group 16
        Memory at fcf82000 (32-bit, non-prefetchable) [size=8K]
        Memory at fcf80000 (32-bit, non-prefetchable) [size=8K]
        Expansion ROM at fcf00000 [disabled] [size=512K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [80] Express Endpoint, IntMsgNum 0
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [130] Secondary PCI Express
        Kernel driver in use: ahci
        Kernel modules: ahci

So it's loading ahci... but to put this to bed: if I start the VM without the passthrough... same issue. It doesn't seem to matter whether I pass the controller or not; the VM does the same thing with the same logs. <-- Sorry, this is wrong. After a reboot without the passthrough, the VM will start! So the issue must be in the passthrough.
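
In case it helps anyone else hitting this: one way to make sure vfio-pci claims the controller before ahci can bind it is a softdep in modprobe.d (a sketch, untested on this exact board), then regenerate the initramfs and reboot:
Bash:
# /etc/modprobe.d/vfio.conf should contain:
#   options vfio-pci ids=1b21:1166
#   softdep ahci pre: vfio-pci

update-initramfs -u -k all
# after a reboot, confirm the binding:
lspci -nnk -s 05:00.0    # expect "Kernel driver in use: vfio-pci"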
 
So I didn't change my PCI passthrough of the SATA controller.
The Proxmox kernel version 6.14 (that comes with PVE 9.0) appears to have some issues with passthrough of ASMedia SATA controllers (on AMD motherboards). There are several threads about this but I can't find them at the moment. Maybe try (installing and) booting proxmox-kernel-6.8 (temporarily) to see if the problem goes away.
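
Roughly like this (the exact ABI version to pin will vary; take one from the list output):
Bash:
apt install proxmox-kernel-6.8
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin <6.8-version-from-list>
reboot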
 
The Proxmox kernel version 6.14 (that comes with PVE 9.0) appears to have some issues with passthrough of ASMedia SATA controllers (on AMD motherboards). There are several threads about this but I can't find them at the moment. Maybe try (installing and) booting proxmox-kernel-6.8 (temporarily) to see if the problem goes away.
This does work, but I was hoping to get 6.14 fixed up. I found one of the threads you were talking about, but there is no fix at the moment in that thread. My only recourse appears to be reverting to 6.8.
 
I believe you meant this post:
 