Crashing VM - EXT4-fs errors - Think it is SSD related

chrischambers

Member
Nov 1, 2022
I have been having this issue for some time, and if you look at my profile you will see that I have a few posts running.

After some more testing, I replaced my motherboard with a different one and still got the same issue: my system would crash after a few hours using the old SSD, LSI controller, etc.

So I gave back the motherboard, reinstalled my old one, and purchased a new SSD. While I was waiting for the SSD to be delivered ("you have to love Amazon, 3 weeks") I found an old spinning HD and installed Proxmox onto it, along with all the other VMs, and it ran stable for 5 weeks with no issues.

But today, when I replaced the HD with the SSD, the error returned, which now makes me think it is an SSD-related issue.
My Proxmox version is 8.4.11.

My new SSD is a Kingston 960GB Enterprise SSD.

CPU: 16 x AMD Ryzen 7 2700 Eight-Core Processor (1 Socket)
RAM: 32GB
MB: B450 TOMAHAWK
BIOS: Latest stable version 7C02v1I


My environment is:
VM Unraid with a Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 - with approx. 6 spinning hard drives
VM Plex
VM Home Assistant
VM HandBrake with a SCSI-connected hard drive: SCSI9 /dev/disk/by-id/ATA-WD10**************************,backup=0,size=976762584K

I have attached a copy of my log files in the hope that someone can see why I am getting this issue. It was running fine until 13:17:01.

I have also included the output of lspci -nnk.

Code:
root@pve:~#  lspci -nnk
00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex [1022:1450]
        Subsystem: Micro-Star International Co., Ltd. [MSI] Family 17h (Models 00h-0fh) Root Complex [1462:7c02]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit [1022:1451]
        Subsystem: Micro-Star International Co., Ltd. [MSI] Family 17h (Models 00h-0fh) I/O Memory Management Unit [1462:7c02]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
        Subsystem: Micro-Star International Co., Ltd. [MSI] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1462:7c02]
        Kernel driver in use: pcieport
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
        Subsystem: Micro-Star International Co., Ltd. [MSI] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1462:7c02]
        Kernel driver in use: pcieport
00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
        Subsystem: Device [7c02:1462]
        Kernel driver in use: pcieport
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
        Subsystem: Device [7c02:1462]
        Kernel driver in use: pcieport
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 59)
        Subsystem: Micro-Star International Co., Ltd. [MSI] FCH SMBus Controller [1462:7c02]
        Kernel driver in use: piix4_smbus
        Kernel modules: i2c_piix4, sp5100_tco
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
        Subsystem: Micro-Star International Co., Ltd. [MSI] FCH LPC Bridge [1462:7c02]
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
        Kernel driver in use: k10temp
        Kernel modules: k10temp
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 [1022:1466]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
03:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 xHCI Compliant Host Controller [1022:43d5] (rev 01)
        Subsystem: ASMedia Technology Inc. 400 Series Chipset USB 3.1 xHCI Compliant Host Controller [1b21:1142]
        Kernel driver in use: xhci_hcd
        Kernel modules: xhci_pci
03:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller [1022:43c8] (rev 01)
        Subsystem: ASMedia Technology Inc. 400 Series Chipset SATA Controller [1b21:1062]
        Kernel driver in use: ahci
        Kernel modules: ahci
03:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Bridge [1022:43c6] (rev 01)
        Subsystem: ASMedia Technology Inc. 400 Series Chipset PCIe Bridge [1b21:0201]
        Kernel driver in use: pcieport
20:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
        Subsystem: ASMedia Technology Inc. 400 Series Chipset PCIe Port [1b21:3306]
        Kernel driver in use: pcieport
20:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
        Subsystem: ASMedia Technology Inc. 400 Series Chipset PCIe Port [1b21:3306]
        Kernel driver in use: pcieport
20:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
        Subsystem: ASMedia Technology Inc. 400 Series Chipset PCIe Port [1b21:3306]
        Kernel driver in use: pcieport
22:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
        Subsystem: Micro-Star International Co., Ltd. [MSI] RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller [1462:7c02]
        Kernel driver in use: r8169
        Kernel modules: r8169
25:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 [1000:0097] (rev 02)
        Subsystem: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 [1000:0097]
        Kernel driver in use: vfio-pci
        Kernel modules: mpt3sas
26:00.0 VGA compatible controller [0300]: NVIDIA Corporation GT218 [GeForce 210] [10de:0a65] (rev a2)
        Subsystem: ASUSTeK Computer Inc. EN210 SILENT [1043:8334]
        Kernel driver in use: nouveau
        Kernel modules: nvidiafb, nouveau
26:00.1 Audio device [0403]: NVIDIA Corporation High Definition Audio Controller [10de:0be3] (rev a1)
        Subsystem: ASUSTeK Computer Inc. High Definition Audio Controller [1043:8334]
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel
27:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a]
        Subsystem: Micro-Star International Co., Ltd. [MSI] Zeppelin/Raven/Raven2 PCIe Dummy Function [1462:7c02]
27:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor (PSP) 3.0 Device [1022:1456]
        Subsystem: Micro-Star International Co., Ltd. [MSI] Family 17h (Models 00h-0fh) Platform Security Processor (PSP) 3.0 Device [1462:7c02]
        Kernel driver in use: ccp
        Kernel modules: ccp
27:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Zeppelin USB 3.0 xHCI Compliant Host Controller [1022:145f]
        Subsystem: Micro-Star International Co., Ltd. [MSI] Zeppelin USB 3.0 xHCI Compliant Host Controller [1462:7c02]
        Kernel driver in use: xhci_hcd
        Kernel modules: xhci_pci
28:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function [1022:1455]
        Subsystem: Micro-Star International Co., Ltd. [MSI] Zeppelin/Renoir PCIe Dummy Function [1462:7c02]
28:00.2 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
        Subsystem: Micro-Star International Co., Ltd. [MSI] FCH SATA Controller [AHCI mode] [1462:7c02]
        Kernel driver in use: ahci
        Kernel modules: ahci
28:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller [1022:1457]
        Subsystem: Micro-Star International Co., Ltd. [MSI] Family 17h (Models 00h-0fh) HD Audio Controller [1462:ec02]
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel
 


From your description the instability is strongly tied to SSD usage, since the system runs fine with a spinning disk but consistently crashes within hours once an SSD is involved. The fact that it happens with both the old and the new SSD suggests it is not a single faulty drive but more likely a controller, driver, firmware or compatibility problem. Your B450 board and Ryzen 2700 are stable with an HDD, which makes general CPU/RAM/board instability unlikely. Looking at the lspci output, the Kingston SSD is most likely attached to one of the two onboard SATA controllers, both using the ahci driver: the 400 Series chipset controller (03:00.1) or the FCH SATA controller (28:00.2). AMD's SATA controllers on B450/X470 have a history of reported instability with some SSDs, particularly under Linux kernels like the one shipped in Proxmox 8.4 (Debian 12 base). The symptoms match what you describe: random lockups or crashes when the SSD is stressed.
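Rather than guessing from lspci which controller the SSD sits on, you can read it from sysfs: the resolved path of the block device contains the PCI address of the controller it is attached to. A minimal sketch, assuming the SSD shows up as /dev/sda (an assumption, check with lsblk first); `pci_addr_of_path` is just a hypothetical helper name:

```shell
#!/bin/sh
# Print the last PCI address (domain:bus:device.function) found in a sysfs path.
# For a SATA disk this is the controller the drive is attached to.
pci_addr_of_path() {
    printf '%s\n' "$1" \
        | grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]' \
        | tail -n 1
}

# Usage on the Proxmox host (sda is an assumed device name):
#   pci_addr_of_path "$(readlink -f /sys/block/sda)"
```

The address it prints can then be matched against the first column of the lspci -nnk output above (lspci omits the leading 0000: domain).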
First: check dmesg and syslog around the crash time for ahci or I/O errors. You will often see resets, timeouts, or "frozen" messages.
Second: update the SSD firmware if Kingston provides a newer release. Firmware updates for enterprise drives sometimes include compatibility fixes.
Third: try connecting the SSD to a port on the other onboard controller. Your lspci shows two SATA controllers, the 400 Series chipset one at 03:00.1 (with an ASMedia subsystem ID) and the FCH one at 28:00.2, and moving the drive from one set of ports to the other sometimes stabilizes it.
Fourth: if the drive is NVMe rather than SATA, check whether the link has negotiated PCIe 2.0 instead of 3.0, and force Gen3 in the BIOS.
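For the log check, something like the following can cut the kernel log down to the lines that matter. This is only a sketch: `filter_storage_errors` is a hypothetical helper name, and the pattern list is my own assumption of what SATA link and filesystem trouble usually looks like, so it is loose and not exhaustive.

```shell
#!/bin/sh
# Keep only kernel log lines that usually accompany SATA link or
# filesystem trouble. Reads log text on stdin, prints matching lines.
# The pattern list is an assumption; extend it as needed.
filter_storage_errors() {
    grep -iE 'ata[0-9]+(\.[0-9]+)?: |ahci|EXT4-fs (error|warning)|I/O error|hard resetting link|frozen'
}

# Usage on the Proxmox host:
#   journalctl -k -b -1 --no-pager | filter_storage_errors   # previous boot, needs a persistent journal
#   dmesg -T | filter_storage_errors                         # current boot
```

If the host locks up hard, the last messages may never reach the disk, so an empty result does not prove the SATA link was healthy.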
If none of that helps, the most reliable workaround is to add a cheap dedicated HBA or SATA controller (e.g. an LSI 9211-8i flashed to IT mode) and attach the SSD there, bypassing the onboard SATA. That would also fit with your LSI SAS3008 card working fine with the spinning drives while the onboard SATA crashes with SSDs.
 
Thanks for the reply. I will start going through the steps and post my results in a few weeks. It is interesting to hear that this is a known issue with the B450 and Linux kernels.