Proxmox 8.4.1 crashed: EXT4-fs error (device dm-1): ext4_journal_check_start xxx error

vitaminsss

New Member
Jul 8, 2025
pve_error.png
My environment:
PVE 8.4.1
Motherboard: Supermicro H13SSL-N
CPU: AMD EPYC 9554
RAM: DDR5-4800 ECC, 32 GB × 8
OS SSD: Intel P5520 1.92 TB
TrueNAS cache NVMe: Samsung 980 PRO 4 TB
HDD: WD HC550 16 TB × 8

I installed TrueNAS (version 25.04.0) on PVE.

Code:
root@big-server:~$ lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda                            8:0    0 14.6T  0 disk
└─sda1                         8:1    0 14.6T  0 part
sdb                            8:16   0 14.6T  0 disk
└─sdb1                         8:17   0 14.6T  0 part
sdc                            8:32   0 14.6T  0 disk
└─sdc1                         8:33   0 14.6T  0 part
sdd                            8:48   0 14.6T  0 disk
└─sdd1                         8:49   0 14.6T  0 part
sde                            8:64   0 14.6T  0 disk
└─sde1                         8:65   0 14.6T  0 part
sdf                            8:80   0 14.6T  0 disk
└─sdf1                         8:81   0 14.6T  0 part
sdg                            8:96   0 14.6T  0 disk
└─sdg1                         8:97   0 14.6T  0 part
sdh                            8:112  0 14.6T  0 disk
└─sdh1                         8:113  0 14.6T  0 part
nvme1n1                      259:0    0  1.7T  0 disk
├─nvme1n1p1                  259:1    0 1007K  0 part
├─nvme1n1p2                  259:2    0    1G  0 part /boot/efi
└─nvme1n1p3                  259:3    0  1.7T  0 part
  ├─pve-swap                 252:0    0    8G  0 lvm  [SWAP]
  ├─pve-root                 252:1    0   96G  0 lvm  /
  ├─pve-data_tmeta           252:2    0 15.9G  0 lvm 
  │ └─pve-data-tpool         252:4    0  1.6T  0 lvm 
  │   ├─pve-data             252:5    0  1.6T  1 lvm 
  │   ├─pve-vm--101--disk--0 252:6    0  100G  0 lvm 
  │   ├─pve-vm--102--disk--0 252:7    0  100G  0 lvm 
  │   ├─pve-vm--100--disk--0 252:8    0  200G  0 lvm 
  │   └─pve-vm--103--disk--0 252:9    0  200G  0 lvm 
  └─pve-data_tdata           252:3    0  1.6T  0 lvm 
    └─pve-data-tpool         252:4    0  1.6T  0 lvm 
      ├─pve-data             252:5    0  1.6T  1 lvm 
      ├─pve-vm--101--disk--0 252:6    0  100G  0 lvm 
      ├─pve-vm--102--disk--0 252:7    0  100G  0 lvm 
      ├─pve-vm--100--disk--0 252:8    0  200G  0 lvm 
      └─pve-vm--103--disk--0 252:9    0  200G  0 lvm 
nvme0n1                      259:4    0  3.6T  0 disk
└─nvme0n1p1                  259:6    0  3.6T  0 part

This is my configuration in /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction pcie_aspm=off nvme_core.default_ps_max_latency_us=5500"
I have no idea what is causing this issue. Could you help me take a look?
Thanks very much.
 
pcie_acs_override=downstream,multifunction
So you had trouble with IOMMU isolation for your desired PCI passthrough and decided to enable Alex Williamson's ACS override patch, as shown here in the PVE wiki. However, this is not a risk-free solution, as you can see in the linked docs. Most likely it is affecting the PCI controller that handles the NVMe drive that is erroring.

You will probably have to rethink your PCI passthrough strategy with the HW you are using. (BIOS update/settings, PCI slot change etc.).
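To see which devices actually share an IOMMU group, something like the following can help. It is only a sketch: the function name `list_iommu_groups` and its optional directory argument are my own choices for illustration, and it assumes the standard sysfs layout under `/sys/kernel/iommu_groups`. Note that with `pcie_acs_override` active the groups shown are the artificially split ones, so it is worth comparing the output with and without that parameter.

```shell
#!/usr/bin/env bash
# Print one line per PCI device: "group <number><TAB><PCI address>".
# The optional first argument overrides the sysfs directory (illustrative only,
# handy for trying the function against a copy of the tree).
list_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}"
    local dev group addr
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # nothing matched: IOMMU off or no groups
        group="${dev%/devices/*}"      # drop the trailing "/devices/<address>"
        group="${group##*/}"           # keep only the group number
        addr="${dev##*/}"              # PCI address, e.g. 0000:01:00.0
        printf 'group %s\t%s\n' "$group" "$addr"
    done | sort -k2,2n                 # order numerically by group number
}

list_iommu_groups "$@"
```

For any address in the output, `lspci -nns <address>` shows what the device is. If the NVMe that holds the PVE root volume lands in the same group as a device you pass through once the override is removed, that would support the theory above.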

If everything had been running smoothly for a good period of time (incl. reboots) and only then the above error appeared, you may be suffering from a different issue.
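One way to narrow that down is to look at the kernel log from the boot that crashed. The snippet below is a rough sketch, not a definitive diagnostic: the helper name `filter_storage_errors` and the search pattern are my own guesses at what is worth looking for, and it assumes the journal is persistent so that the previous boot is still available.

```shell
#!/usr/bin/env bash
# Keep only log lines that mention the storage stack or the IOMMU.
# The pattern is an assumption; extend it to whatever your screenshot shows.
filter_storage_errors() {
    grep -Ei 'nvme|ext4-fs|i/o error|amd-vi|iommu' || true   # no match is not an error
}

# -k: kernel messages only, -b -1: the previous boot.
if command -v journalctl >/dev/null 2>&1; then
    journalctl -k -b -1 --no-pager 2>/dev/null | filter_storage_errors
fi
```

If the NVMe or IOMMU messages show up well before the EXT4 error, the filesystem error is more likely a symptom of the drive dropping out than a problem with the filesystem itself.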

I installed the truenas system on pve
I'm hoping this means in a VM.


Please note - I don't use similar HW nor does my setup look like yours - I'm only analyzing based on what you have shown. More info may lead to more help.

I ASSUME YOU HAVE FULL RESTORABLE BACKUPS OF ALL YOUR DATA!
 
OK, thanks for the reference