Proxmox 8.4.1 crashed: EXT4-fs error (device dm-1): ext4_journal_check_start xxx error

vitaminsss · Jul 8, 2025

My environment:
PVE 8.4.1
Supermicro H13SSL-N
CPU: AMD EPYC 9554
RAM: DDR5-4800 ECC 32G * 8
OS SSD: Intel P5520 1.92T
TrueNAS cache NVMe: 4T Samsung 980 PRO
HDD: WD HC550 16T * 8

I installed TrueNAS on PVE, version 25.04.0.

Code:

root@big-server:~$ lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda                            8:0    0 14.6T  0 disk
└─sda1                         8:1    0 14.6T  0 part
sdb                            8:16   0 14.6T  0 disk
└─sdb1                         8:17   0 14.6T  0 part
sdc                            8:32   0 14.6T  0 disk
└─sdc1                         8:33   0 14.6T  0 part
sdd                            8:48   0 14.6T  0 disk
└─sdd1                         8:49   0 14.6T  0 part
sde                            8:64   0 14.6T  0 disk
└─sde1                         8:65   0 14.6T  0 part
sdf                            8:80   0 14.6T  0 disk
└─sdf1                         8:81   0 14.6T  0 part
sdg                            8:96   0 14.6T  0 disk
└─sdg1                         8:97   0 14.6T  0 part
sdh                            8:112  0 14.6T  0 disk
└─sdh1                         8:113  0 14.6T  0 part
nvme1n1                      259:0    0  1.7T  0 disk
├─nvme1n1p1                  259:1    0 1007K  0 part
├─nvme1n1p2                  259:2    0    1G  0 part /boot/efi
└─nvme1n1p3                  259:3    0  1.7T  0 part
  ├─pve-swap                 252:0    0    8G  0 lvm  [SWAP]
  ├─pve-root                 252:1    0   96G  0 lvm  /
  ├─pve-data_tmeta           252:2    0 15.9G  0 lvm 
  │ └─pve-data-tpool         252:4    0  1.6T  0 lvm 
  │   ├─pve-data             252:5    0  1.6T  1 lvm 
  │   ├─pve-vm--101--disk--0 252:6    0  100G  0 lvm 
  │   ├─pve-vm--102--disk--0 252:7    0  100G  0 lvm 
  │   ├─pve-vm--100--disk--0 252:8    0  200G  0 lvm 
  │   └─pve-vm--103--disk--0 252:9    0  200G  0 lvm 
  └─pve-data_tdata           252:3    0  1.6T  0 lvm 
    └─pve-data-tpool         252:4    0  1.6T  0 lvm 
      ├─pve-data             252:5    0  1.6T  1 lvm 
      ├─pve-vm--101--disk--0 252:6    0  100G  0 lvm 
      ├─pve-vm--102--disk--0 252:7    0  100G  0 lvm 
      ├─pve-vm--100--disk--0 252:8    0  200G  0 lvm 
      └─pve-vm--103--disk--0 252:9    0  200G  0 lvm 
nvme0n1                      259:4    0  3.6T  0 disk
└─nvme0n1p1                  259:6    0  3.6T  0 part

This is my configuration in /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction pcie_aspm=off nvme_core.default_ps_max_latency_us=5500"
I have no idea what is causing this issue. Could you help me take a look?
Thanks very much.

gfngfn256 · Jul 8, 2025

vitaminsss said:
pcie_acs_override=downstream,multifunction

So you had trouble with IOMMU isolation for your desired PCI passthrough & decided to enable Alex Williamson's ACS override patch as shown here in the PVE wiki. However, this is not a risk-free solution, as you can see in the linked docs. Most likely this is affecting the PCI controller that controls the NVMe that is erroring.

You will probably have to rethink your PCI passthrough strategy with the HW you are using. (BIOS update/settings, PCI slot change etc.).
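
To see how the devices are actually grouped, a small sketch like the following lists every IOMMU group and the PCI addresses in it. It assumes the standard sysfs location /sys/kernel/iommu_groups; the function name list_iommu_groups is only illustrative. Note that with pcie_acs_override active the kernel reports devices as separated even where the hardware does not enforce it, so the listing is most meaningful after booting once without that option.

```shell
#!/bin/sh
# Print one line per device: "group <n>: <pci-address>".
# Optional first argument: the directory to scan
# (assumed default: /sys/kernel/iommu_groups).
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for group in "$root"/*; do
        # Skip if the glob matched nothing or the entry is not a group.
        [ -d "$group/devices" ] || continue
        for dev in "$group"/devices/*; do
            [ -e "$dev" ] || continue
            printf 'group %s: %s\n' "${group##*/}" "${dev##*/}"
        done
    done
}

list_iommu_groups
```

Each address can then be inspected with lspci -nns <address>. If, without the override, the device you pass through sits in the same group as the controller of the OS NVMe, that would be consistent with the explanation above.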

If everything had been running smoothly for a good period of time (incl. reboots) & only then the above error appeared, you may be suffering from a different issue.

vitaminsss said:
I installed the truenas system on pve

I'm hoping this means in a VM.

Please note - I don't use similar HW nor does my setup look like yours - I'm only analyzing based on what you have shown. More info may lead to more help.
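
As a rough sketch of what "more info" could look like: in the lsblk output, device-mapper minor 1 (252:1) is pve-root, so dm-1 is the PVE root filesystem on the OS NVMe. The kernel messages from the boot that crashed are therefore the interesting part. The helper below only filters log text for related lines; the name filter_storage_errors and the chosen patterns are assumptions, not a complete list.

```shell
#!/bin/sh
# Read kernel log text on stdin and keep only lines that mention
# ext4 errors, NVMe devices, or generic I/O errors.
filter_storage_errors() {
    # "No matching lines" is not treated as a failure here.
    grep -E 'EXT4-fs error|nvme|I/O error' || true
}

# Usage (assumes the journal is persistent, so the previous boot is kept):
#   journalctl -k -b -1 | filter_storage_errors
```

Posting the filtered lines from just before the crash, together with the IOMMU grouping, would give a clearer picture of whether the NVMe dropped off the bus or the filesystem failed for another reason.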

I ASSUME YOU HAVE FULL RESTORABLE BACKUPS OF ALL YOUR DATA!

vitaminsss · Jul 8, 2025

OK, thanks for the reference
