comm systemd-journal: Detected aborted journal - Even on a new cloned nvme

BeastBurst

New Member
Aug 29, 2023
My server spec:

Code:
MB: Supermicro H12SSL-i
CPU: AMD Epyc 7742
GPU: NVIDIA RTX 3090
RAM: 512GB DDR4 RDIMM ECC


Today my Proxmox 8 ended up with the following error:

Code:
[40.294701] EXT4-fs error (device dm-1): ext4_journal_check_start:84: comm systemd-journal: Detected aborted journal
EXT4-fs (dm-1): Remounting filesystem read-only


This error appears about 30 seconds after Proxmox boots. I can even type my username and password, and then a few seconds later, BOOM, I get the error shown above.

My first thought was that the 2TB Samsung NVMe was dying on me. I run only one drive in the server. Luckily I have two brand-new Samsung NVMe drives, and they are even 4TB each. So I downloaded Clonezilla and cloned the old, supposedly broken 2TB NVMe to a brand-new 4TB NVMe. The clone completed successfully!

So I installed the new 4TB NVMe and BOOM, the same error, even on the brand-new drive.
To make things even weirder, I took that 4TB drive and put it in another machine where I had Ubuntu 22.04 installed. I removed the Ubuntu NVMe and installed the 4TB cloned Proxmox drive, and on that PC Proxmox boots and runs fine without showing this error, but on the Supermicro server it does not.

To my even greater surprise, when I installed the original 2TB drive from the server in that same PC, it also ran without any errors, just like the 4TB clone.

So I decided to check whether the disk is healthy and ran `sudo smartctl -a /dev/nvme0n1`, which gave:


Code:
SMART overall-health self-assessment test result: PASSED

If that result is accurate, the disk should be fine, and why not, since it runs just fine on my regular PC. Why does it not run on my Supermicro server?

To test the server, I took the NVMe from my Ubuntu PC and put it in the server, only to get an even weirder result: Ubuntu boots from the same NVMe slot in which neither the original 2TB drive nor the cloned 4TB drive containing Proxmox will run.

How can these strange outcomes happen? Does anyone have a clue?

This setup had been running like this for months without problems. What should I do to get my Proxmox back online without this nasty `comm systemd-journal: Detected aborted journal`? Any advice?
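
The second line of the log says the filesystem was remounted read-only, and that can be confirmed from the mount table on an affected host. Below is a minimal Python sketch, assuming the usual whitespace-separated `/proc/mounts` layout (device, mountpoint, type, options). The function name `read_only_mounts` is made up for illustration. It only reports the symptom and says nothing about the cause.

```python
def read_only_mounts(mounts_text):
    """Return (device, mountpoint) pairs for ext4 filesystems mounted read-only.

    mounts_text is the content of a mount table in /proc/mounts format.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        # Match the exact "ro" flag; "errors=remount-ro" must not count.
        if fstype == "ext4" and "ro" in options.split(","):
            result.append((device, mountpoint))
    return result
```

On a running system the input could come from `open("/proc/mounts").read()`. A root filesystem showing up in the result shortly after boot would match the "Remounting filesystem read-only" message above.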
 
Hi,

I have the same problem. Same MB with hardware RAID. Have you found a solution?
 
I had this error today, too. It occurred after I replaced the motherboard on one of the machines in my two-node cluster. I'm really not sure what's going on. I'm fairly certain the drive is fine.

I should note that I don't have anywhere near this spec of hardware. I'm using consumer-grade hardware: 13th-gen Intel, 32GB memory, 250GB SSD.
 
+1 for me. I have been having an identical error as well. It takes an hour or two of running Proxmox, with no running VMs, and then this error appears on the console.
 
It just got me too. Dell R730, no idea where to start. I'm new to all of this, and it happened after installing an Arc A310 GPU and a PCIe NVMe adapter card. After the physical install everything still worked, but once I went into the BIOS/Lifecycle Controller/iDRAC settings looking to make the NVMe drives show up, I couldn't get past the login without the error anymore. I have no idea which settings I changed besides the time, which I tried to change back. I also removed the NVMe card, but I still get the same error.
 
I had the same problem and traced it back to my Ethernet NIC. I don't understand how that could possibly be causing file system trouble, but I'm quite confident the problem happens with the NIC and doesn't happen without it.

The problematic NIC is an IOCrest 8125B PCIe card, bought for cheap off AliExpress. It has model number IO-PCE8125B-GLAN.

I had it installed in anticipation of eventually upgrading my network to 2.5G but it was connected to a 1Gbps switch for now.

My best guess is that the device was somehow causing trouble on the PCIe bus, such as a misbehaving interrupt or starving resources needed by the SATA controller.

This was nearly the last thing I tested, having tried replacing SSDs, power supplies, and RAM. The only things left to test were the SATA controller, CPU, motherboard itself, and the NIC.
 
I hate to admit it, but the reason I was having this issue was stupidity. While trying to figure out what my 10Gtek PCI NVMe adapter was labeled as so I could pass it through to TrueNAS, I accidentally passed through my main OS boot drive. Then I couldn't get in to switch it back. I ended up having to disable virtualization in the BIOS long enough to boot into Proxmox and remove my boot drive from the TrueNAS VM.


Still haven't gotten the NVMe adapter working, but everything else went right back to normal.
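
For anyone wanting to rule out the same mistake, the check can be sketched in Python. This is only a rough sketch under assumptions: it expects the text of a Proxmox VM config (normally under `/etc/pve/qemu-server/`) with `hostpciN:` lines, and the boot drive's PCI address has to be looked up separately (for example with `lspci`). The helper names and the addresses in the usage note are hypothetical, and snapshot sections and `mapping=` entries are not handled.

```python
import re


def passthrough_devices(vm_config_text):
    """Return {key: pci_address} for every hostpciN line in a VM config."""
    devices = {}
    for line in vm_config_text.splitlines():
        # e.g. "hostpci0: 0000:41:00.0,pcie=1" -> ("hostpci0", "0000:41:00.0")
        match = re.match(r"^(hostpci\d+):\s*([^,\s]+)", line)
        if match:
            devices[match.group(1)] = match.group(2)
    return devices


def _strip_domain(address):
    """Drop a leading '0000:' PCI domain so both address forms compare equal."""
    return address[5:] if address.startswith("0000:") else address


def conflicts(vm_config_text, boot_pci_address):
    """List hostpci keys that point at the boot drive's PCI device."""
    boot = _strip_domain(boot_pci_address)
    hits = []
    for key, address in passthrough_devices(vm_config_text).items():
        # An entry without a function suffix ("01:00") covers all functions.
        if boot.startswith(_strip_domain(address)):
            hits.append(key)
    return hits
```

For example, with a config containing `hostpci1: 01:00` and a boot drive at the made-up address `0000:01:00.0`, `conflicts(...)` would return `["hostpci1"]`, meaning that VM takes the boot drive away from the host as soon as it starts.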
 
