systemd-journald[xxx]: Failed to write entry (xx items, xxx bytes), ignoring: Read-only file system

dannybridi
Sep 10, 2022
Hello,

I have a 3-node cluster running version 7.4-3. For some time now, node #2 has stopped responding to the web UI roughly once a month, displaying many lines like these at the console:
[xxxxxx.xxxxxx] systemd-journald[xxx]: Failed to rotate /var/log/journal/xxxxxxxxxxxxxxxxxxxxxxxxxxxx/system.journal: Read-only file system
[xxxxxx.xxxxxx] systemd-journald[xxx]: Failed to write entry (xx items, xxx bytes), ignoring: Read-only file system

The only thing I can do when this happens is reset the server. The server works fine after reboot (until it happens again). I have also noticed that while restarting, the error message "[FAILED] Failed to start Import ZFS pool Data2" appears for a few seconds, but boot continues and the pool in question is fine (not sure if this is related). I have no idea what is causing this.

Any help would be appreciated.
 
Hello,

Could you check the syslog on node #2 for hardware failures or similar errors?
If you suspect the issue is related to ZFS, could you please post the output of `zpool status` or `zfs list`, or see if there is any hint there?
 
Hi Moayad,
`zpool status` shows that the pool Data2 is online with "no known data errors."
`zpool list` shows the name of the pool, Data2, and 6.93TB of free space.
I don't see any errors.
What should I be looking for in the syslog? I'm unfortunately not that familiar with it. I looked for the "[FAILED] Failed to start Import ZFS pool Data2" message that I get at boot time, but did not find it.
Thanks for your help.
 
Hi,

Thank you for the information!

I would check `journalctl -xe` for any error messages or warnings related to the file system or storage devices. It's possible that a disk or a partition is failing or has bad sectors that are causing the file system to become read-only.

You can also check the output of the `dmesg` command for kernel messages, and the `smartctl` report for anything interesting.
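Since the journald errors suggest the root filesystem has been remounted read-only after an I/O error (an assumption; on your layout `/var/log` could also be a separate mount), a quick sketch for checking its current state next time the node hangs is:

```shell
# Print the mount options for "/" from /proc/mounts and report
# whether the root filesystem is currently read-write or read-only.
awk '$2 == "/" {print $4}' /proc/mounts | grep -q '^rw' \
  && echo "root is read-write" \
  || echo "root is read-only"
```

If it reports read-only, `dmesg` will usually also contain the I/O error that triggered the remount.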
 
Hi,
I tried all three commands you suggested, but didn't really find anything. I don't think it's a disk issue because I recently swapped the 4 HDDs that make up zpool Data2 for SSDs. I did so for speed, but also because I thought it might solve this issue. It didn't. This is really confusing.
Thanks
 
Happy 2024 everyone!

I’m unfortunately still facing the same issue above. The disks (SSDs) are fine, there’s enough space, and the system is up-to-date. The VMs running on this node continue to respond, but I can no longer manage them from Proxmox. The only thing I can do is physically reset the server. Any ideas?

Thanks
 
Hi maxjules,
Not really. What I ended up doing was installing a cron job that pings the default gateway every 5 minutes and reboots the server if there's no response. The cron job is still there, although I think I will remove it soon, since I believe the server last rebooted 17 days ago when I installed a kernel update. Maybe a recent update fixed the issue? I don't really know.
Thanks
 
Open a shell on the affected node/server and run `crontab -e` to edit the crontab file.
Add the following line (replacing the IP address with your gateway’s address):
*/5 * * * * /bin/ping -c 1 -W 5 192.168.0.254 || /sbin/reboot
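For reference, the fields break down as: run at every 5th minute, send a single ping (`-c 1`) with a 5-second timeout (`-W 5`), and fall through to `/sbin/reboot` only if the ping fails. An annotated copy of the same entry:

```
# min   hour  dom  mon  dow   command
*/5     *     *    *    *     /bin/ping -c 1 -W 5 192.168.0.254 || /sbin/reboot
```

Note this is a blunt watchdog: a single dropped ping triggers a reboot, so if your gateway is occasionally slow to respond you may want to require two consecutive failures before rebooting.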

Hope this helps
 
I have the same issue and don't know what to do.
I also lose access to all my running VMs on the cluster when these messages appear.
 