Hi,
I have Proxmox installed, and roughly every 6 or 7 days the file system goes read-only. I can't work out why.
Looking at syslog, I see the following when it happens. Prior to this the system runs perfectly.
Code:
Jan 30 00:39:53 proxmox systemd[1]: Starting Discard unused blocks on filesystems from /etc/fstab...
Jan 30 00:39:54 proxmox systemd[1]: fstrim.service: Main process exited, code=exited, status=32/n/a
Jan 30 00:39:54 proxmox fstrim[2136896]: fstrim: /: FITRIM ioctl failed: Bad message
Jan 30 00:39:54 proxmox systemd[1]: fstrim.service: Failed with result 'exit-code'.
Jan 30 00:39:54 proxmox systemd[1]: Failed to start Discard unused blocks on filesystems from /etc/fstab.
Jan 30 00:39:55 proxmox pvestatd[1002]: can't lock file '/var/log/pve/tasks/.active.lock' - can't open file - Read-only file system
Jan 30 00:40:00 proxmox kernel: [623045.863680] EXT4-fs error (device loop0): ext4_journal_check_start:83: comm database: Detected aborted journal
Jan 30 00:40:00 proxmox kernel: [623045.863827] blk_update_request: I/O error, dev loop0, sector 0 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0
Jan 30 00:40:00 proxmox kernel: [623045.863837] Buffer I/O error on dev loop0, logical block 0, lost sync page write
Jan 30 00:40:00 proxmox kernel: [623045.863851] EXT4-fs (loop0): I/O error while writing superblock
Jan 30 00:40:00 proxmox kernel: [623045.863858] EXT4-fs (loop0): Remounting filesystem read-only
Jan 30 00:40:03 proxmox pvescheduler[2136939]: replication: can't lock file '/var/lib/pve-manager/pve-replication-state.lck' - can't open file - Read->
Jan 30 00:40:05 proxmox pvestatd[1002]: can't lock file '/var/log/pve/tasks/.active.lock' - can't open file - Read-only file system
Jan 30 00:40:15 proxmox pvestatd[1002]: can't lock file '/var/log/pve/tasks/.active.lock' - can't open file - Read-only file system
Jan 30 00:40:25 proxmox pvestatd[1002]: can't lock file '/var/log/pve/tasks/.active.lock' - can't open file - Read-only file system
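To make it easier to see which block device the kernel is actually complaining about, here is a minimal sketch that tallies the device names mentioned in EXT4 and I/O error lines of a syslog excerpt like the one above. It is an illustration only: the function name `devices_with_errors` and the two regular expressions are my own assumptions, written against the message formats shown in this log, and they may miss other kernel message variants.

```python
import re
from collections import Counter

# Assumed patterns, based only on the kernel messages quoted above:
#   "EXT4-fs error (device loop0): ..."  and  "EXT4-fs (loop0): ..."
EXT4_RE = re.compile(r"EXT4-fs(?: error)? \((?:device )?([\w-]+)\)")
#   "I/O error, dev loop0, ..."  and  "Buffer I/O error on dev loop0, ..."
IO_RE = re.compile(r"I/O error(?:,| on) dev (\w+)")


def devices_with_errors(lines):
    """Count how many error lines name each block device."""
    counts = Counter()
    for line in lines:
        for pattern in (EXT4_RE, IO_RE):
            match = pattern.search(line)
            if match:
                counts[match.group(1)] += 1
                break  # count each log line at most once
    return counts


# Hypothetical usage (the log path is an assumption; adjust for your system):
# with open("/var/log/syslog", errors="replace") as fh:
#     print(devices_with_errors(fh))
```

Run against the five kernel lines in the excerpt above, every match names loop0, while the fsck further down was run against /dev/mapper/pve-root.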
As the system was read-only, I ran fsck on pve-root, which detected issues:
Code:
Running fsck.ext4 -fp /dev/mapper/pve-root
<snip>
JBD2: Invalid checksum recovering data block 524525 in log
JBD2: Invalid checksum recovering data block 524531 in log
JBD2: Invalid checksum recovering data block 524531 in log
JBD2: Invalid checksum recovering data block 524445 in log
JBD2: Invalid checksum recovering data block 524520 in log
JBD2: Invalid checksum recovering data block 524539 in log
JBD2: Invalid checksum recovering data block 524331 in log
JBD2: Invalid checksum recovering data block 524541 in log
JBD2: Invalid checksum recovering data block 4718870 in log
JBD2: Invalid checksum recovering data block 8912949 in log
JBD2: Invalid checksum recovering data block 0 in log
JBD2: Invalid checksum recovering data block 524531 in log
JBD2: Invalid checksum recovering data block 524531 in log
Journal checksum error found in /dev/mapper/pve-root
/dev/mapper/pve-root: Inode 131266, i_blocks is 608, should be 288. FIXED.
/dev/mapper/pve-root: Inode 134094 extent tree (at level 1) could be shorter. IGNORED.
/dev/mapper/pve-root: Inode 134094, i_blocks is 672, should be 8. FIXED.
/dev/mapper/pve-root: Deleted inode 1196137 has zero dtime. FIXED.
/dev/mapper/pve-root: Inodes that were part of a corrupted orphan linked list found.
/dev/mapper/pve-root: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)
root@proxmox:~# fsck.ext4 -f /dev/mapper/pve-root
e2fsck 1.46.5 (30-Dec-2021)
Pass 1: Checking inodes, blocks, and sizes
Inode 134094 extent tree (at level 1) could be shorter. Optimize<y>? yes
Inodes that were part of a corrupted orphan linked list found. Fix<y>? yes
Inode 2490382 was part of the orphaned inode list. FIXED.
Pass 1E: Optimizing extent trees
Pass 2: Checking directory structure
Entry 'pve-replication-state.json' in /var/lib/pve-manager (133707) has deleted/unused inode 134360. Clear<y>? yes
Entry 'rrd.journal.1675038983.759590' in /var/lib/rrdcached/journal (134284) has deleted/unused inode 134360. Clear<y>? yes
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Inode 134017 ref count is 1, should be 2. Fix<y>? yes
Unattached zero-length inode 134094. Clear<y>? yes
Pass 5: Checking group summary information
Block bitmap differences: -(649800--649803) -649805 -649825 -(649828--649829) -649840 +(649847--649848) +(649851--649858) +(649863--649868) -(1043968--1044120) -(3622960--3623007) +(3623412--3623426) -(3623488--3623506) +4751570 +(4751597--4751600) +4751641 -(8396832--8397606) -(8397608--8397879) -(10305985--10305989) -10305999 +10306009 -10306014 +10306029 +(10310205--10310208) -(11075712--11075722) -(12025856--12025901) -12025936 -12025947 -(12025949--12026086) -12026088 -(12026112--12026260) -(12026272--12026295) -(12026304--12026322) -(12026336--12026352) -(12026368--12027903) -(12028801--12028927) -(12029591--12058623) -12512126 -12512128 -(12512130--12512131) -(12512154--12512157) -12512164 -(17240347--17240348) -17240350 -17240352 +17240354 -17240373 +(17240420--17240427) +(17240432--17240474) +17240476
Fix<y>? yes
Free blocks count wrong for group #19 (9664, counted=9665).
Fix<y>? yes
Free blocks count wrong for group #31 (5361, counted=5514).
Fix<y>? yes
Free blocks count wrong for group #110 (3571, counted=3666).
Fix ('a' enables 'yes' to all) <y>? yes
Free blocks count wrong for group #145 (32709, counted=32716).
Fix ('a' enables 'yes' to all) <y>? yes
Free blocks count wrong for group #164 (10551, counted=10552).
Fix ('a' enables 'yes' to all) <y>? yes
Free blocks count wrong for group #314 (22634, counted=22677).
Fix ('a' enables 'yes' to all) <y>? yes
Free blocks count wrong for group #338 (32635, counted=32640).
Fix<y>? yes
Free blocks count wrong for group #381 (6537, counted=6546).
Fix<y>? yes
Free blocks count wrong for group #526 (4978, counted=4986).
Fix<y>? yes
Free blocks count wrong (15432177, counted=15344951).
Fix<y>? yes
Inode bitmap differences: -134360 -1196137
Fix<y>? yes
Free inodes count wrong for group #16 (4570, counted=4572).
Fix<y>? yes
Free inodes count wrong for group #146 (17, counted=18).
Fix<y>? yes
Free inodes count wrong (4229066, counted=4229005).
Fix<y>? yes
/dev/mapper/pve-root: ***** FILE SYSTEM WAS MODIFIED *****
/dev/mapper/pve-root: ***** REBOOT SYSTEM *****
/dev/mapper/pve-root: 88179/4317184 files (0.2% non-contiguous), 1896137/17241088 blocks
root@proxmox:~# fsck.ext4 -f /dev/mapper/pve-root
e2fsck 1.46.5 (30-Dec-2021)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/pve-root: 88179/4317184 files (0.2% non-contiguous), 1896137/17241088 blocks
As you can see, there were lots of errors on the root volume.
However, if I run smartctl on my drive, it passes with no issues:
Code:
root@proxmox:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 4G 0 loop
loop1 7:1 0 12G 0 loop
sda 8:0 0 223.6G 0 disk
├─sda1 8:1 0 1007K 0 part
├─sda2 8:2 0 512M 0 part
└─sda3 8:3 0 223.1G 0 part
├─pve-swap 253:0 0 8G 0 lvm [SWAP]
├─pve-root 253:1 0 65.8G 0 lvm /
├─pve-data_tmeta 253:2 0 1.3G 0 lvm
│ └─pve-data-tpool 253:4 0 130.6G 0 lvm
│ ├─pve-data 253:5 0 130.6G 1 lvm
│ └─pve-vm--100--disk--0 253:6 0 16G 0 lvm
└─pve-data_tdata 253:3 0 130.6G 0 lvm
└─pve-data-tpool 253:4 0 130.6G 0 lvm
├─pve-data 253:5 0 130.6G 1 lvm
└─pve-vm--100--disk--0 253:6 0 16G 0 lvm
sdb 8:16 0 14.6T 0 disk /mnt/USBData
mmcblk0 179:0 0 7.3G 0 disk
mmcblk0boot0 179:8 0 4M 1 disk
mmcblk0boot1 179:16 0 4M 1 disk
root@proxmox:~# smartctl -H /dev/sda3
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.83-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Finally, after looking around, I've seen a few mentions of this being caused by low disk space. I didn't capture this before running fsck and rebooting, but after the system came back up post-fsck I can see the following:
Code:
root@proxmox:~# df -h
Filesystem Size Used Avail Use% Mounted on
udev 7.8G 0 7.8G 0% /dev
tmpfs 1.6G 2.5M 1.6G 1% /run
/dev/mapper/pve-root 65G 5.7G 56G 10% /
tmpfs 7.8G 46M 7.7G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/sdb 15T 6.6T 7.2T 49% /mnt/USBData
/dev/fuse 128M 16K 128M 1% /etc/pve
tmpfs 1.6G 0 1.6G 0% /run/user/0
So no disk appears to be full here, at least not after a reboot.
Can anyone help me out here? I would suspect a failing drive if not for the SMART test passing and the system working consistently for about a week before it fails, which makes me think something is triggering this.