We've been struggling for a couple of days with an undetermined issue that leaves us with corrupted filesystems on our root (LVM /dev/pve/root) and data (LVM /dev/data/home) volumes.
The system suddenly started reporting EXT4 errors, and they continued until it became unstable.
Code:
Nov 30 00:05:52 e20home kernel: [543969.097160] EXT4-fs error (device dm-28): __ext4_find_entry:1542: inode #807352: comm gunicorn: checksumming directory block 0
Nov 30 00:06:52 e20home kernel: [544029.097195] EXT4-fs warning (device dm-28): ext4_dirblock_csum_verify:370: inode #807352: comm gunicorn: No space for directory leaf checksum. Please run e2fsck -D.
Nov 30 00:07:52 e20home kernel: [544089.097374] EXT4-fs warning (device dm-28): ext4_dirblock_csum_verify:370: inode #807352: comm gunicorn: No space for directory leaf checksum. Please run e2fsck -D.
Nov 30 00:08:52 e20home kernel: [544149.097218] EXT4-fs error (device dm-28): __ext4_find_entry:1542: inode #807352: comm gunicorn: checksumming directory block 0
Nov 30 00:09:52 e20home kernel: [544209.096784] EXT4-fs warning (device dm-28): ext4_dirblock_csum_verify:370: inode #807352: comm gunicorn: No space for directory leaf checksum. Please run e2fsck -D.
Nov 30 00:10:52 e20home kernel: [544269.097161] EXT4-fs error (device dm-28): __ext4_find_entry:1542: inode #807352: comm gunicorn: checksumming directory block 0
Nov 30 00:11:52 e20home kernel: [544329.096723] EXT4-fs warning (device dm-28): ext4_dirblock_csum_verify:370: inode #807352: comm gunicorn: No space for directory leaf checksum. Please run e2fsck -D.
Nov 30 00:12:52 e20home kernel: [544389.097332] EXT4-fs error (device dm-28): __ext4_find_entry:1542: inode #807352: comm gunicorn: checksumming directory block 0
Nov 30 00:13:52 e20home kernel: [544449.096743] EXT4-fs error (device dm-28): __ext4_find_entry:1542: inode #807352: comm gunicorn: checksumming directory block 0
Nov 30 00:14:52 e20home kernel: [544509.097187] EXT4-fs error (device dm-28): __ext4_find_entry:1542: inode #807352: comm gunicorn: checksumming directory block 0
Nov 30 00:17:52 e20home kernel: [544689.097692] EXT4-fs warning (device dm-28): ext4_dirblock_csum_verify:370: inode #807352: comm gunicorn: No space for directory leaf checksum. Please run e2fsck -D.
Nov 30 00:18:52 e20home kernel: [544749.097133] EXT4-fs warning (device dm-28): ext4_dirblock_csum_verify:370: inode #807352: comm gunicorn: No space for directory leaf checksum. Please run e2fsck -D.
Nov 30 00:19:52 e20home kernel: [544809.096933] EXT4-fs warning (device dm-28): ext4_dirblock_csum_verify:370: inode #807352: comm gunicorn: No space for directory leaf checksum. Please run e2fsck -D.
Nov 30 00:27:52 e20home kernel: [545289.096616] EXT4-fs warning (device dm-28): ext4_dirblock_csum_verify:370: inode #807352: comm gunicorn: No space for directory leaf checksum. Please run e2fsck -D.
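The kernel only names the affected device as dm-28, so it may help to confirm which logical volume that is before testing anything. A minimal sketch, assuming the usual sysfs layout where each device-mapper node exposes its name under /sys/block/<device>/dm/name; the function name `dm_name` and the `sysfs_root` parameter are our own, not part of any tool:

```python
from pathlib import Path


def dm_name(device: str, sysfs_root: str = "/sys/block") -> str:
    """Return the device-mapper name for a kernel name such as 'dm-28'.

    Assumption: the kernel exposes the name at <sysfs_root>/<device>/dm/name.
    """
    name_file = Path(sysfs_root) / device / "dm" / "name"
    return name_file.read_text().strip()


if __name__ == "__main__":
    try:
        # dm-28 is the device reported in the EXT4 messages above.
        print(dm_name("dm-28"))
    except FileNotFoundError:
        print("dm-28 not present on this machine")
```

On an LVM setup the returned name normally combines the volume group and logical volume, which should tell us whether dm-28 is the root or the data volume.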
We then booted a rescue disk and tested the disks with badblocks and a couple of other utilities, none of which found errors. The Linux RAID underneath the LVM reports no errors either.
We deleted the partitions, reformatted them, and restored from backups. The filesystem then tested clean.
The rescue disk is an older PVE install with an older kernel.
After rebooting into the system, we are getting EXT4 errors on the filesystem again, and the partitions are slowly becoming unstable again. We can't figure out what else to test. Most containers were blocked as if they were in the middle of a snapshot.
The kernel error again:
Code:
Dec 1 17:30:42 e20home kernel: [ 932.100043] EXT4-fs error (device dm-28): __ext4_find_entry:1541: inode #2752514: comm mysqld: checksumming directory block 0
Dec 1 17:32:45 e20home kernel: [ 1054.920811] EXT4-fs error (device dm-28): __ext4_find_entry:1541: inode #2884850: comm gitaly: checksumming directory block 0
Dec 1 17:32:56 e20home kernel: [ 1065.912091] vmbr1: received packet on bond0 with own address as source address (addr:34:97:f6:31:82:13, vlan:0)
Dec 1 17:33:02 e20home kernel: [ 1072.358923] EXT4-fs warning (device dm-28): ext4_dirblock_csum_verify:370: inode #2884850: comm gitaly: No space for directory leaf checksum. Please run e2fsck -D.
Dec 1 17:41:23 e20home kernel: [ 1573.057286] EXT4-fs error (device dm-28): __ext4_find_entry:1541: inode #2884850: comm gitaly: checksumming directory block 0
Dec 1 17:49:03 e20home kernel: [ 2033.474292] EXT4-fs error (device dm-28): __ext4_find_entry:1541: inode #2884850: comm gitaly: checksumming directory block 0
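To see how many distinct directories and processes are involved, the messages can be tallied per device, inode, and process. A minimal sketch, assuming the message layout shown in the logs above; the regular expression and the `tally` helper are our own and only cover the two message types we have seen:

```python
import re
import sys
from collections import Counter

# Matches the EXT4-fs error/warning lines in the format seen in our logs.
PATTERN = re.compile(
    r"EXT4-fs (?P<level>error|warning) \(device (?P<device>[^)]+)\): "
    r"(?P<func>\w+):\d+: inode #(?P<inode>\d+): comm (?P<comm>[^:]+):"
)


def tally(lines):
    """Count EXT4 messages per (device, inode, comm, level)."""
    counts = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m:  # unrelated kernel lines (e.g. bridge messages) are skipped
            key = (m["device"], int(m["inode"]), m["comm"], m["level"])
            counts[key] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 tally_ext4.py <path to a kernel log file>
    with open(sys.argv[1], errors="replace") as fh:
        for (device, inode, comm, level), n in tally(fh).most_common():
            print(f"{n:4d}  {device}  inode {inode}  {comm}  {level}")
```

Each reported inode is a directory, so something like `find <mountpoint> -xdev -inum <inode>` on the mounted volume should show which path it corresponds to.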
Our system uses NVMe PCIe 4.0 disks on an ASUS X570 motherboard. We don't know whether the hardware is related to the issue or whether a recent kernel update is to blame.
We will try booting a previous kernel to see whether that makes a difference, but any assistance with possible ways of debugging this would be appreciated.
Code:
pveversion -v
proxmox-ve: 6.2-2 (running kernel: 5.3.10-1-pve)
pve-manager: 6.2-15 (running version: 6.2-15/48bd51b6)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-4
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-10
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.1-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.3-10
pve-cluster: 6.2-1
pve-container: 3.2-4
pve-docs: 6.2-6
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-6
pve-xtermjs: 4.7.0-2
qemu-server: 6.2-20
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1