Can't boot PVE after multiple ungraceful shutdowns

BoberMod

New Member
Sep 23, 2023
Ukraine
I had a lot of ungraceful shutdowns because of regular power outages in my country, and I didn't notice that AC Power Recovery was enabled.

When I tried to boot the machine, I got an fsck error: Bad magic number in super-block. After running e2fsck (with force) the error disappeared, but boot now gets stuck at the "/dev/mapper/pve-root: clean" line. It doesn't boot any further, even in rescue mode (from both the installed system and the LiveCD), and there are no visible errors during any boot.
I also re-created /dev/sda1, which is the boot partition, using this wiki guide. GRUB works correctly.

The smartctl self-tests (both long and short) reported 0 errors; all checks passed and all important metrics are OK.

I booted the PVE LiveCD in debug mode and collected logs.

Parts of journalctl log:
Code:
Jun 24 02:02:46 pve7070 systemd[1]: Finished systemd-remount-fs.service - Remount Root and Kernel File Systems.
Jun 24 02:02:46 pve7070 kernel: EXT4-fs (dm-1): re-mounted cb8141d8-15a0-4d37-a0e8-87cbc40a2313 r/w. Quota mode: none.
Jun 24 02:02:46 pve7070 kernel: EXT4-fs (dm-1): warning: mounting fs with errors, running e2fsck is recommended
Jun 24 02:02:46 pve7070 kernel: spl: loading out-of-tree module taints kernel.
Jun 24 02:02:46 pve7070 systemd[1]: Finished modprobe@drm.service - Load Kernel Module DRM.

Code:
Jun 24 02:02:46 pve7070 systemd[1]: proc-sys-fs-binfmt_misc.automount: Failed to open /lib/systemd/system/proc-sys-fs-binfmt_misc.automount: Bad message
Jun 24 02:02:46 pve7070 systemd[1]: rc-local.service: Failed to open /lib/systemd/system/rc-local.service: Bad message
Jun 24 02:02:46 pve7070 systemd[1]: rpcbind.target: Failed to open /lib/systemd/system/rpcbind.target: Bad message
Jun 24 02:02:46 pve7070 systemd[1]: remote-fs.target: Failed to open /lib/systemd/system/remote-fs.target: Bad message
Jun 24 02:02:46 pve7070 systemd[1]: remote-fs-pre.target: Failed to open /lib/systemd/system/remote-fs-pre.target: Bad message
Jun 24 02:02:46 pve7070 systemd[1]: rescue.service: Failed to open /lib/systemd/system/rescue.target: Bad message
Jun 24 02:02:46 pve7070 kernel: EXT4-fs error (device dm-1): ext4_lookup:1855: inode #421491: comm systemd: iget: checksum invalid
Jun 24 02:02:46 pve7070 kernel: EXT4-fs error (device dm-1): ext4_lookup:1855: inode #421490: comm systemd: iget: checksum invalid
Jun 24 02:02:46 pve7070 kernel: EXT4-fs error (device dm-1): ext4_lookup:1855: inode #421490: comm systemd: iget: checksum invalid
Jun 24 02:02:46 pve7070 kernel: EXT4-fs error (device dm-1): ext4_lookup:1855: inode #421480: comm systemd: iget: checksum invalid
Jun 24 02:02:46 pve7070 kernel: EXT4-fs error (device dm-1): ext4_lookup:1855: inode #421495: comm systemd: iget: checksum invalid
Jun 24 02:02:46 pve7070 kernel: EXT4-fs error (device dm-1): ext4_lookup:1855: inode #421494: comm systemd: iget: checksum invalid
Jun 24 02:02:46 pve7070 kernel: EXT4-fs error (device dm-1): ext4_lookup:1855: inode #421499: comm systemd: iget: checksum invalid
Jun 24 02:02:46 pve7070 kernel: EXT4-fs error (device dm-1): ext4_lookup:1855: inode #421499: comm systemd: iget: checksum invalid
Jun 24 02:02:46 pve7070 kernel: EXT4-fs error (device dm-1): ext4_lookup:1855: inode #421491: comm systemd: iget: checksum invalid
Jun 24 02:02:46 pve7070 systemd[1]: Failed to resolve symlink /usr/lib/systemd/system/runlevel1.target pointing to /lib/systemd/system/rescue.target: Bad message
Jun 24 02:02:46 pve7070 systemd[1]: Failed to resolve symlink /usr/lib/systemd/system/runlevel1.target pointing to /lib/systemd/system/rescue.target: Bad message
Jun 24 02:02:46 pve7070 kernel: EXT4-fs error (device dm-1): ext4_lookup:1855: inode #421502: comm systemd: iget: checksum invalid
Jun 24 02:02:46 pve7070 kernel: EXT4-fs error (device dm-1): ext4_lookup:1855: inode #421502: comm systemd: iget: checksum invalid
Jun 24 02:02:46 pve7070 systemd[1]: Hostname set to <pve7070>.
Jun 24 02:02:46 pve7070 systemd[1]: Detected architecture x86-64.
Jun 24 02:02:46 pve7070 systemd[1]: systemd 252.22-1-debian1 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SEC INIT +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIB
Jun 24 02:02:46 pve7070 systemd[1]: Inserted module 'autofs4'
Jun 24 02:02:46 pve7070 kernel: EXT4-fs (dm-1): mounted filesystem cb8141d8-15a0-4d37-a0e8-87cbc40a2313 ro with ordered data mode. Quota mode: none.
Jun 24 02:02:46 pve7070 kernel: Btrfs loaded, zoned=yes, fsverity=yes

fsck:
Code:
root@proxmox:~# fsck -vcf /dev/mapper/pve-root

fsck from util-linux 2.38.1
e2fsck 1.47.0 (5-Feb-2023)
Checking for bad blocks (read-only test): done
/dev/mapper/pve-root: Updating bad block inode.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/mapper/pve-root: ***** FILE SYSTEM WAS MODIFIED *****
80523 inodes used (1.28%, out of 6291456)
130 non-contiguous files (0.2%)
90 non-contiguous directories (0.1%)
# of inodes with ind/dind/tind blocks: 0/0/0
Extent depth histogram: 75275/76
2425151 blocks used (9.64%, out of 25165824)
9 bad blocks
3 large files

65761 regular files
9491 directories
10 character device files
0 block device files
0 fifos
517 links
522 symbolic links (5123 fast symbolic links)
30 sockets
81011 files

Please help, is there any way to restore the system without a complete reinstallation?
If any logs are missing, please ask; I'm not sure what else is needed.
 
Consumer SSDs are prone to data corruption on unexpected power loss, more so than old HDDs. It looks like you have (silent) filesystem corruption, and there is no way to check for this unless you use a filesystem with checksums like ZFS or Btrfs. I don't see how you could ever trust the filesystem again without reinstalling (and restoring the CTs/VMs from known good backups).
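
To get a rough idea of how far the damage goes, one option is to collect the inode numbers from the "iget: checksum invalid" kernel lines in the journal and look them up with debugfs. This is only a sketch: the function name bad_inodes is made up for illustration, and it assumes the journal text is fed in on stdin in the same format as the log excerpt posted above.

```shell
# Hypothetical helper (not part of any package): print the distinct inode
# numbers that the kernel reported with "checksum invalid", one per line.
# Input: journal/dmesg text on stdin.
bad_inodes() {
    grep 'checksum invalid' \
        | sed -n 's/.*inode #\([0-9][0-9]*\):.*/\1/p' \
        | sort -nu
}

# Possible usage from a live system (device path taken from the thread):
#   journalctl -k --no-pager | bad_inodes
#   debugfs -R "ncheck 421490 421491" /dev/mapper/pve-root
# debugfs opens the filesystem read-only by default; ncheck tries to map
# inode numbers back to path names and may fail on badly damaged directories.
```

If the list only contains a handful of files under /lib/systemd/system, that says which files are known to be bad, not that everything else is intact.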
 
You broke it, you can keep the pieces ...

If fsck cannot fix it, saving your data and doing a new install is probably your best option.
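
For the "saving your data" part, a minimal sketch of pulling the host configuration off the damaged root before reinstalling. It assumes the root LV has been mounted read-only from a live system; the function name rescue_pve_config, the mount point and the list of files are illustrative choices, not an official procedure. Note that /etc/pve is a FUSE view of /var/lib/pve-cluster/config.db, so on an offline root the database file is what holds the guest configs.

```shell
# Hypothetical helper: copy a few host config files from a root filesystem
# mounted read-only at $1 into the backup directory $2.
rescue_pve_config() {
    src_root=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # The file list is an assumption; extend it for your own setup.
    for p in var/lib/pve-cluster/config.db etc/network/interfaces etc/hosts etc/fstab; do
        if [ -f "$src_root/$p" ]; then
            mkdir -p "$dest/$(dirname "$p")" \
                && cp -p "$src_root/$p" "$dest/$p" \
                || echo "failed to copy: $p" >&2
        else
            echo "missing: $p" >&2
        fi
    done
}

# Possible usage from a live system (paths are assumptions):
#   vgchange -ay pve
#   mount -o ro /dev/mapper/pve-root /mnt
#   rescue_pve_config /mnt /media/usb/pve-rescue
```

Any file copied this way comes from a filesystem that is known to be corrupted, so treat it as a reference for rebuilding the host rather than something to restore blindly. The guest disks themselves are a separate matter and should come from backups where possible.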

ALWAYS have backups, use PBS
 
