Access VM disk after ZFS corruption

Mar 2, 2021
Hello! I have a Proxmox VE host with a RAID10 ZFS main volume. Unfortunately two of the disks became inaccessible due to a hardware failure (not the HDDs themselves), the system crashed, and I ended up with a corrupt filesystem. Now I can boot and reach the server over SSH, but the GUI is not accessible and I cannot start the pve-cluster service, because the /var/lib/pve-cluster/config.db file is corrupt. I tried running a scrub, which did not help, and tried to recover the config.db file with no success. If I run echo "pragma integrity_check;" | sqlite3 /var/lib/pve-cluster/config.db I get Error: near line 1: disk I/O error. I can reinstall the system, and I have backups of the VMs except one, so I really need to open that VM's filesystem. I tried to mount it with kpartx, but I got failed to stat() /rpool/data/vm-107-disk-0. Is there any way to access the files which were on that VM? I also tried to run vzdump for the VM, but it does not work, because I get
[CODE]
ipcc_send_rec[1] failed: Connection refused
ipcc_send_rec[2] failed: Connection refused
ipcc_send_rec[3] failed: Connection refused
Unable to load access control list: Connection refused
[/CODE]
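(Side note on the kpartx attempt: on ZFS the VM disk is a zvol, so it is exposed as a block device under /dev/zvol/ rather than as a file under /rpool. A rough sketch of how the mapping would normally look, assuming the default rpool layout; the mount point and partition number are just examples:)

[CODE]
# On ZFS the VM disk is a zvol; it shows up as a block device under /dev/zvol/,
# not as a file under /rpool.
ls -l /dev/zvol/rpool/data/vm-107-disk-0*

# ZFS usually creates -part1, -part2, ... symlinks for partitions inside the zvol;
# if they are missing, kpartx can map them instead:
kpartx -av /dev/zvol/rpool/data/vm-107-disk-0

# Mount one partition read-only to copy the files off
# (device name and mount point are examples; use whatever ls/kpartx reported)
mkdir -p /mnt/vm107
mount -o ro /dev/zvol/rpool/data/vm-107-disk-0-part1 /mnt/vm107
[/CODE]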
 
Please post the output of zpool status -v in CODE tags.

[CODE]
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 01:40:54 with 134 errors on Sun May 28 09:21:48 2023
config:

        NAME                                 STATE     READ WRITE CKSUM
        rpool                                ONLINE       0     0     0
          mirror-0                           ONLINE       0     0     0
            nvme-eui.0025385711912ba5-part3  ONLINE       0     0     0
            nvme-eui.0025385a11b101d8-part3  ONLINE       0     0     0
          mirror-1                           ONLINE       0     0     0
            nvme-eui.0025385711912baa-part3  ONLINE       0     0    48
            nvme-eui.0025385811b149ae-part3  ONLINE       0     0    48

errors: Permanent errors have been detected in the following files:

        rpool/data/vm-100-disk-0:<0x1>
        rpool/data/vm-107-disk-0:<0x1>
        //var/lib/pve-cluster/config.db-shm
        //var/lib/rrdcached/db/pve2-storage/proxmox/RAID5
        //var/lib/rrdcached/journal/rrd.journal.1685142564.309390
        //var/lib/rrdcached/db/pve2-storage/proxmox/TV
        /rpool/data/subvol-103-disk-0/var/log/pihole/pihole.log
        /rpool/data/subvol-103-disk-0/var/log/pihole/FTL.log
        rpool/data/vm-101-disk-0:<0x1>
        rpool/data/vm-104-disk-1:<0x1>
[/CODE]
 
If you can find out which blocks (8K or 128K) are damaged and overwrite them with something, you can then read the entire file or VM without getting errors. It will still be corrupted in the parts those blocks covered, so I don't know how useful it will be. I did something like that once, but I can't remember where I found the information on how to find the blocks; it was a lot of manual calculation and trial and error. Maybe something like ddrescue can help you?
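To make the ddrescue idea concrete, something along these lines would copy the zvol into an image file while logging the unreadable blocks, so the bad areas end up as gaps in the copy instead of hard read errors. This is only a sketch: the target path is an example, and you need enough free space elsewhere for the full image.

[CODE]
# GNU ddrescue is packaged as "gddrescue" on Debian/Proxmox
apt install gddrescue

# First pass: copy everything that reads cleanly, record bad areas in the map file
ddrescue -n /dev/zvol/rpool/data/vm-107-disk-0 /mnt/backup/vm-107-disk-0.raw /mnt/backup/vm-107.map

# Second pass: retry the bad areas a few times
ddrescue -r3 /dev/zvol/rpool/data/vm-107-disk-0 /mnt/backup/vm-107-disk-0.raw /mnt/backup/vm-107.map

# The resulting image can then be mapped (kpartx/losetup) and mounted read-only
[/CODE]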
 
Yes, it says that the files are lost, but the interesting thing is that config.db is readable: if I download it, I can open it in an SQLite manager.
Yes, the file /var/lib/pve-cluster/config.db-shm is also corrupt; it may be used internally as well, so move that file aside and try again.
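Roughly like this (just a sketch: it assumes pve-cluster is stopped, the -wal file may not exist, and .recover needs a reasonably recent sqlite3, with .dump as the fallback):

[CODE]
systemctl stop pve-cluster
cd /var/lib/pve-cluster

# Move the corrupt SQLite side files out of the way (the -wal file may not exist)
mv config.db-shm config.db-shm.bak
mv config.db-wal config.db-wal.bak 2>/dev/null

# Re-check the database itself
echo "pragma integrity_check;" | sqlite3 config.db

# If it still fails, try to salvage the contents into a fresh database
sqlite3 config.db ".recover" | sqlite3 config-recovered.db
[/CODE]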
 
