Access VM disk after ZFS corruption

Mar 2, 2021
Hello! I have a Proxmox VE host with a RAID10 ZFS main volume. Unfortunately two of the disks became inaccessible due to a hardware failure (not the HDDs themselves), the system crashed, and I ended up with a corrupt filesystem. Now I can boot and reach the server over SSH, but the GUI is not accessible and I cannot start the pve-cluster service, because the /var/lib/pve-cluster/config.db file is corrupt. I tried running a scrub, which did not help, and tried to recover the config.db file with no success. If I run echo "pragma integrity_check;" | sqlite3 /var/lib/pve-cluster/config.db I get Error: near line 1: disk I/O error. I can reinstall the system, and I have backups of the VMs except one, so I really need to open that VM's filesystem. I tried to mount it with kpartx, but I got failed to stat() /rpool/data/vm-107-disk-0. Is there any way to access the files which were on that VM? I also tried to run vzdump for the VM, but it does not work, because I get
[CODE]
ipcc_send_rec[1] failed: Connection refused
ipcc_send_rec[2] failed: Connection refused
ipcc_send_rec[3] failed: Connection refused
Unable to load access control list: Connection refused
[/CODE]
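(Side note on the kpartx attempt: on ZFS the VM disk is a zvol, so it is exposed as a block device under /dev/zvol/ rather than as a file under /rpool. A rough sketch of how the mapping would normally look, assuming the default rpool layout; the mount point and partition number are just examples:)

[CODE]
# On ZFS the VM disk is a zvol; it shows up as a block device under /dev/zvol/,
# not as a file under /rpool.
ls -l /dev/zvol/rpool/data/vm-107-disk-0*

# ZFS usually creates -part1, -part2, ... symlinks for partitions inside the zvol;
# if they are missing, kpartx can map them instead:
kpartx -av /dev/zvol/rpool/data/vm-107-disk-0

# Mount one partition read-only to copy the files off
# (device name and mount point are examples; use whatever ls/kpartx reported)
mkdir -p /mnt/vm107
mount -o ro /dev/zvol/rpool/data/vm-107-disk-0-part1 /mnt/vm107
[/CODE]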
 
Please post the output of zpool status -v in CODE tags.

[CODE]
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 01:40:54 with 134 errors on Sun May 28 09:21:48 2023
config:

        NAME                                 STATE     READ WRITE CKSUM
        rpool                                ONLINE       0     0     0
          mirror-0                           ONLINE       0     0     0
            nvme-eui.0025385711912ba5-part3  ONLINE       0     0     0
            nvme-eui.0025385a11b101d8-part3  ONLINE       0     0     0
          mirror-1                           ONLINE       0     0     0
            nvme-eui.0025385711912baa-part3  ONLINE       0     0    48
            nvme-eui.0025385811b149ae-part3  ONLINE       0     0    48

errors: Permanent errors have been detected in the following files:

        rpool/data/vm-100-disk-0:<0x1>
        rpool/data/vm-107-disk-0:<0x1>
        //var/lib/pve-cluster/config.db-shm
        //var/lib/rrdcached/db/pve2-storage/proxmox/RAID5
        //var/lib/rrdcached/journal/rrd.journal.1685142564.309390
        //var/lib/rrdcached/db/pve2-storage/proxmox/TV
        /rpool/data/subvol-103-disk-0/var/log/pihole/pihole.log
        /rpool/data/subvol-103-disk-0/var/log/pihole/FTL.log
        rpool/data/vm-101-disk-0:<0x1>
        rpool/data/vm-104-disk-1:<0x1>
[/CODE]
 
If you can find out which blocks (8K or 128K) are damaged and overwrite them with something, you can then read the entire file or VM without getting errors. It will still be corrupted in the parts those blocks covered, so I don't know how useful it will be. I did something like that once, but I can't remember where I found the information on how to find the blocks; it was a lot of manual calculation and trial and error. Maybe something like ddrescue can help you?
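To make the ddrescue idea concrete, something along these lines would copy the zvol into an image file while logging the unreadable blocks, so the bad areas end up as gaps in the copy instead of hard read errors. This is only a sketch: the target path is an example, and you need enough free space elsewhere for the full image.

[CODE]
# GNU ddrescue is packaged as "gddrescue" on Debian/Proxmox
apt install gddrescue

# First pass: copy everything that reads cleanly, record bad areas in the map file
ddrescue -n /dev/zvol/rpool/data/vm-107-disk-0 /mnt/backup/vm-107-disk-0.raw /mnt/backup/vm-107.map

# Second pass: retry the bad areas a few times
ddrescue -r3 /dev/zvol/rpool/data/vm-107-disk-0 /mnt/backup/vm-107-disk-0.raw /mnt/backup/vm-107.map

# The resulting image can then be mapped (kpartx/losetup) and mounted read-only
[/CODE]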
 
Yes, it says that the files are lost, but the interesting thing is that config.db is readable: if I download it, I can open it in an SQLite manager.
Yes, the file /var/lib/pve-cluster/config.db-shm is also corrupt; it may be used internally as well, so move that file aside and try again.
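Roughly like this (just a sketch: it assumes pve-cluster is stopped, the -wal file may not exist, and .recover needs a reasonably recent sqlite3, with .dump as the fallback):

[CODE]
systemctl stop pve-cluster
cd /var/lib/pve-cluster

# Move the corrupt SQLite side files out of the way (the -wal file may not exist)
mv config.db-shm config.db-shm.bak
mv config.db-wal config.db-wal.bak 2>/dev/null

# Re-check the database itself
echo "pragma integrity_check;" | sqlite3 config.db

# If it still fails, try to salvage the contents into a fresh database
sqlite3 config.db ".recover" | sqlite3 config-recovered.db
[/CODE]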
 
