ZFS Mirror rebuild

Aug 24, 2021
Hi,
I've had some interesting issues this last week; things are now mostly OK, but I want to ask for advice on this-

Proxmox installed on a ZFS mirror
Had RAM issues and Proxmox now will not boot

Can't scrub the zpool because it was created on another system.

I've installed Proxmox on another drive, but the zpool won't import because

Code:
root@pve02:~# zpool import -d /dev rpool
cannot import 'rpool': pool was previously in use from another system.
Last accessed by (none) (hostid=fd84ad1) at Mon May 9 20:09:38 2022
The pool can be imported, use 'zpool import -f' to import the pool.

and it causes a kernel panic when I try 'zpool import -f' as suggested-

Code:
Message from syslogd@pve02 at May 12 10:24:01 ...
kernel:[ 743.768395] PANIC: zfs: adding existent segment to range tree (offset=9f9731000 size=be001000)

Quite happy to blow it away and start again, but I would like to mount at least one of the drives to recover a VM image

But-

Code:
root@pve02:~# zpool import -o readonly=on rpool
cannot import 'rpool': pool was previously in use from another system.
Last accessed by (none) (hostid=fd84ad1) at Mon May 9 20:09:38 2022
The pool can be imported, use 'zpool import -f' to import the pool.


Can I either-
1. remove the boot partitions from the ZFS mirror disks, fix the RAID and mount the drive?
or
2. re-install Proxmox on the ZFS mirror to fix the boot issue, without wiping the ZFS mirror disks?

It just seems dumb that I can't mount a drive and pull a file off it...
 
that message sounds like there is some on-disk corruption as a result of your RAM issues.

the usual order of actions would be:
- FIRST AND IMPORTANT: take full copies of your disks (with dd/ddrescue/..), so that you can attempt recovery but still roll back to the original broken state (see the sketch below)
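for example, roughly like this (the device names and the target path are placeholders; the target needs enough free space, and ddrescue comes from the gddrescue package):

Code:
# clone each member disk of the broken mirror to an image file before changing anything
ddrescue /dev/sda /mnt/backup/sda.img /mnt/backup/sda.map
ddrescue /dev/sdb /mnt/backup/sdb.img /mnt/backup/sdb.map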

then in order of complexity/effort and chances of working:
- attempt to import going back a few transactions (might skip the broken part if it was only recently written, but will lose a bit of recently written data) - see the sketch after this list
- attempt dumping contents using zdb
- write custom/patched versions of the ZFS modules or zdb that skip certain checks
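a rough sketch of the first two steps (pool name taken from your output; -F rewinds past the last few transactions, -n only reports whether that would work, and readonly=on prevents any writes):

Code:
# dry run: check whether a rewind import would succeed
zpool import -f -o readonly=on -F -n rpool
# if that looks ok, do the actual read-only rewind import
zpool import -f -o readonly=on -F rpool

# list the datasets of the still-exported pool with zdb
# (-e = exported pool, -p = where to look for the member devices)
zdb -e -p /dev -d rpool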
 
Thanks @fabian
This is partly a learning exercise- no one will die if the data is unrecoverable, but I am learning so much about what does and doesn't work.

I have been able to mount the zfs mirror as 'read only' by forcing it- now I'm trying to figure out how to move the VMs off the mirror and onto new storage - I installed Proxmox on another NVMe drive.
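i.e. something along these lines (a guess at the exact command, just combining the force and read-only options from the errors above):

Code:
zpool import -f -o readonly=on rpool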

so far I have -

Code:
cp /dev/zvol/rpool/data/vm-103-disk-1 /mnt/pve02/local-lvm/vm-103-disk-1

I'm sure that's not 100% correct, but getting there
 
a zvol is a raw block device, so you can just put that into a file with dd or similar tools, and then use qm importdisk to import it into a VM once your PVE is up and running again.
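for example (the file name, VM ID and target storage here are just placeholders):

Code:
# copy the zvol into a raw image file
dd if=/dev/zvol/rpool/data/vm-103-disk-1 of=/var/lib/vz/images/vm-103-disk-1.raw bs=1M status=progress

# later, once the new PVE is set up and VM 103 exists, attach it to storage 'local-lvm'
qm importdisk 103 /var/lib/vz/images/vm-103-disk-1.raw local-lvm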
 
Thanks @fabian
Sorry for sounding dumb, but my 'read-only' rpool is as pictured- what would I do to copy the zvols off the rpool and onto the new disk?
Guessing I can use
/var/lib/vz/images/
for this.
 

Attachments

  • Screen Shot 2022-05-12 at 8.08.47 pm.png
e.g., if you have space and can write to /var/lib/vz/ :

Code:
mkdir /var/lib/vz/rpool-backup
dd if=/dev/zvol/rpool/data/vm-100-disk-0 of=/var/lib/vz/rpool-backup/vm-100-disk-0.raw bs=1M status=progress

you need to repeat the dd command for every disk you want to save (and adapt the input (if) and output (of) paths accordingly). you don't need to back up the -partX block devices separately, those are just the individual partitions and are contained in the full zvol.
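if there are many of them, a small shell loop can handle the repetition (just a sketch; it assumes all zvols live under /dev/zvol/rpool/data/ and skips the -partX partition devices):

Code:
mkdir -p /var/lib/vz/rpool-backup
for z in /dev/zvol/rpool/data/vm-*-disk-*; do
    # skip partition devices, they are contained in the full zvol
    case "$z" in *-part*) continue ;; esac
    dd if="$z" of="/var/lib/vz/rpool-backup/$(basename "$z").raw" bs=1M status=progress
done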

double check the output and the saved files (sizes should be as expected, for example). you can also run md5sum on input and output, they should produce the same checksum since it's a bitwise copy.
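for example (using the same hypothetical disk name as above):

Code:
md5sum /dev/zvol/rpool/data/vm-100-disk-0 /var/lib/vz/rpool-backup/vm-100-disk-0.raw
# both lines of output should show the identical checksum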