Zpools Not Importing After Power Failure

Nov 22, 2020
Hello,

After a power outage, my server is in a state where none of the storage devices are being recognized. Initially the server would not boot, which I believe was caused by a corrupted zpool cache file, so I did the following to move the cache file out of the way:

Code:
mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.backup


Now my server boots, but none of my zpools load. I can't even run a simple command like:
Code:
zpool import
because it hangs indefinitely.

The dmesg output (attached) shows the zpool process blocking in the kernel and hanging indefinitely, with:
Code:
Code: Bad RIP value.


systemctl shows something worrisome when I check the status of "zfs-mount.service":
Code:
● zfs-mount.service - Mount ZFS filesystems
   Loaded: loaded (/lib/systemd/system/zfs-mount.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
Condition: start condition failed at Sun 2020-11-22 01:22:38 EST; 17min ago
           └─ ConditionPathIsDirectory=/sys/module/zfs was not met
     Docs: man:zfs(8)
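
The failed condition "ConditionPathIsDirectory=/sys/module/zfs" means systemd skipped the unit because the zfs kernel module was not loaded at that moment, so no mount was even attempted. A minimal POSIX sh sketch of the same check is below; "zfs_module_loaded" is a hypothetical helper name, and the script only reports the state rather than loading the module ("modprobe zfs", run as root, would do that).

```shell
#!/bin/sh
# Reproduces the condition systemd evaluates for zfs-mount.service:
#   ConditionPathIsDirectory=/sys/module/zfs
# /sys/module/zfs normally exists only while the zfs kernel module is loaded.
# The optional argument overrides the path (hypothetical, handy for testing).
zfs_module_loaded() {
    [ -d "${1:-/sys/module/zfs}" ]
}

if zfs_module_loaded; then
    echo "zfs module is loaded"
else
    echo "zfs module is NOT loaded (zfs-mount.service would be skipped)"
fi
```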

Any ideas? I'd definitely put a beer-money bounty on solving this one! I would love to recover some data from one of my VMs that I hadn't set up a proper backup for.
 

Code:
VERIFY3(range_tree_space(smla->smla_rt) + sme->sme_run <= smla->smla_sm->sm_size) failed
PANIC at space_map.c:383:space_map_load_callback()

Those messages appear to indicate that the pool's on-disk metadata (a space map, judging by space_map_load_callback) is damaged at a low level. I would not know how to repair that without expert knowledge of the ZFS implementation.
I hope I'm wrong and someone can help you recover from this error...
 
Does the ZFS mount point have directories like dump, template, etc.?

If so, there is a ZFS option to fix this. I'll look for it, as I used it again a few weeks ago.
 
I booted from a live CD and noticed something odd. This server has two pools, "tank" and "fast-tank". I can import "tank" from the live CD with no problems. However, when I try to import "fast-tank", the zpool command hangs indefinitely on the live CD as well.

I guess the "fast-tank" zpool must be in a corrupt state. I tried running "sudo zpool import -fFX fast-tank", but that hangs indefinitely as well. After that, any further command, even one as simple as "zpool import", also hangs indefinitely. I bet this is what is happening when Proxmox boots. I'm not sure where to go from here...
 
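Maybe try -F with the -n option when importing the broken pool? (With -n, -F only reports whether the pool could be made importable again, without actually performing the recovery.) If all else fails, maybe -X. Check man zpool for more information.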
Note that I do not know enough about ZFS and/or your specific situation to determine whether those options will help or make problems worse!
Use your own judgement before trying, or ask an expert.
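
Spelled out as commands, a cautious order for those import attempts might look like the sketch below. It is an illustration under assumptions, not a tested recovery procedure: "import_ladder" is a hypothetical helper that only prints the commands, so each one can be run by hand, one at a time, exporting the pool ("zpool export fast-tank") after any attempt that succeeds. Adding -f may be necessary if zpool reports that the pool was last accessed by another system, which is common when importing from a live CD.

```shell
#!/bin/sh
# Prints a least-to-most invasive list of import attempts for one pool.
# Nothing is executed here; run each printed line manually and in order.
import_ladder() {
    pool="$1"
    # Read-only import with no datasets mounted: nothing is written to disk.
    echo "zpool import -o readonly=on -N $pool"
    # -F with -n: only report whether discarding the last transactions would help.
    echo "zpool import -F -n $pool"
    # Last resort: extreme rewind search; may take very long and the result
    # is not guaranteed to be consistent.
    echo "zpool import -F -X $pool"
}

import_ladder "${1:-fast-tank}"
```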
 
A user over on Super User seems to be reporting exactly the same problem I am facing: https://superuser.com/questions/155...c-if-not-imported-as-readonly/1604173#1604173

I tried other import options, but none of them seem to work. I was able to import the pool in read-only mode from another live CD, *but* no files seem to be present when I run "ls /fast-tank". "ls /dev/fast-tank" does show the files, but since that is not the mounted pool, the files appear to be spread across the 2 drives, and I haven't yet tried to recover them by pointing at the drives manually (is that even possible?).
 
Update here: I think I've made some progress. I am able to import both pools from my Ubuntu live CD. As stated before, my pool fast-tank is the troublesome one and can only be imported in read-only mode. My output of zpool status is as follows:

Code:
  pool: fast-tank
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:01:30 with 0 errors on Sun Nov  8 05:25:31 2020
config:

    NAME         STATE     READ WRITE CKSUM
    fast-tank    ONLINE       0     0     0
      mirror-0   ONLINE       0     0     0
        nvme0n1  ONLINE       0     0     0
        nvme1n1  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:00:39 with 0 errors on Sun Nov  8 05:24:41 2020
config:

    NAME                                             STATE     READ WRITE CKSUM
    tank                                             ONLINE       0     0     0
      raidz1-0                                       ONLINE       0     0     0
        ata-Samsung_SSD_870_QVO_4TB_S5VYNG0N700101R  ONLINE       0     0     0
        ata-Samsung_SSD_870_QVO_4TB_S5VYNG0N700705Z  ONLINE       0     0     0
        ata-Samsung_SSD_870_QVO_4TB_S5VYNG0N700683J  ONLINE       0     0     0

errors: No known data errors

So as far as ZFS can tell, there are no known errors. After importing fast-tank, its directory appears empty. However, I see the VMs listed in:

Code:
ubuntu@ubuntu:~$ ls /dev/fast-tank/
vm-100-disk-0        vm-101-disk-0        vm-103-disk-0
vm-100-disk-0-part1  vm-101-disk-0-part1  vm-103-disk-0-part1
vm-100-disk-0-part2  vm-101-disk-0-part5  vm-103-disk-0-part2
vm-100-disk-0-part3  vm-101-disk-0-part6  vm-103-disk-0-part3

These VMs appear to be stored as zvols, and I don't understand how to mount them directly to back them up. I would love to recover these VMs. Any advice?
 
I would expect that you can copy the virtual disk to another zpool and start/backup the VMs from there.
Can you make sure fast-tank is not mounted automatically, and then mount it read-only on Proxmox? Maybe you can back up the VMs that way without copying everything.
Can you do a scrub on the read-only pool to check for errors? If you (or the scrub itself) can fix the errors and mount the pool read-write, maybe you don't need to copy.
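
Copying the virtual disks to another zpool could be done with dd from the zvol device nodes, which should work on a read-only import because nothing is written to fast-tank. The sketch below assumes the healthy pool is mounted at /tank and uses a hypothetical /tank/recovery directory; SRC_POOL, DEST_DIR, DRY_RUN and the helper names are illustrative. With the default DRY_RUN=1 the commands are only printed.

```shell
#!/bin/sh
# Sketch: copy each zvol of a read-only imported pool to a raw image file
# on another, healthy pool. All settings below are hypothetical defaults.
SRC_POOL="${SRC_POOL:-fast-tank}"
DEST_DIR="${DEST_DIR:-/tank/recovery}"

# Print the command when DRY_RUN=1 (default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

backup_zvol() {
    vol="$1"              # e.g. fast-tank/vm-100-disk-0
    name="${vol##*/}"     # keep only the last path component: vm-100-disk-0
    # /dev/zvol/<pool>/<volume> is the whole virtual disk, partition table included
    # (the same volumes show up under /dev/fast-tank/ in the listing above).
    run dd if="/dev/zvol/$vol" of="$DEST_DIR/$name.raw" bs=1M status=progress
}

backup_pool() {
    run mkdir -p "$DEST_DIR"
    # -H: no header; -o name: names only; -t volume: zvols only; -r: whole pool.
    zfs list -H -o name -t volume -r "$SRC_POOL" | while read -r vol; do
        backup_zvol "$vol"
    done
}
```

After sourcing the file (". ./zvol-backup.sh"), calling "backup_pool" lists the planned dd commands; setting DRY_RUN=0 and calling it again performs the copies. Each .raw file is the whole virtual disk, so it should be possible to attach it to a VM again on Proxmox (for example with "qm importdisk") or to loop-mount its partitions read-only and pull individual files out.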