failed Import ZFS pool

PaulVM

Playing with a test cluster.
After disconnecting one node to test HA migration (with VM replication active), some VMs came back corrupted (they hang on boot as if their disks were damaged).


From systemctl:

Code:
zfs-import@zp01.service loaded failed failed Import ZFS pool zp01
zfs-import@zpha.service loaded failed failed Import ZFS pool zpha
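(The listing above is from a systemctl query; something like the following should reproduce it.)

Code:
# List failed units matching the per-pool ZFS import template
systemctl list-units --state=failed 'zfs-import@*'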

Code:
# zpool status -v
  pool: zp01
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        zp01        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            vdd     ONLINE       0     0     0
            vde     ONLINE       0     0     0

errors: No known data errors

  pool: zpha
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        zpha        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            vdb     ONLINE       0     0     0
            vdc     ONLINE       0     0     0

errors: No known data errors

How can I diagnose the cause of the problem and fix it?

Thanks, P.
 
After disconnecting one node to test HA migration (with VM replication active)
That is not an HA migration, it's a failover. A migration normally has no data loss; a failover usually does.

Still, that should not corrupt the boot disk. Does it work if you migrate the VM?
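To see why the import units failed, check the unit status and this boot's journal; they usually contain the exact error (a minimal sketch, using the pool names from your output):

Code:
# Failure reason recorded by systemd for each per-pool import unit
systemctl status zfs-import@zp01.service zfs-import@zpha.service

# Full log of both units for the current boot
journalctl -b -u zfs-import@zp01.service -u zfs-import@zpha.service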
 
I am able to migrate a couple of VMs, but if I try a CT, whether running or stopped, I get:

trying to acquire lock... TASK ERROR: can't lock file '/var/lock/pve-manager/pve-migrate-41200' - got timeout
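To check whether another task is still holding the lock, one can inspect the lock file directly (a sketch; the path is taken from the error message):

Code:
# Show any process that currently holds the migration lock
fuser -v /var/lock/pve-manager/pve-migrate-41200

# Alternatively, list open handles on the lock file
lsof /var/lock/pve-manager/pve-migrate-41200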

I simply activated replication of the VMs/CTs to the other nodes and added the VMs/CTs to the HA resources.
In my first round of tests, after some replication problems (solved by rebooting all nodes), the cluster worked as expected: when I disconnected a node, it was fenced (rebooted) and its VMs/CTs were migrated (restarted) onto the other nodes.
Then I recreated the cluster and the ZFS pools and restored the VMs/CTs. This time I have the reported problems, and all nodes are in the same situation (failed ZFS pool imports).
I could simply start over with a new cluster, but I am interested in learning how to solve this kind of problem, to be ready if it happens in a real cluster.
Any hints appreciated.

Thanks, P.
 
Same results on production servers with the latest updates:

Code:
# pveversion
pve-manager/8.0.9/fd1a0ae1b385cdcd (running kernel: 6.2.16-19-pve)

If it can be useful, from journalctl -r I have:
Code:
Nov 19 12:44:58 pveprod01 systemd[1]: Reached target zfs-volumes.target - ZFS volumes are ready.
Nov 19 12:44:58 pveprod01 systemd[1]: Reached target local-fs.target - Local File Systems.
Nov 19 12:44:58 pveprod01 systemd[1]: Finished zfs-volume-wait.service - Wait for ZFS Volume (zvol) links in /dev.
Nov 19 12:44:58 pveprod01 systemd[1]: Finished zfs-mount.service - Mount ZFS filesystems.
Nov 19 12:44:58 pveprod01 systemd[1]: Mounted boot-efi.mount - /boot/efi.
Nov 19 12:44:58 pveprod01 zvol_wait[2495]: No zvols found, nothing to do.
Nov 19 12:44:58 pveprod01 systemd[1]: Starting zfs-volume-wait.service - Wait for ZFS Volume (zvol) links in /dev...
Nov 19 12:44:58 pveprod01 systemd[1]: Starting zfs-mount.service - Mount ZFS filesystems...
Nov 19 12:44:58 pveprod01 systemd[1]: Mounting boot-efi.mount - /boot/efi...
Nov 19 12:44:58 pveprod01 systemd[1]: Reached target zfs-import.target - ZFS pool import target.
Nov 19 12:44:58 pveprod01 systemd[1]: Failed to start zfs-import@zpha.service - Import ZFS pool zpha.
Nov 19 12:44:58 pveprod01 systemd[1]: zfs-import@zpha.service: Failed with result 'exit-code'.
Nov 19 12:44:58 pveprod01 systemd[1]: zfs-import@zpha.service: Main process exited, code=exited, status=1/FAILURE
Nov 19 12:44:58 pveprod01 systemd[1]: Failed to start zfs-import@zp01.service - Import ZFS pool zp01.
Nov 19 12:44:58 pveprod01 systemd[1]: zfs-import@zp01.service: Failed with result 'exit-code'.
Nov 19 12:44:58 pveprod01 systemd[1]: zfs-import@zp01.service: Main process exited, code=exited, status=1/FAILURE
Nov 19 12:44:58 pveprod01 zpool[2367]: use the form 'zpool import <pool | id> <newpool>' to give it a new name
Nov 19 12:44:58 pveprod01 zpool[2367]: cannot import 'zpha': a pool with that name already exists
Nov 19 12:44:58 pveprod01 systemd[1]: Finished zfs-import-cache.service - Import ZFS pools by cache file.
Nov 19 12:44:58 pveprod01 zpool[1997]: use the form 'zpool import <pool | id> <newpool>' to give it a new name
Nov 19 12:44:58 pveprod01 zpool[1997]: cannot import 'zp01': a pool with that name already exists
Nov 19 12:44:58 pveprod01 systemd[1]: Finished systemd-fsck@dev-disk-by\x2dlabel-EFI_SYSPART.service - File System Check on /dev/d>
Nov 19 12:44:58 pveprod01 systemd[1]: Starting zfs-import@zpha.service - Import ZFS pool zpha...
Nov 19 12:44:58 pveprod01 systemd[1]: Starting zfs-import@zp01.service - Import ZFS pool zp01...
Nov 19 12:44:58 pveprod01 systemd[1]: zfs-import-scan.service - Import ZFS pools by device scanning was skipped because of an unme>
Nov 19 12:44:58 pveprod01 systemd[1]: Starting zfs-import-cache.service - Import ZFS pools by cache file...

I have proceeded by disabling the services (systemctl disable zfs-import@zp01.service, ....), but I am worried about secondary problems this can cause.
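If I read the journal correctly, zfs-import-cache.service imports both pools from the cache file first, and the per-pool zfs-import@ units then fail simply because the pools are already imported ('a pool with that name already exists'). If that is the case, the per-pool units are redundant here; a way to double-check before leaving them disabled (a sketch, assuming the default cache file path):

Code:
# Both pools should report a cache file; if so, zfs-import-cache.service
# is what actually imports them at boot
zpool get cachefile zp01 zpha

# Confirm both pool configurations are recorded in the cache file
zdb -C -U /etc/zfs/zpool.cache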
Any hints appreciated.
Thanks, P.
 