failed Import ZFS pool

PaulVM

Playing with a test cluster.
After disconnecting one node to test HA migration (with VM replication active), some VMs came back corrupted (they hang on boot as if their disks were damaged).


From systemctl:

Code:
zfs-import@zp01.service loaded failed failed Import ZFS pool zp01
zfs-import@zpha.service loaded failed failed Import ZFS pool zpha
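(The listing above is from a systemctl query; something like the following should reproduce it.)

Code:
# List failed units matching the per-pool ZFS import template
systemctl list-units --state=failed 'zfs-import@*'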

Code:
# zpool status -v
  pool: zp01
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        zp01        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            vdd     ONLINE       0     0     0
            vde     ONLINE       0     0     0

errors: No known data errors

  pool: zpha
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        zpha        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            vdb     ONLINE       0     0     0
            vdc     ONLINE       0     0     0

errors: No known data errors

How can I diagnose the cause of the problem and fix it?

Thanks, P.
 
After disconnecting one node to test HA migration (with VM replication active)
That is not an HA migration, it's a failover. A migration normally has no data loss; a failover usually does.

Still, that should not corrupt the boot disk. Does it work if you migrate the VM?
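To see why the import units failed, check the unit status and this boot's journal; they usually contain the exact error (a minimal sketch, using the pool names from your output):

Code:
# Failure reason recorded by systemd for each per-pool import unit
systemctl status zfs-import@zp01.service zfs-import@zpha.service

# Full log of both units for the current boot
journalctl -b -u zfs-import@zp01.service -u zfs-import@zpha.service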
 
I am able to migrate a couple of VMs, but if I try a CT, whether running or stopped, I get:

trying to acquire lock... TASK ERROR: can't lock file '/var/lock/pve-manager/pve-migrate-41200' - got timeout
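To check whether another task is still holding the lock, one can inspect the lock file directly (a sketch; the path is taken from the error message):

Code:
# Show any process that currently holds the migration lock
fuser -v /var/lock/pve-manager/pve-migrate-41200

# Alternatively, list open handles on the lock file
lsof /var/lock/pve-manager/pve-migrate-41200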

I simply activated replication of the VMs/CTs to the other nodes and added the VMs/CTs to the HA resources.
In my first round of tests, after some replication problems (solved by rebooting all nodes), the cluster worked as expected: when I disconnected a node, it was fenced (rebooted) and its VMs/CTs were migrated (restarted) onto the other nodes.
Then I recreated the cluster and the ZFS pools and restored the VMs/CTs. This time I have the reported problems, and all nodes are in the same situation (failed ZFS pool imports).
I could simply start over with a new cluster, but I am interested in learning how to solve this kind of problem, to be ready if it happens in a real cluster.
Any hints appreciated.

Thanks, P.
 
Same results on production servers with the latest updates:

Code:
# pveversion
pve-manager/8.0.9/fd1a0ae1b385cdcd (running kernel: 6.2.16-19-pve)

If it can be useful, from journalctl -r I have:
Code:
Nov 19 12:44:58 pveprod01 systemd[1]: Reached target zfs-volumes.target - ZFS volumes are ready.
Nov 19 12:44:58 pveprod01 systemd[1]: Reached target local-fs.target - Local File Systems.
Nov 19 12:44:58 pveprod01 systemd[1]: Finished zfs-volume-wait.service - Wait for ZFS Volume (zvol) links in /dev.
Nov 19 12:44:58 pveprod01 systemd[1]: Finished zfs-mount.service - Mount ZFS filesystems.
Nov 19 12:44:58 pveprod01 systemd[1]: Mounted boot-efi.mount - /boot/efi.
Nov 19 12:44:58 pveprod01 zvol_wait[2495]: No zvols found, nothing to do.
Nov 19 12:44:58 pveprod01 systemd[1]: Starting zfs-volume-wait.service - Wait for ZFS Volume (zvol) links in /dev...
Nov 19 12:44:58 pveprod01 systemd[1]: Starting zfs-mount.service - Mount ZFS filesystems...
Nov 19 12:44:58 pveprod01 systemd[1]: Mounting boot-efi.mount - /boot/efi...
Nov 19 12:44:58 pveprod01 systemd[1]: Reached target zfs-import.target - ZFS pool import target.
Nov 19 12:44:58 pveprod01 systemd[1]: Failed to start zfs-import@zpha.service - Import ZFS pool zpha.
Nov 19 12:44:58 pveprod01 systemd[1]: zfs-import@zpha.service: Failed with result 'exit-code'.
Nov 19 12:44:58 pveprod01 systemd[1]: zfs-import@zpha.service: Main process exited, code=exited, status=1/FAILURE
Nov 19 12:44:58 pveprod01 systemd[1]: Failed to start zfs-import@zp01.service - Import ZFS pool zp01.
Nov 19 12:44:58 pveprod01 systemd[1]: zfs-import@zp01.service: Failed with result 'exit-code'.
Nov 19 12:44:58 pveprod01 systemd[1]: zfs-import@zp01.service: Main process exited, code=exited, status=1/FAILURE
Nov 19 12:44:58 pveprod01 zpool[2367]: use the form 'zpool import <pool | id> <newpool>' to give it a new name
Nov 19 12:44:58 pveprod01 zpool[2367]: cannot import 'zpha': a pool with that name already exists
Nov 19 12:44:58 pveprod01 systemd[1]: Finished zfs-import-cache.service - Import ZFS pools by cache file.
Nov 19 12:44:58 pveprod01 zpool[1997]: use the form 'zpool import <pool | id> <newpool>' to give it a new name
Nov 19 12:44:58 pveprod01 zpool[1997]: cannot import 'zp01': a pool with that name already exists
Nov 19 12:44:58 pveprod01 systemd[1]: Finished systemd-fsck@dev-disk-by\x2dlabel-EFI_SYSPART.service - File System Check on /dev/d>
Nov 19 12:44:58 pveprod01 systemd[1]: Starting zfs-import@zpha.service - Import ZFS pool zpha...
Nov 19 12:44:58 pveprod01 systemd[1]: Starting zfs-import@zp01.service - Import ZFS pool zp01...
Nov 19 12:44:58 pveprod01 systemd[1]: zfs-import-scan.service - Import ZFS pools by device scanning was skipped because of an unme>
Nov 19 12:44:58 pveprod01 systemd[1]: Starting zfs-import-cache.service - Import ZFS pools by cache file...

I have proceeded by disabling the services (systemctl disable zfs-import@zp01.service, ....), but I am worried about secondary problems this can cause.
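If I read the journal correctly, zfs-import-cache.service imports both pools from the cache file first, and the per-pool zfs-import@ units then fail simply because the pools are already imported ('a pool with that name already exists'). If that is the case, the per-pool units are redundant here; a way to double-check before leaving them disabled (a sketch, assuming the default cache file path):

Code:
# Both pools should report a cache file; if so, zfs-import-cache.service
# is what actually imports them at boot
zpool get cachefile zp01 zpha

# Confirm both pool configurations are recorded in the cache file
zdb -C -U /etc/zfs/zpool.cache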
Any hints appreciated.
Thanks, P.
 