failed Import ZFS pool

PaulVM

Playing with a test cluster.
After disconnecting one node to test HA migration (with VM replication active), some VMs are now corrupted (they hang on boot as if their disks were corrupted).


From systemctl:
zfs-import@zp01.service loaded failed failed Import ZFS pool zp01
zfs-import@zpha.service loaded failed failed Import ZFS pool zpha

Code:
# zpool status -v
  pool: zp01
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        zp01        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            vdd     ONLINE       0     0     0
            vde     ONLINE       0     0     0

errors: No known data errors

  pool: zpha
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        zpha        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            vdb     ONLINE       0     0     0
            vdc     ONLINE       0     0     0

errors: No known data errors

How can I diagnose the cause of the problem and fix it?
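For units in the failed state, the unit's own journal usually shows the reason. A sketch, not a definitive procedure; the unit names are taken from the systemctl output above:

```shell
# Sketch: show the status and boot-time journal of the failed
# per-pool import units (names from the systemctl output above).
systemctl status zfs-import@zp01.service zfs-import@zpha.service --no-pager
journalctl -b -u zfs-import@zp01.service -u zfs-import@zpha.service --no-pager
```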

Thanks, P.
 
After disconnecting one node to test HA migration (VMs replication active)
That is not an HA migration, it's a failover. A migration normally has no data loss; a failover usually does.

Still, this should not corrupt the boot disk. Does it work if you migrate the VM?
 
I am able to migrate a couple of VMs, but if I try a CT, whether running or stopped, I get:

trying to acquire lock... TASK ERROR: can't lock file '/var/lock/pve-manager/pve-migrate-41200' - got timeout
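A first thing to check is whether the lock file is a stale leftover from an earlier failed migration. A sketch, not a definitive fix; the path and VMID (41200) are taken from the error message above, and fuser is assumed to be available:

```shell
# Sketch: check whether the migration lock from the error above is a
# leftover. Path and VMID (41200) are the ones in the error message.
lockfile=/var/lock/pve-manager/pve-migrate-41200
if [ -e "$lockfile" ]; then
    # If no process holds it, it is likely stale from a failed migration.
    fuser -v "$lockfile" 2>/dev/null || echo "lock file present but not held by any process"
else
    echo "no stale lock file found"
fi
```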

I simply activated replication of the VMs/CTs to the other nodes and added the VMs/CTs to the HA resources.
In my first round of tests, after some replication problems (solved by rebooting all nodes), the cluster worked as expected: when I disconnected a node, the cluster fenced it and migrated (restarted) its VMs/CTs onto the other nodes.
Then I recreated the cluster and the ZFS pools and restored the VMs/CTs. This time I have the reported problems, and all nodes are in the same situation (failed ZFS pool imports).
I could simply start over with a new cluster, but I am interested in learning how to solve this kind of problem, to be ready if it happens in a real cluster.
Any hints appreciated.

Thanks, P.
 
Same results on production servers with the latest updates:

Code:
# pveversion
pve-manager/8.0.9/fd1a0ae1b385cdcd (running kernel: 6.2.16-19-pve)

In case it is useful, from journalctl -r I have:
Code:
Nov 19 12:44:58 pveprod01 systemd[1]: Reached target zfs-volumes.target - ZFS volumes are ready.
Nov 19 12:44:58 pveprod01 systemd[1]: Reached target local-fs.target - Local File Systems.
Nov 19 12:44:58 pveprod01 systemd[1]: Finished zfs-volume-wait.service - Wait for ZFS Volume (zvol) links in /dev.
Nov 19 12:44:58 pveprod01 systemd[1]: Finished zfs-mount.service - Mount ZFS filesystems.
Nov 19 12:44:58 pveprod01 systemd[1]: Mounted boot-efi.mount - /boot/efi.
Nov 19 12:44:58 pveprod01 zvol_wait[2495]: No zvols found, nothing to do.
Nov 19 12:44:58 pveprod01 systemd[1]: Starting zfs-volume-wait.service - Wait for ZFS Volume (zvol) links in /dev...
Nov 19 12:44:58 pveprod01 systemd[1]: Starting zfs-mount.service - Mount ZFS filesystems...
Nov 19 12:44:58 pveprod01 systemd[1]: Mounting boot-efi.mount - /boot/efi...
Nov 19 12:44:58 pveprod01 systemd[1]: Reached target zfs-import.target - ZFS pool import target.
Nov 19 12:44:58 pveprod01 systemd[1]: Failed to start zfs-import@zpha.service - Import ZFS pool zpha.
Nov 19 12:44:58 pveprod01 systemd[1]: zfs-import@zpha.service: Failed with result 'exit-code'.
Nov 19 12:44:58 pveprod01 systemd[1]: zfs-import@zpha.service: Main process exited, code=exited, status=1/FAILURE
Nov 19 12:44:58 pveprod01 systemd[1]: Failed to start zfs-import@zp01.service - Import ZFS pool zp01.
Nov 19 12:44:58 pveprod01 systemd[1]: zfs-import@zp01.service: Failed with result 'exit-code'.
Nov 19 12:44:58 pveprod01 systemd[1]: zfs-import@zp01.service: Main process exited, code=exited, status=1/FAILURE
Nov 19 12:44:58 pveprod01 zpool[2367]: use the form 'zpool import <pool | id> <newpool>' to give it a new name
Nov 19 12:44:58 pveprod01 zpool[2367]: cannot import 'zpha': a pool with that name already exists
Nov 19 12:44:58 pveprod01 systemd[1]: Finished zfs-import-cache.service - Import ZFS pools by cache file.
Nov 19 12:44:58 pveprod01 zpool[1997]: use the form 'zpool import <pool | id> <newpool>' to give it a new name
Nov 19 12:44:58 pveprod01 zpool[1997]: cannot import 'zp01': a pool with that name already exists
Nov 19 12:44:58 pveprod01 systemd[1]: Finished systemd-fsck@dev-disk-by\x2dlabel-EFI_SYSPART.service - File System Check on /dev/d>
Nov 19 12:44:58 pveprod01 systemd[1]: Starting zfs-import@zpha.service - Import ZFS pool zpha...
Nov 19 12:44:58 pveprod01 systemd[1]: Starting zfs-import@zp01.service - Import ZFS pool zp01...
Nov 19 12:44:58 pveprod01 systemd[1]: zfs-import-scan.service - Import ZFS pools by device scanning was skipped because of an unme>
Nov 19 12:44:58 pveprod01 systemd[1]: Starting zfs-import-cache.service - Import ZFS pools by cache file...
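Reading the reversed journal bottom-up, the pools appear to be imported twice at boot: zfs-import-cache.service brings them in from the cache file first, and the per-pool zfs-import@ units then fail because pools with those names already exist. The tell-tale journal lines can be isolated like this (sample lines from the log above are inlined as a here-doc so the command is self-contained):

```shell
# Extract the "already exists" errors that explain the failed units.
# The here-doc stands in for real journalctl output for illustration.
grep -o "cannot import '[^']*': a pool with that name already exists" <<'EOF'
Nov 19 12:44:58 pveprod01 zpool[2367]: cannot import 'zpha': a pool with that name already exists
Nov 19 12:44:58 pveprod01 zpool[1997]: cannot import 'zp01': a pool with that name already exists
EOF
```

If these lines appear, the pools themselves are healthy (as zpool status above confirms); only the duplicate import attempt fails.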

I proceeded by disabling the services (systemctl disable zfs-import@zp01.service, ...), but I am worried about secondary problems this could cause.
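If the pools really are already imported by zfs-import-cache.service, the per-pool instance units are redundant and disabling them should be safe. A sketch, assuming the pool names from this thread and the default cache file, not a definitive procedure:

```shell
# Sketch: disable the redundant per-pool import units; the cache
# service keeps importing the pools at boot as long as they are
# recorded in the cache file.
for pool in zp01 zpha; do
    systemctl disable "zfs-import@${pool}.service"
done
# Confirm the pools reference a cache file (the value should not be "none"):
zpool get cachefile zp01 zpha
```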
Any hints appreciated.
Thanks, P.
 
