ZFS pool import fails on boot, but appears to be imported after

Cephei

New Member
Feb 23, 2021
I have 3 SAS drives connected through a RAID card in JBOD mode, and Proxmox can see the drives properly. Pool 'sas-backup' is made up of 1 vdev with a single SAS drive, and pool 'sas-vmdata' is made up of a single vdev which in turn is built from 2 mirrored SAS drives. Everything seems to be working fine; however, the system fails to import one of the 3 ZFS pools on boot with the following message:

Code:
root@pve:~# systemctl status zfs-import@sas\\x2dbackup.service
● zfs-import@sas\x2dbackup.service - Import ZFS pool sas\x2dbackup
   Loaded: loaded (/lib/systemd/system/zfs-import@.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2021-02-23 15:32:51 EST; 43min ago
     Docs: man:zpool(8)
  Process: 941 ExecStart=/sbin/zpool import -N -d /dev/disk/by-id -o cachefile=none sas-backup (code=exited, status=1/FAILURE)
 Main PID: 941 (code=exited, status=1/FAILURE)

Feb 23 15:32:51 pve systemd[1]: Starting Import ZFS pool sas\x2dbackup...
Feb 23 15:32:51 pve zpool[941]: cannot import 'sas-backup': no such pool available
Feb 23 15:32:51 pve systemd[1]: zfs-import@sas\x2dbackup.service: Main process exited, code=exited, status=1/FAILURE
Feb 23 15:32:51 pve systemd[1]: zfs-import@sas\x2dbackup.service: Failed with result 'exit-code'.
Feb 23 15:32:51 pve systemd[1]: Failed to start Import ZFS pool sas\x2dbackup.

The rest seem to import fine:

Code:
root@pve:~# systemctl | grep zfs
  zfs-import-cache.service                                                                           loaded active     exited    Import ZFS pools by cache file
● zfs-import@sas\x2dbackup.service                                                                   loaded failed     failed    Import ZFS pool sas\x2dbackup
  zfs-import@sas\x2dvmdata.service                                                                   loaded active     exited    Import ZFS pool sas\x2dvmdata
  zfs-import@sata\x2dstr0.service                                                                    loaded active     exited    Import ZFS pool sata\x2dstr0
  zfs-mount.service                                                                                  loaded active     exited    Mount ZFS filesystems
  zfs-share.service                                                                                  loaded active     exited    ZFS file system shares
  zfs-volume-wait.service                                                                            loaded active     exited    Wait for ZFS Volume (zvol) links in /dev
  zfs-zed.service                                                                                    loaded active     running   ZFS Event Daemon (zed)
  system-zfs\x2dimport.slice                                                                         loaded active     active    system-zfs\x2dimport.slice
  zfs-import.target                                                                                  loaded active     active    ZFS pool import target
  zfs-volumes.target                                                                                 loaded active     active    ZFS volumes are ready
  zfs.target                                                                                         loaded active     active    ZFS startup target

I have tried adding a root delay in both the grub and zfs configs with no luck. I have reinstalled Proxmox a few times and recreated the same set-up with the same error. The previous installation would display the same error even after the pools were destroyed, so I have no idea why it was still trying to import them. Any help/feedback greatly appreciated!

Running pve with no subscription. Version:

Code:
root@pve:~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.98-1-pve)
pve-manager: 6.3-4 (running version: 6.3-4/0a38c56f)
pve-kernel-5.4: 6.3-5
pve-kernel-helper: 6.3-5
pve-kernel-5.4.98-1-pve: 5.4.98-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.8-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-5
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-1
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-5
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.3-pve1
 
You just ran into the same thing as me. Run:
zpool status
and check for "sas-backup".

I'm pretty sure that's a pool you had one day, but deleted.

If I'm right (and I am right), do this:
systemctl disable zfs-import@sas\\x2dbackup.service

It can happen that you need to tab-complete that service name, because of those slashes...
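One way to avoid typing the escapes by hand is systemd-escape (a small sketch, assuming the pool name sas-backup from this thread; the single quotes keep the shell from eating the backslash):

Code:
# print the escaped unit-name suffix for the pool name (outputs: sas\x2dbackup)
systemd-escape sas-backup
# disable the per-pool import service using the escaped name
systemctl disable 'zfs-import@sas\x2dbackup.service'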

Cheers
 
Thanks for your feedback. I see what you mean; however, this is a new install of Proxmox, and all the drives have been wiped of all partitions and zapped with gdisk. The "sas-backup" pool is brand new to this system, so there should be no reason to disable the service. All the pools are imported after boot and seem to be working without any issues. The error at boot is still concerning.

zpool status:

Code:
root@pve:~# zpool status
  pool: sas-backup
 state: ONLINE
config:

        NAME                      STATE     READ WRITE CKSUM
        sas-backup                ONLINE       0     0     0
          scsi-35000c5009699d147  ONLINE       0     0     0

errors: No known data errors

  pool: sas-vmdata
 state: ONLINE
config:

        NAME                        STATE     READ WRITE CKSUM
        sas-vmdata                  ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            scsi-35000cca02c2a3824  ONLINE       0     0     0
            scsi-35000cca02c2a2804  ONLINE       0     0     0

errors: No known data errors

  pool: sata-str0
 state: ONLINE
config:

        NAME                                 STATE     READ WRITE CKSUM
        sata-str0                            ONLINE       0     0     0
          mirror-0                           ONLINE       0     0     0
            ata-ST8000VN004-2M2101_WKD3CXWW  ONLINE       0     0     0
            ata-ST8000VN004-2M2101_WKD3CYB4  ONLINE       0     0     0

errors: No known data errors
 
Oh wow, that's new then.
Sorry, I had the exact same issue, but I forgot that in my case I had destroyed/deleted the pool and wondered why the error still appeared...

But in your case you do indeed have the pool, so I don't know why you get that error...

Wanted to be cool once, but yeah :rolleyes:
 
Regarding the failed import of a destroyed pool:
@Ramalama is correct there - if you destroy a pool (which was created via the GUI) you should also disable the pool-specific import service.

Regarding the not-imported pool, which still exists and is on the same controller as pools which get imported correctly - that's certainly odd (even though we have one other report about something like this currently)

@Cephei could you please share the journal since boot (`journalctl -b`) and `dmesg`?

Thanks!
 
@Stoiko Ivanov Thank you for your response. I just rebooted the system; attached are 'journalctl -b' and 'dmesg'.

I wasn't aware that you need to disable the pool-specific import service if the pool is deleted. I guess it makes sense, since the pool is deleted from the CLI directly. I also accidentally stumbled upon the other thread with a similar issue after I had posted mine, apologies for that, as it does look very similar. I spent a few days googling, but should have searched the forum directly first.

In that thread you also mentioned that you "added the pool-import service for specific pools just to prevent these situations". Is the service that is failing the one you are talking about? Are there multiple mechanisms in place that import pools at boot? If that specific import service failed, then how does the pool end up imported eventually?

Thank you in advance!
 

Attachments

  • dmesg.txt (90.2 KB)
  • journalctl.txt (138.6 KB)
I wasn't aware that you need to disable the pool-specific import service if the pool is deleted. I guess it makes sense, since the pool is deleted from the CLI directly.
Yes - currently the GUI only supports adding new storages, since removing them (in the sense of `zpool destroy`) has quite a large potential for accidental data loss.

In that thread you also mentioned that you "added the pool-import service for specific pools just to prevent these situations". Is the service that is failing the one you are talking about?
Yes, exactly - zfs-import@.service was introduced to make sure that pools added via the GUI are indeed imported during boot (we had quite a few reports from systems where the zpool.cache file was empty/corrupted, and that seemed like a way to address that).

Are there multiple mechanisms in place that import pools at boot? If that specific import service failed, then how does the pool end up imported eventually?
A pool which is defined in /etc/pve/storage.cfg (see the example entry below) can get activated (in case this did not work during boot) by:
* guests being started on boot (which have a disk on that pool)
* pvestatd (which runs constantly and checks the storages, for which it needs to activate them)
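For illustration, a ZFS storage definition in /etc/pve/storage.cfg looks roughly like this - the pool name is taken from this thread, but the options shown are just typical values, not copied from this particular system:

Code:
zfspool: sas-vmdata
        pool sas-vmdata
        content images,rootdir
        sparse 1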

regarding your issue:
* I just realized that zfs-import@.service does not list 'Requires=systemd-udev-settle.service' - which could potentially explain the issue - could you try adding that requirement via an override-file? (see the sketch after this list)
* see e.g. https://wiki.archlinux.org/index.php/systemd#Drop-in_files for an explanation
* it's documented in https://www.freedesktop.org/software/systemd/man/systemd.unit.html
* run `systemctl daemon-reload; update-initramfs -k all -u` afterwards
* reboot
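A minimal sketch of such a drop-in - the file path follows the usual systemd override convention, and the content simply adds the missing requirement:

Code:
# /etc/systemd/system/zfs-import@sas\x2dbackup.service.d/override.conf
# (created e.g. with: systemctl edit 'zfs-import@sas\x2dbackup.service')
[Unit]
Requires=systemd-udev-settle.service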

Thanks!
 
Just tried the fix - still experiencing the same issue. I tried adding the drop-in both for zfs-import@.service and for my pool specifically, zfs-import@sas\\x2dbackup.service. There is already an "After=systemd-udev-settle.service" dependency though, so unless systemd-udev-settle.service fails to start, I don't think adding "Requires=systemd-udev-settle.service" should make a difference.

According to this, systemd-udev-settle.service doesn't guarantee hardware readiness and instead it is suggested to have services subscribe to udev events.

Even still, I expected that adding a root delay would fix the issue, but that was not the case either.


Code:
root@pve:~# systemctl cat zfs-import@.service
# /usr/lib/systemd/system/zfs-import@.service
[Unit]
Description=Import ZFS pool %i
Documentation=man:zpool(8)
DefaultDependencies=no
After=systemd-udev-settle.service
After=cryptsetup.target
After=multipathd.target
Before=zfs-import.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -N -d /dev/disk/by-id -o cachefile=none %I

[Install]
WantedBy=zfs-import.target

# /etc/systemd/system/zfs-import@.service.d/override.conf
[Unit]
Requires=systemd-udev-settle.service

root@pve:~# systemctl cat zfs-import@sas\\x2dbackup.service
# /lib/systemd/system/zfs-import@.service
[Unit]
Description=Import ZFS pool %i
Documentation=man:zpool(8)
DefaultDependencies=no
After=systemd-udev-settle.service
After=cryptsetup.target
After=multipathd.target
Before=zfs-import.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -N -d /dev/disk/by-id -o cachefile=none %I

[Install]
WantedBy=zfs-import.target

# /etc/systemd/system/zfs-import@sas\x2dbackup.service.d/override.conf
[Unit]
Requires=systemd-udev-settle.service
 
According to this, systemd-udev-settle.service doesn't guarantee hardware readiness and instead it is suggested to have services subscribe to udev events
That's correct - and already discussed upstream:
https://github.com/openzfs/zfs/issues/10891

the problem is that such an approach might still cause issues (similar to the ones we're having now):
* you create a pool - and pool creation creates the correct systemd service, which waits for all devices you have in the pool
* you add some device to the pool, but don't update the initramfs -> there is nothing waiting for the new device ...

but - I'm sure there will be a viable solution eventually :)

Even still, I expected that adding a root delay would fix the issue, but that was not the case either.
that's also quite odd (and should rule out some disk not being activated fast enough) - how did you set the timeout?

could you:
* post the output of `zfs version`
* try rebooting after running `update-initramfs -k all -u`?
thanks!
 
Code:
root@pve:~# zpool status
  pool: HDD
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 00:56:37 with 0 errors on Sun Feb 14 01:20:38 2021
config:

        NAME                                STATE     READ WRITE CKSUM
        HDD                                 ONLINE       0     0     0
          ata-WDC_WUS721010ALE6L4_VCGLVHUN  ONLINE       0     0     0

errors: No known data errors

  pool: fastvms
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 00:03:16 with 0 errors on Sun Feb 14 00:27:18 2021
config:

        NAME                                        STATE     READ WRITE CKSUM
        fastvms                                     ONLINE       0     0     0
          nvme-WDC_WDS100T2B0C-00PXH0_20207E444202  ONLINE       0     0     0

errors: No known data errors

  pool: fastvms2
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 00:03:31 with 0 errors on Sun Feb 14 00:27:34 2021
config:

        NAME                                STATE     READ WRITE CKSUM
        fastvms2                            ONLINE       0     0     0
          nvme-Lexar_1TB_SSD_K29361R000564  ONLINE       0     0     0

errors: No known data errors

  pool: fastvms3
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 00:06:53 with 0 errors on Sun Feb 14 00:30:57 2021
config:

        NAME                              STATE     READ WRITE CKSUM
        fastvms3                          ONLINE       0     0     0
          nvme-CT1000P1SSD8_2025292369BB  ONLINE       0     0     0

errors: No known data errors

What should I do right now?
 
What should I do right now?
Not sure I understand you right ...
but I guess that you're asking about the zpool output referring to unsupported features?
If that's the case - this is not related to the issue of this thread:
ZFS sometimes introduces new features with a new major version (like now, where the version changed from 0.8.6 to 2.0.3).
Some of the features change your pools in a way that is not compatible with older versions - thus you need to explicitly enable them by running zpool upgrade - in any case check the (referenced) manual page `man zpool-features` for further information.

In case you have ZFS on root (does not look like it from the zpool output) and boot with grub, I would not enable the features, since grub's ZFS implementation does not support all of them.
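For reference, checking and enabling the features looks roughly like this (a sketch using the fastvms pool from the output above; note that enabling features cannot be undone):

Code:
# list pools whose on-disk format does not have all supported features enabled
zpool upgrade
# show the feature flags of a specific pool
zpool get all fastvms | grep feature@
# enable all supported features on that pool
zpool upgrade fastvms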

If this does not answer your question - please open a new thread and provide more details
 
that's also quite odd (and should rule out some disk not being activated fast enough) - how did you set the timeout?

could you:
* post the output of `zfs version`
* try rebooting after running `update-initramfs -k all -u`?
The timeout was set by adding ZFS_INITRD_PRE_MOUNTROOT_SLEEP='10' to /etc/default/zfs; I then ran "update-initramfs -k all -u" and rebooted - same issue. I have also tried adding GRUB_CMDLINE_LINUX_DEFAULT="rootdelay=10 quiet" in the grub config and updating grub - same issue.
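For reference, the two delay settings described above were (each shown with the command that applies it):

Code:
# /etc/default/zfs - sleep in the initramfs before importing/mounting,
# applied with: update-initramfs -k all -u
ZFS_INITRD_PRE_MOUNTROOT_SLEEP='10'

# /etc/default/grub - kernel root delay, applied with: update-grub
GRUB_CMDLINE_LINUX_DEFAULT="rootdelay=10 quiet"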

Code:
root@pve:~# zfs version
zfs-2.0.3-pve1
zfs-kmod-2.0.3-pve1

When I add a root delay via /etc/default/zfs or grub, the timing of zfs-import@.service does not seem to be affected by it; the service seems to be fired off right after the disk is attached. So no wonder those solutions produce no result.

Code:
root@pve:~# journalctl | grep zfs
Feb 25 14:13:23 pve systemd[1]: Created slice system-zfs\x2dimport.slice.
Feb 25 14:13:24 pve systemd-modules-load[635]: Inserted module 'zfs'
Feb 25 14:13:24 pve systemd[1]: zfs-import@sas\x2dbackup.service: Main process exited, code=exited, status=1/FAILURE
Feb 25 14:13:24 pve systemd[1]: zfs-import@sas\x2dbackup.service: Failed with result 'exit-code'.
root@pve:~#
root@pve:~# journalctl | grep sda
Feb 25 14:13:23 pve kernel: sd 0:0:19:0: [sda] 3516328368 512-byte logical blocks: (1.80 TB/1.64 TiB)
Feb 25 14:13:23 pve kernel: sd 0:0:19:0: [sda] 4096-byte physical blocks
Feb 25 14:13:23 pve kernel: sd 0:0:19:0: [sda] Write Protect is off
Feb 25 14:13:23 pve kernel: sd 0:0:19:0: [sda] Mode Sense: dd 00 10 08
Feb 25 14:13:23 pve kernel: sd 0:0:19:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
Feb 25 14:13:23 pve kernel:  sda: sda1 sda9
Feb 25 14:13:23 pve kernel: sd 0:0:19:0: [sda] Attached SCSI disk
Feb 25 14:13:28 pve smartd[2502]: Device: /dev/sda, opened
Feb 25 14:13:28 pve smartd[2502]: Device: /dev/sda, [SEAGATE  ST1800MM0018     LE2B], lu id: 0x5000c5009699d147, S/N: S3Z08RA00000K633F64U, 1.80 TB
Feb 25 14:13:28 pve smartd[2502]: Device: /dev/sda, is SMART capable. Adding to "monitor" list.
Feb 25 14:13:28 pve smartd[2502]: Device: /dev/sda, state read from /var/lib/smartmontools/smartd.SEAGATE-ST1800MM0018-S3Z08RA00000K633F64U.scsi.state
Feb 25 14:13:28 pve smartd[2502]: Device: /dev/bus/0, same identity as /dev/sda, ignored
Feb 25 14:13:34 pve smartd[2502]: Device: /dev/sda, state written to /var/lib/smartmontools/smartd.SEAGATE-ST1800MM0018-S3Z08RA00000K633F64U.scsi.state
 
I have just tried adding a delay to zfs-import@.service (30 sec), and now the service started 30 seconds after the disk was attached. It still failed though, but this time because the pool had already been imported by something else: "cannot import 'sas-backup': a pool with that name already exists".
Btw, I have posted in another thread about Error 400 when trying to see zfs pool details in GUI. No one has said anything yet :(
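For reference, the 30-second delay was added with a systemd drop-in roughly like the following (the exact override content is reconstructed from the status output below):

Code:
# /etc/systemd/system/zfs-import@sas\x2dbackup.service.d/override.conf
[Service]
# wait 30 seconds before the import is attempted
ExecStartPre=/bin/sleep 30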

Code:
root@pve:~# systemctl status zfs-import@sas\\x2dbackup.service
● zfs-import@sas\x2dbackup.service - Import ZFS pool sas\x2dbackup
   Loaded: loaded (/lib/systemd/system/zfs-import@.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/zfs-import@sas\x2dbackup.service.d
           └─override.conf
   Active: failed (Result: exit-code) since Thu 2021-02-25 14:26:14 EST; 8min ago
     Docs: man:zpool(8)
  Process: 944 ExecStartPre=/bin/sleep 30 (code=exited, status=0/SUCCESS)
  Process: 2413 ExecStart=/sbin/zpool import -N -d /dev/disk/by-id -o cachefile=none sas-backup (code=exited, status=1/FAILURE)
Main PID: 2413 (code=exited, status=1/FAILURE)

Feb 25 14:25:44 pve systemd[1]: Starting Import ZFS pool sas\x2dbackup...
Feb 25 14:26:14 pve zpool[2413]: cannot import 'sas-backup': a pool with that name already exists
Feb 25 14:26:14 pve zpool[2413]: use the form 'zpool import <pool | id> <newpool>' to give it a new name
Feb 25 14:26:14 pve systemd[1]: zfs-import@sas\x2dbackup.service: Main process exited, code=exited, status=1/FAILURE
Feb 25 14:26:14 pve systemd[1]: zfs-import@sas\x2dbackup.service: Failed with result 'exit-code'.
Feb 25 14:26:14 pve systemd[1]: Failed to start Import ZFS pool sas\x2dbackup.

root@pve:~# journalctl | grep sda
Feb 25 14:25:44 pve kernel: sd 0:0:19:0: [sda] 3516328368 512-byte logical blocks: (1.80 TB/1.64 TiB)
Feb 25 14:25:44 pve kernel: sd 0:0:19:0: [sda] 4096-byte physical blocks
Feb 25 14:25:44 pve kernel: sd 0:0:19:0: [sda] Write Protect is off
Feb 25 14:25:44 pve kernel: sd 0:0:19:0: [sda] Mode Sense: dd 00 10 08
Feb 25 14:25:44 pve kernel: sd 0:0:19:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
Feb 25 14:25:44 pve kernel:  sda: sda1 sda9
Feb 25 14:25:44 pve kernel: sd 0:0:19:0: [sda] Attached SCSI disk
 
The timeout was set by adding ZFS_INITRD_PRE_MOUNTROOT_SLEEP='10' to /etc/default/zfs; I then ran "update-initramfs -k all -u" and rebooted - same issue. I have also tried adding GRUB_CMDLINE_LINUX_DEFAULT="rootdelay=10 quiet" in the grub config and updating grub - same issue.
hm - from your journal it seems that you have an EFI-booted system - maybe it gets booted by systemd-boot instead of grub:
https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysboot
(although since you don't have an rpool - I guess that's not the case)

So no wonder those solutions produce no result.
Guessed as much - but then I'm really at a loss as to what's going on there ...
I'll try to find a way to gather some debug output from zfs...

Btw, I have posted in another thread about Error 400 when trying to see zfs pool details in GUI. No one has said anything yet
We try to answer the threads in the forum as well as we can - but we cannot guarantee that we get to each and every thread quickly (there are far more new threads each day than there are of us).
 
hm - from your journal it seems that you have an EFI-booted system - maybe it gets booted by systemd-boot instead of grub:
https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysboot
(although since you don't have an rpool - I guess that's not the case)
Proxmox is installed on a single SATA SSD with ext4. I do see the blue grub menu on boot.

Code:
root@pve:~# efibootmgr -v
BootCurrent: 0011
Timeout: 5 seconds
BootOrder: 0011,0000,0005
Boot0000* Hard Drive    BBS(HD,,0x0)P0: INTEL SSDSC2MH120A2       .
Boot0005* Internal EFI Shell    VenMedia(5023b95c-db26-429b-a648-bd47664c8012)/FvFile(c57ad6b7-0515-40a8-9d21-551652854e37)
Boot0011* UEFI OS       HD(2,GPT,aee2536e-af10-4156-9698-458533a09a0c,0x800,0x100000)/File(\EFI\BOOT\BOOTX64.EFI)
 

Attachments

  • grub.png (10.2 KB)
We are seeing the same with our PoC PBS installation. PBS is installed on a simple Supermicro DOM, with 8 storage disks in a ZFS raidz2 config.

During boot, the import fails:
Code:
root@pbs:~# systemctl status zfs-import@storage\\x2dbackup.service
● zfs-import@storage\x2dbackup.service - Import ZFS pool storage\x2dbackup
   Loaded: loaded (/lib/systemd/system/zfs-import@.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2021-03-09 11:09:04 CET; 17min ago
     Docs: man:zpool(8)
  Process: 821 ExecStart=/sbin/zpool import -N -d /dev/disk/by-id -o cachefile=none storage-backup (code=exited, status=1/FAILURE)
Main PID: 821 (code=exited, status=1/FAILURE)

Mar 09 11:09:04 pbs systemd[1]: Starting Import ZFS pool storage\x2dbackup...
Mar 09 11:09:04 pbs zpool[821]: cannot import 'storage-backup': no such pool available
Mar 09 11:09:04 pbs systemd[1]: zfs-import@storage\x2dbackup.service: Main process exited, code=exited, status=1/FAILURE
Mar 09 11:09:04 pbs systemd[1]: zfs-import@storage\x2dbackup.service: Failed with result 'exit-code'.
Mar 09 11:09:04 pbs systemd[1]: Failed to start Import ZFS pool storage\x2dbackup.
but after boot, the zpool is available:
Code:
root@pbs:~# zpool status
  pool: storage-backup
state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Tue Mar  9 11:11:19 2021
        4.17T scanned at 3.94G/s, 255G issued at 241M/s, 4.73T total
        0B repaired, 5.27% done, 05:24:39 to go
config:

        NAME          STATE     READ WRITE CKSUM
        storage-backup  ONLINE       0     0     0
          raidz2-0    ONLINE       0     0     0
            sdf       ONLINE       0     0     0
            sdg       ONLINE       0     0     0
            sdh       ONLINE       0     0     0
            sdi       ONLINE       0     0     0
            sdb       ONLINE       0     0     0
            sdc       ONLINE       0     0     0
        spares
          sdd         AVAIL
          sde         AVAIL

errors: 1 data errors, use '-v' for a list
root@pbs:~#

Yes.. I know about the data corruption warning :cool:

The requested outputs:
Code:
root@pbs:~# zfs version
zfs-2.0.3-pve2
zfs-kmod-2.0.3-pve2
root@pbs:~# update-initramfs -k all -u
update-initramfs: Generating /boot/initrd.img-5.4.101-1-pve
I: The initramfs will attempt to resume from /dev/dm-0
I: (/dev/mapper/pbs-swap)
I: Set the RESUME variable to override this.
Running hook script 'zz-pve-efiboot'..
Re-executing '/etc/kernel/postinst.d/zz-pve-efiboot' in new private mount namespace..
No /etc/kernel/pve-efiboot-uuids found, skipping ESP sync.
update-initramfs: Generating /boot/initrd.img-5.4.98-1-pve
I: The initramfs will attempt to resume from /dev/dm-0
I: (/dev/mapper/pbs-swap)
I: Set the RESUME variable to override this.
Running hook script 'zz-pve-efiboot'..
Re-executing '/etc/kernel/postinst.d/zz-pve-efiboot' in new private mount namespace..
No /etc/kernel/pve-efiboot-uuids found, skipping ESP sync.
update-initramfs: Generating /boot/initrd.img-5.4.78-2-pve
I: The initramfs will attempt to resume from /dev/dm-0
I: (/dev/mapper/pbs-swap)
I: Set the RESUME variable to override this.
Running hook script 'zz-pve-efiboot'..
Re-executing '/etc/kernel/postinst.d/zz-pve-efiboot' in new private mount namespace..
No /etc/kernel/pve-efiboot-uuids found, skipping ESP sync.
update-initramfs: Generating /boot/initrd.img-5.4.65-1-pve
I: The initramfs will attempt to resume from /dev/dm-0
I: (/dev/mapper/pbs-swap)
I: Set the RESUME variable to override this.
Running hook script 'zz-pve-efiboot'..
Re-executing '/etc/kernel/postinst.d/zz-pve-efiboot' in new private mount namespace..
No /etc/kernel/pve-efiboot-uuids found, skipping ESP sync.
root@pbs:~# uname -a
Linux pbs 5.4.101-1-pve #1 SMP PVE 5.4.101-1 (Fri, 26 Feb 2021 13:13:09 +0100) x86_64 GNU/Linux
root@pbs:~#

And for the record: after running update-initramfs -k all -u and rebooting, the problem persists.
 
Hi,
I'm having the exact same issue as this thread, and the one here: ZFS Import fails during boot.
I was wondering if anybody has successfully dealt with it?

$ sudo zpool status
  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 09:31:03 with 0 errors on Sat Mar 27 04:04:47 2021
config:

        NAME                                      STATE     READ WRITE CKSUM
        tank                                      ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            ata-ST14000NM001G-2KJ103_ZLW25LT7     ONLINE       0     0     0
            ata-TOSHIBA_MG07ACA14TE_Z070A2V2F94G  ONLINE       0     0     0
            ata-ST14000VN0008-2JG101_ZHZ7ENR0     ONLINE       0     0     0

errors: No known data errors

$ sudo systemctl | grep zfs
zfs-import-cache.service loaded active exited Import ZFS pools by cache file
zfs-import@tank.service loaded failed failed Import ZFS pool tank
zfs-mount.service loaded active exited Mount ZFS filesystems
zfs-share.service loaded active exited ZFS file system shares
zfs-volume-wait.service loaded active exited Wait for ZFS Volume (zvol) links in /dev
zfs-zed.service loaded active running ZFS Event Daemon (zed)
system-zfs\x2dimport.slice loaded active active system-zfs\x2dimport.slice
zfs-import.target loaded active active ZFS pool import target
zfs-volumes.target loaded active active ZFS volumes are ready
zfs.target loaded active active ZFS startup target

$ sudo systemctl status zfs-import@tank.service
zfs-import@tank.service - Import ZFS pool tank
Loaded: loaded (/lib/systemd/system/zfs-import@.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Wed 2021-04-07 09:05:23 CEST; 52min ago
Docs: man:zpool(8)
Main PID: 875 (code=exited, status=1/FAILURE)

Apr 07 09:05:23 pve systemd[1]: Starting Import ZFS pool tank...
Apr 07 09:05:23 pve zpool[875]: cannot import 'tank': no such pool available
Apr 07 09:05:23 pve systemd[1]: zfs-import@tank.service: Main process exited, code=exited, status=1/FAILURE
Apr 07 09:05:23 pve systemd[1]: zfs-import@tank.service: Failed with result 'exit-code'.
Apr 07 09:05:23 pve systemd[1]: Failed to start Import ZFS pool tank.

$ sudo systemctl status zfs-import-cache.service
● zfs-import-cache.service - Import ZFS pools by cache file
Loaded: loaded (/lib/systemd/system/zfs-import-cache.service; enabled; vendor preset: enabled)
Active: active (exited) since Wed 2021-04-07 09:05:36 CEST; 53min ago
Docs: man:zpool(8)
Main PID: 874 (code=exited, status=0/SUCCESS)
Tasks: 0 (limit: 4915)
Memory: 0B
CGroup: /system.slice/zfs-import-cache.service

Apr 07 09:05:23 pve systemd[1]: Starting Import ZFS pools by cache file...
Apr 07 09:05:36 pve systemd[1]: Started Import ZFS pools by cache file.

$ sudo pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.106-1-pve)
pve-manager: 6.3-6 (running version: 6.3-6/2184247e)
pve-kernel-5.4: 6.3-8
pve-kernel-helper: 6.3-8
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.103-1-pve: 5.4.103-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.8
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-5
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.13-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-9
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-5
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-10
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1

I tried "update-initramfs -k all -u", without any success, like others.
I did not change the service to add "Requires=systemd-udev-settle.service", as systemd-udev-settle.service is already there as an "After=" dependency and it seems to make no difference for others:
$ sudo systemctl cat zfs-import@tank.service
# /lib/systemd/system/zfs-import@.service
[Unit]
Description=Import ZFS pool %i
Documentation=man:zpool(8)
DefaultDependencies=no
After=systemd-udev-settle.service
After=cryptsetup.target
After=multipathd.target
Before=zfs-import.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -N -d /dev/disk/by-id -o cachefile=none %I

[Install]
WantedBy=zfs-import.target

I'd like to note that the pool is correctly displayed in the host's Disks/ZFS GUI, but it is NOT set up as a storage in Proxmox (I did NOT add it to storage.cfg because I have no need for it).
Could that make any difference? Perhaps others are in the same situation?!
BTW, my pool is correctly mounted at boot (but I don't know exactly by what... not by pvestatd, since I don't have it as a storage, nor by a VM starting up, since it has no guests on it...).

Did I miss a solution, or is this issue still open?

Thanks.
 
I can confirm the issue.
During boot - before the login screen shows up - I also get an error that my 'ssds' zpool could not be imported.
Code:
Nov 05 19:23:21 proxmox systemd[1]: Starting Import ZFS pools by cache file...
Nov 05 19:23:21 proxmox systemd[1]: Condition check resulted in Import ZFS pools by device scanning being skipped.
Nov 05 19:23:21 proxmox systemd[1]: Starting Import ZFS pool ssds...
Nov 05 19:23:21 proxmox systemd[1]: Finished Helper to synchronize boot up for ifupdown.
Nov 05 19:23:21 proxmox zpool[805]: cannot import 'ssds': no such pool available
Nov 05 19:23:21 proxmox systemd[1]: zfs-import@ssds.service: Main process exited, code=exited, status=1/FAILURE
Nov 05 19:23:21 proxmox systemd[1]: zfs-import@ssds.service: Failed with result 'exit-code'.
Nov 05 19:23:21 proxmox systemd[1]: Failed to start Import ZFS pool ssds.


Yet when I check the web GUI it is clearly present and healthy!?

Screenshot 2021-11-05 185651.jpg

I tried the suggestions here - they don't eliminate the 'issue'.
 
When I'm booting I see two messages:

Failed to start Import ZFS pool vmstore2
Failed to start Import ZFS pool vmstore1

I used to have a pool vmstore2 but removed it. I probably deleted and re-created vmstore1, and it's working just fine. Looking at zpool status after reboot, I see one pool, vmstore1, and it's online:

Code:
root@pve01:~# zpool status
  pool: vmstore1
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
    The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:03:41 with 0 errors on Sun Nov 14 00:27:42 2021
config:


    NAME                                                 STATE     READ WRITE CKSUM
    vmstore1                                             ONLINE       0     0     0
      nvme-Samsung_SSD_970_EVO_Plus_1TB_S59ANS0NA38267Y  ONLINE       0     0     0


errors: No known data errors

Looking at the service status for the imports, it's failing on vmstore1,2, but for different reasons:

Code:
root@pve01:~# systemctl status zfs-import@vmstore1.service
● zfs-import@vmstore1.service - Import ZFS pool vmstore1
     Loaded: loaded (/lib/systemd/system/zfs-import@.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2021-11-17 06:50:47 CST; 6h ago
       Docs: man:zpool(8)
    Process: 794 ExecStart=/sbin/zpool import -N -d /dev/disk/by-id -o cachefile=none vmstore1 (code=exited, status=1/FAILURE)
   Main PID: 794 (code=exited, status=1/FAILURE)
        CPU: 24ms


Nov 17 06:50:46 pve01 systemd[1]: Starting Import ZFS pool vmstore1...
Nov 17 06:50:47 pve01 zpool[794]: cannot import 'vmstore1': pool already exists
Nov 17 06:50:47 pve01 systemd[1]: zfs-import@vmstore1.service: Main process exited, code=exited, status=1/FAILURE
Nov 17 06:50:47 pve01 systemd[1]: zfs-import@vmstore1.service: Failed with result 'exit-code'.
Nov 17 06:50:47 pve01 systemd[1]: Failed to start Import ZFS pool vmstore1.

root@pve01:~# systemctl status zfs-import@vmstore2.service
● zfs-import@vmstore2.service - Import ZFS pool vmstore2
     Loaded: loaded (/lib/systemd/system/zfs-import@.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2021-11-17 06:50:46 CST; 6h ago
       Docs: man:zpool(8)
    Process: 795 ExecStart=/sbin/zpool import -N -d /dev/disk/by-id -o cachefile=none vmstore2 (code=exited, status=1/FAILURE)
   Main PID: 795 (code=exited, status=1/FAILURE)
        CPU: 21ms


Nov 17 06:50:46 pve01 systemd[1]: Starting Import ZFS pool vmstore2...
Nov 17 06:50:46 pve01 zpool[795]: cannot import 'vmstore2': no such pool available
Nov 17 06:50:46 pve01 systemd[1]: zfs-import@vmstore2.service: Main process exited, code=exited, status=1/FAILURE
Nov 17 06:50:46 pve01 systemd[1]: zfs-import@vmstore2.service: Failed with result 'exit-code'.
Nov 17 06:50:46 pve01 systemd[1]: Failed to start Import ZFS pool vmstore2.


Wondering how I can clean this up?
 
