ZFS Import fails during boot - systemctl reflects this but the pool is actually mounted once fully booted?

jsalas424

I get the error "Failed to start Import ZFS pool new_ssd", but once Proxmox is up everything is available as expected. This is the only ZFS pool that throws errors. The error instructs me to check systemctl with the following command.

Code:
root@TracheServ:~# systemctl status zfs-import@new_ssd.service
● zfs-import@new_ssd.service - Import ZFS pool new_ssd
   Loaded: loaded (/lib/systemd/system/zfs-import@.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Sun 2021-02-21 10:58:15 EST; 21min ago
     Docs: man:zpool(8)
  Process: 1303 ExecStart=/sbin/zpool import -N -d /dev/disk/by-id -o cachefile=none new_ssd (code=exited, status=1/F
Main PID: 1303 (code=exited, status=1/FAILURE)

Feb 21 10:58:15 TracheServ systemd[1]: Starting Import ZFS pool new_ssd...
Feb 21 10:58:15 TracheServ zpool[1303]: cannot import 'new_ssd': no such pool available
Feb 21 10:58:15 TracheServ systemd[1]: zfs-import@new_ssd.service: Main process exited, code=exited, status=1/FAILURE
Feb 21 10:58:15 TracheServ systemd[1]: zfs-import@new_ssd.service: Failed with result 'exit-code'.
Feb 21 10:58:15 TracheServ systemd[1]: Failed to start Import ZFS pool new_ssd.

Code:
root@TracheServ:~# zfs list
new_ssd                               242G   656G      209G  /new_ssd
new_ssd/base-6969-disk-0             33.0G   689G       56K  -
new_ssd/new_ssd.VMs                    96K   656G       96K  /new_ssd/new_ssd.VMs

Code:
root@TracheServ:~# zpool status -v new_ssd
  pool: new_ssd
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:08:19 with 0 errors on Sun Feb 14 00:32:24 2021
config:

        NAME                                             STATE     READ WRITE CKSUM
        new_ssd                                          ONLINE       0     0     0
          mirror-0                                       ONLINE       0     0     0
            ata-Samsung_SSD_860_EVO_1TB_S5B3NDFNA02148D  ONLINE       0     0     0
            ata-Samsung_SSD_860_EVO_1TB_S5B3NDFN915923H  ONLINE       0     0     0

errors: No known data errors

so what's going on? Should I even care?
 
Hmm - my guess at what's going on:
* the disks are not yet present when the system reaches zfs-import@new_ssd.service - or something else prevents them from being detected in time
* the pool gets imported by `pvestatd` (or by starting a guest which has a disk on that storage)

Since we recently added the per-pool import service precisely to prevent these situations, I would be curious to find out what's happening - would you mind sharing the system journal since boot from when this occurred? (journalctl -b)
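If you want a quick look yourself, the import unit and pvestatd can also be compared directly for the current boot (unit and pool names as in this thread):
Code:
journalctl -b -u zfs-import@new_ssd.service -u pvestatd.service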

Otherwise, if the pool is imported and mounted properly and your guests work fine, I think this is not too worrisome.

Thanks!
 

I failed to mention another detail - I had another ZFS issue recur after this latest upgrade. I have some directories mounted on a ZFS share (Nextcloud.Storage) with is_mountpoint 1 declared in the storage config. I had worked with Aaron from your team previously, when I first set up these directories, because they were creating folders at the ZFS share that then prevented it from being mounted (the "mountpoint isn't empty" problem). I removed the directories and was able to mount fine. This issue recurred after the latest upgrade, which you'll see reflected in the log.
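For context, the storage entries in question are directory storages with is_mountpoint set - roughly like this sketch (the storage name and path here are just illustrative, not my actual config):
Code:
dir: nextcloud_dir
        path /Nextcloud.Storage/dump
        content backup
        is_mountpoint 1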

Here's me working through it with Aaron previously:
https://forum.proxmox.com/threads/w...-much-space-on-local.78967/page-2#post-353569

I also have directories mounted at new_ssd but I didn't have this problem - so something inconsistent is happening here.

Logs: https://pastebin.com/Kxm5z2wn
 
Hmm - as far as I can see in the journal, the disks should all be present - the only thing which times out in that respect (that I saw while skimming over the logs) is:
Code:
Feb 21 10:59:44 TracheServ systemd[1]: dev-disk-by\x2duuid-9cce7a6b\x2d72c2\x2d4b20\x2d81cc\x2deaeaeb4ca8d8.device: Job dev-disk-by\x2duuid-9cce7a6b\x2d72c2\x2d4b20\x2d81cc\x2deaeaeb4ca8d8.device/start timed out.
Feb 21 10:59:44 TracheServ systemd[1]: Timed out waiting for device /dev/disk/by-uuid/9cce7a6b-72c2-4b20-81cc-eaeaeb4ca8d8.

could you please provide:
* the output of ls -l /dev/disk/by-*/
* the output of lsblk
* the output of ls -la /var/lib/smartmontools/
* the contents of /etc/smartd.conf
* the contents of /etc/pve/storage.cfg

The SMART-related commands and files are on the list because I'm curious why only sda is listed in smartd's startup output.

Thanks
 
As requested:
https://pastebin.com/hkEX4xnt
https://pastebin.com/Gc0UCBxv
https://pastebin.com/VA1cE0SZ
https://pastebin.com/Bf4uEipa
https://pastebin.com/yhmUtnkN
 
Did you update to ZFS?
No. I see that as an option now under upgrades but it also shows no previous version installed.

[screenshot attached]

Code:
root@TracheServ:~# dmesg | grep ZFS
[ 0.000000] Command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-5.4.78-2-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet
[ 0.306949] Kernel command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-5.4.78-2-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet
[ 5.072652] ZFS: Loaded module v0.8.5-pve1, ZFS pool version 5000, ZFS filesystem version 5

Code:
root@TracheServ:~# cat /sys/module/zfs/version
0.8.5-pve1

Code:
root@TracheServ:~# modinfo zfs
filename: /lib/modules/5.4.78-2-pve/zfs/zfs.ko
version: 0.8.5-pve1
license: CDDL
author: OpenZFS on Linux
description: ZFS

Looks like it's version 0.8.5
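(If it helps, the loaded module can also be cross-checked against the installed packages - assuming the usual Proxmox package names:)
Code:
zfs version
dpkg -l zfsutils-linux zfs-initramfs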
 
I upgraded today just to see if that would make a difference. New (journalctl -b) log attached!

https://pastebin.com/naJuaBcG
 
hmm - the issue remains...

We have a potentially similar case in another thread:
https://forum.proxmox.com/threads/z...-boot-but-appears-to-be-imported-after.84784/

If possible, could you also try to add an override.conf for zfs-import@new_ssd.service which adds a 'Requires=systemd-udev-settle.service', and reboot as indicated there? (Then share the log and remove the snippet again.)
Thanks!
I went to look for the service file in /etc/systemd/system but it's not in there - can you point me in the right direction?

Code:
jon@TracheServ:~$ ls /etc/systemd/system
ceph.target.wants                       smartd.service
dbus-org.freedesktop.timesync1.service  sockets.target.wants
getty.target.wants                      sshd.service
iscsi.service                           sysinit.target.wants
mnt-pve-spare.mount                     syslog.service
multi-user.target.wants                 timers.target.wants
network-online.target.wants             zed.service
node_exporter.service                   zfs-import.target.wants
prometheus-pve-exporter.service         zfs.target.wants
pve-manager.service                     zfs-volumes.target.wants
remote-fs.target.wants

I also realized that there's a service file for mounting a drive I no longer have (mnt-pve-spare.mount). What's the right way to deal with this?
 
I went to look for the service file in /etc/systemd/system but it's not in there, can you point me in the right direction?
The symlink should be in /etc/systemd/system/zfs-import.target.wants/ - but I suggest using `systemctl edit zfs-import@new_ssd.service` (that way you get a clean override snippet which you can then remove).
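A minimal override snippet for that editor session would just be the Requires= line mentioned above:
Code:
[Unit]
Requires=systemd-udev-settle.service

Afterwards the snippet can be dropped again with `systemctl revert zfs-import@new_ssd.service`.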

I also realized that there's a service file for mounting a drive I no longer have (mnt-pve-spare.mount). What's the right way to deal with this?
this should be fixed by running `systemctl disable mnt-pve-spare.mount`
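Roughly, the full cleanup could look like this (the unit file shows up in your /etc/systemd/system listing above; deleting it is only needed if the mount is really gone for good):
Code:
systemctl disable --now mnt-pve-spare.mount   # stop it if still active and remove the enablement symlinks
rm /etc/systemd/system/mnt-pve-spare.mount    # delete the leftover unit file itself
systemctl daemon-reload                       # let systemd forget about it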
 
Hello everyone,
New Proxmox user here.
I freshly installed Proxmox today, root on ext4. After installation I created a new zpool via the Proxmox GUI, then ran into the same issue described in this thread.

I did some digging into the system. To me the issue is that the systemd template service created by Proxmox for the given zpool is competing with the standard zfs-import-cache.service and zfs-import-scan.service to import the very same pool at boot. So one of the services succeeds with the import and the pool is present once the OS boots up, but the other service(s) fail.

So generally it's not a big issue, since the pool is imported - but it's not the cleanest setup. I tried to fix it, with no success:

1/ If I disable zfs-import-cache.service and zfs-import-scan.service, then I start to get an error that the zfs module is not loaded into the kernel early enough, and the systemd template service created by Proxmox complains again, just with a different error. The outcome once the system has booted is that the pool is imported.

2/ If I disable the systemd template service created by Proxmox, then the zpool is imported by zfs-import-cache.service, but it hits an error that was reported on GitHub a year ago and is not fixed yet. Again, the pool somehow gets imported automatically by the time the OS has completely booted.

https://github.com/openzfs/zfs/issues/10891
Code:
18:13:30 pve udevadm[427]: systemd-udev-settle.service is deprecated. Please fix zfs-import-cache.service, zfs-import-scan.service not to pull it in.
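
For anyone who wants to check the same race on their own box, something like this should show which import units are enabled and which one actually did the import on the last boot (the pool name here is the one from this thread):
Code:
systemctl list-unit-files 'zfs-import*'
journalctl -b -u zfs-import-cache.service -u zfs-import-scan.service -u zfs-import@new_ssd.service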
 
