[SOLVED] Second ZFS pool failed to import on boot

mstefan

Member
Jan 3, 2022
Hello,

I have 2 ZFS pools on my machine (PVE 7.1):
  • rpool (Mirror SSD)
  • datapool (Mirror HDD)
Every time I boot up my machine, I get an error that the import of datapool failed.
A look at the syslog shows the same entry every time:
Code:
Jan  3 13:31:29 pve systemd[1]: Starting Import ZFS pool datapool...
Jan  3 13:31:29 pve zpool[1642]: cannot import 'datapool': no such pool available
Jan  3 13:31:29 pve systemd[1]: zfs-import@datapool.service: Main process exited, code=exited, status=1/FAILURE
Jan  3 13:31:29 pve systemd[1]: zfs-import@datapool.service: Failed with result 'exit-code'.
Jan  3 13:31:29 pve systemd[1]: Failed to start Import ZFS pool datapool.
Jan  3 13:31:29 pve zed: eid=7 class=config_sync pool='datapool'
Jan  3 13:31:29 pve zed: eid=8 class=pool_import pool='datapool'
Jan  3 13:31:29 pve zed: eid=10 class=config_sync pool='datapool'

I already tried:
  • adding rootdelay=10 to /etc/default/grub
  • adding rootdelay=10 to /etc/kernel/cmdline
  • adding ZFS_INITRD_PRE_MOUNTROOT_SLEEP='5' and ZFS_INITRD_POST_MODPROBE_SLEEP='5' to /etc/default/zfs
and refreshed the boot configuration with pve-efiboot-tool refresh (roughly as sketched below).
None of these helped.
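For reference, the entries I added looked roughly like this (the existing values on your system will differ; the root pool line is just the stock example from my install):
Code:
# /etc/default/grub (when booting via GRUB) -- append to the existing line, then run update-grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet rootdelay=10"

# /etc/kernel/cmdline (when booting via systemd-boot) -- one single line, rootdelay appended:
root=ZFS=rpool/ROOT/pve-1 boot=zfs rootdelay=10

# /etc/default/zfs -- initramfs sleep options (need update-initramfs -k all -u to take effect):
ZFS_INITRD_PRE_MOUNTROOT_SLEEP='5'
ZFS_INITRD_POST_MODPROBE_SLEEP='5'

# afterwards refresh the kernels/initrds on the ESP:
pve-efiboot-tool refresh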

When PVE is up and running, the pool is online and everything is fine.

Is this something I have to worry about?
How can I fix this error message?

Thanks for your help!
 
hi,

can you check the following:
Code:
systemctl status zfs-import.service zfs-import-cache.service

if the zfs-import-cache service is enabled, then maybe your zfs cachefile is corrupt. you could try:
Code:
zpool set cachefile=/etc/zfs/zpool.cache <YOURPOOL> # do this for all your pools
update-initramfs -k all -u
reboot

or alternatively you can use the zfs-import service instead of zfs-import-cache (this can slow down boot, since the drives have to be scanned without a cache): systemctl disable zfs-import-cache.service && systemctl enable zfs-import.service followed by a reboot should hopefully fix the issue.
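a quick way to check whether the pool actually ended up in the cachefile (pool name taken from this thread) should be:
Code:
zdb -U /etc/zfs/zpool.cache | grep -i datapool   # datapool should show up in the cached configuration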
 
Thanks for your reply.

Here is the output:
Code:
systemctl status zfs-import.service zfs-import-cache.service
● zfs-import.service
     Loaded: masked (Reason: Unit zfs-import.service is masked.)
     Active: inactive (dead)

● zfs-import-cache.service - Import ZFS pools by cache file
     Loaded: loaded (/lib/systemd/system/zfs-import-cache.service; enabled; vendor preset: enabled)
     Active: active (exited) since Mon 2022-01-03 13:31:29 CET; 1h 26min ago
       Docs: man:zpool(8)
   Main PID: 1641 (code=exited, status=0/SUCCESS)
      Tasks: 0 (limit: 76969)
     Memory: 0B
        CPU: 0
     CGroup: /system.slice/zfs-import-cache.service

Jan 03 13:31:28 pve systemd[1]: Starting Import ZFS pools by cache file...
Jan 03 13:31:29 pve systemd[1]: Finished Import ZFS pools by cache file.

Setting the cachefile and updating the initramfs didn't change anything.
The exact error on the bootscreen is:
[FAILED] Failed to start Import ZFS pool datapool.

Disabling zfs-import-cache and enabling zfs-import throws an error:
Code:
systemctl disable zfs-import-cache.service && systemctl enable zfs-import.service
Removed /etc/systemd/system/zfs-import.target.wants/zfs-import-cache.service.
Failed to enable unit: Unit file /lib/systemd/system/zfs-import.service is masked.
 
Failed to enable unit: Unit file /lib/systemd/system/zfs-import.service is masked.
ah sorry. you can try it like this instead: systemctl enable zfs-import@POOLNAME and it should be enabled.

if after reboot you still get the same error please post the output from journalctl -b0 | grep -i zfs -C 2
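spelled out for the pool in this thread, that would be roughly:
Code:
systemctl enable zfs-import@datapool.service
reboot
# if the error still shows up after the reboot:
journalctl -b0 | grep -i zfs -C 2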
 
I enabled zfs-import for my datapool and deactivated cache for the pool with zpool set cachefile=none datapool.
Now after a reboot no error occurs.

But I really do not understand what the problem is/was.
This solution just changes the behavior to scan for the pools instead of importing them from the cachefile (in my understanding).

I will keep trying. Maybe a cachefile per pool makes a difference?

Thanks for your help!

// Edit:
Just for all those who find this thread: I just re-disabled the import for my datapool with systemctl disable zfs-import@POOLNAME and changed the cachefile of this pool to a separate one, /etc/zfs/zpool2.cache, with zpool set cachefile=/etc/zfs/zpool2.cache <YOURPOOL>.
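In concrete commands, that was:
Code:
systemctl disable zfs-import@datapool
zpool set cachefile=/etc/zfs/zpool2.cache datapool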

No error after reboot.

I will have a closer look in the future if this error will come back again.
 
Now after a reboot no error occurs.
great!! also thanks for trying the alternative solution as well :)

please mark the thread [SOLVED] if the error doesn't occur anymore, so others can know what to expect ;)
 
I have the same problem with a second pool named BIG for the HDDs.
But I also have some errors in the pool status.
Sorry, I don't have the zpool status from before the change.
I did the same manipulation as you ("set a second cachefile for this pool") but still have errors on the pool for one disk (all the disks are new, ~200 hours):
Code:
NAME                                          STATE     READ WRITE CKSUM
PVE1_BIG                                      ONLINE       0     0     0
  raidz1-0                                    ONLINE       0     0     0
    ata-WDC_WD40EFZX-68AWUN0_WD-WX82DA112DD4  ONLINE       1     1     0
    ata-WDC_WD40EFZX-68AWUN0_WD-WX12DC128V96  ONLINE       0     0     0
    ata-WDC_WD40EFZX-68AWUN0_WD-WX52DB45EVT4  ONLINE       0     0     0
Maybe the first disk is defective, but does anybody know what the consequences are of using the same cachefile for 2 different pools?
 
Well, it seems the cachefile setting doesn't survive a reboot:
Code:
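# presumably the output of: zpool get cachefile PVE1_BIG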
PVE1_BIG  cachefile                      none                           local
The service is already inactive
Code:
systemctl status zfs-import@PVE1_BIG
zfs-import@PVE1_BIG.service - Import ZFS pool PVE1_BIG
     Loaded: loaded (/lib/systemd/system/zfs-import@.service; disabled; vendor preset: enabled)
     Active: inactive (dead)
       Docs: man:zpool(8)
 
Hello there,

I would like to submit my solution to the error message while booting: "[FAILED] Failed to start Import ZFS pool YOURRAID". I found out that there was a symlink from /etc/systemd/system/zfs-import.target.wants/zfs-import@YOURRAID.service to /lib/systemd/system/zfs-import@.service, and I think that this causes the notification on startup, at least on my system.
Step-by-step solution:
1. Navigate to the folder cd /etc/systemd/system/zfs-import.target.wants/
2. Search for your file with ls
3. Take your file and look up the symlink ls -l zfs-import@YOURRAID.service
4. Unlink the symlink unlink zfs-import@YOURRAID.service
5. reboot
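All of the above in one place:
Code:
cd /etc/systemd/system/zfs-import.target.wants/
ls -l zfs-import@YOURRAID.service   # check where the symlink points
unlink zfs-import@YOURRAID.service
reboot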

Since then I have no error messages during reboot.
Helpful website: https://markontech.com/linux/create-symlinks-in-linux/
 
it seems we got the same issue on proxmox-ve: 7.3-1 (running kernel: 5.15.85-1-pve)
could someone clarify whether the issue is simply that one cachefile is not suitable for more than one ZFS storage, or whether some other issue is at work?
 
Same error here. Fresh Proxmox install 7.3, but the error doesn't seem to cause any issues.
 
Hello,

I would like to submit my solution to the error message while booting: "[FAILED] Failed to start Import ZFS pool YOURRAID". I found out that there was a symlink from /etc/systemd/system/zfs-import.target.wants/zfs-import@YOURRAID.service to /lib/systemd/system/zfs-import@.service, and I think that this causes the notification on startup, at least on my system.
Step-by-step solution:
1. Navigate to the folder cd /etc/systemd/system/zfs-import.target.wants/
2. Search for your file with ls
3. Take your file and look up the symlink ls -l zfs-import@YOURRAID.service
4. Unlink the symlink unlink zfs-import@YOURRAID.service
5. reboot

Since then I have no error messages during reboot.
Helpful website: https://markontech.com/linux/create-symlinks-in-linux/
Thank you very much, I solved this problem thanks to your post. Everything works perfectly.
 
Thank you very much, I solved this problem thanks to your post. Everything works perfectly.
 

Attachment: zfs.jpg
it seems we got the same issue on proxmox-ve: 7.3-1 (running kernel: 5.15.85-1-pve)
could someone clarify whether the issue is simply that one cachefile is not suitable for more than one ZFS storage, or whether some other issue is at work?
I also think it has something to do with the cachefile being set to none. Maybe it is necessary to give the ZFS raid its own cachefile.
 
Just for all those who find this thread: I just re-disabled the import for my datapool with systemctl disable zfs-import@POOLNAME and changed the cachefile of this pool to a separate one, /etc/zfs/zpool2.cache, with zpool set cachefile=/etc/zfs/zpool2.cache <YOURPOOL>.

No error after reboot.

I will have a closer look in the future if this error will come back again.
Same here on a fresh 7.4 install. The commands solved the issue.
 
Hi all,

just had the same problem and went down the same path that @HomemadeAdvanced did.

I was moving a ZFS-Mirror to another installation consisting of one zpool.

I didn't delete or unlink the file as described, but had a look inside the linked service file. It contained something like
Code:
ExecStart=/sbin/zpool import -aN -d /dev/disk/by-id -o cachefile=none $ZPOOL_IMPORT_OPTS

When I executed
Code:
/sbin/zpool import -aN
I got the error that it wasn't able to import the zpool.

A simple
Code:
/sbin/zpool import -faN
(note the additional -f) imported the zpool. That solved my problem.
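If the pool then needs to import cleanly on every boot again, re-applying the cachefile as suggested earlier in this thread should look roughly like this (the pool name is a placeholder):
Code:
/sbin/zpool import -faN -d /dev/disk/by-id
zpool set cachefile=/etc/zfs/zpool.cache <YOURPOOL>
update-initramfs -k all -u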
The aforementioned solution may have triggered the same behaviour.
I hope it is helpful for someone.

All the best and thanks to all.
 
With me it appeared to be a time-out issue with my brand new NVMe SSD and its brand new 'Maxiotek 1602' controller, in combination with an installation of Proxmox 8.0 and the Linux 6.1 kernel. Sometimes it booted, sometimes not, no matter what delay I put where. Then I found the disk was not available at all in /dev during failures.

https://www.linux.org/threads/lexar-nm790-nvme-fails-to-initialize.46315/
https://lore.kernel.org/lkml/7cd693dd-a6d7-4aab-aef0-76a8366ceee6@archlinux.org/T/
https://www.reddit.com/r/archlinux/comments/15xbxeo/nvme_device_not_ready_aborting_initialisation/

The issue was introduced in a 6.1.x kernel (I read it somewhere but could not find the exact version while writing this message), and hopefully it is fixed in the 6.5 version.
For now I reverted back to Proxmox 7.4 with kernel 5.15 and will wait for a version with a fixed kernel.


IMG_20230922_161517306-small.jpeg
 
Hi,
With me it appeared to be a time-out issue with my brand new NVMe SSD and its brand new 'Maxiotek 1602' controller, in combination with an installation of Proxmox 8.0 and the Linux 6.1 kernel. Sometimes it booted, sometimes not, no matter what delay I put where. Then I found the disk was not available at all in /dev during failures.

https://www.linux.org/threads/lexar-nm790-nvme-fails-to-initialize.46315/
https://lore.kernel.org/lkml/7cd693dd-a6d7-4aab-aef0-76a8366ceee6@archlinux.org/T/
https://www.reddit.com/r/archlinux/comments/15xbxeo/nvme_device_not_ready_aborting_initialisation/

The issue was introduced in a 6.1.x kernel (I read it somewhere but could not find the exact version while writing this message), and hopefully it is fixed in the 6.5 version.
For now I reverted back to Proxmox 7.4 with kernel 5.15 and will wait for a version with a fixed kernel.


might be the same issue as mentioned here: https://forum.proxmox.com/threads/128738/post-588785
Do you also see the "Device not ready; aborting initialisation, CSTS=0x0" message?
 
