[SOLVED] Second ZFS pool failed to import on boot

Hi,

might be the same issue as mentioned here: https://forum.proxmox.com/threads/128738/post-588785
Do you also see the "Device not ready; aborting initialisation, CSTS=0x0" message?
Hi, it could very well be, but to be honest I do not know if I saw that specific message, and I cannot reproduce it anymore to verify, as I am back at 7.4 for the time being. I thought I'd mention my finding in this thread to maybe help someone else who is searching too. It is not a Proxmox issue, it's a kernel issue. Kind regards, Bas
 
I think there is a known bug in the Linux kernel that affects Lexar NM790 (4 TB) SSDs:
https://bugzilla.kernel.org/show_bug.cgi?id=217863

For now, one would probably have to patch and recompile the kernel.

Otherwise, the bug was reported to be fixed in kernel 6.5.5.
The commit: https://git.kernel.org/pub/scm/linu.../?id=6cc834ba62998c65c42d0c63499bdd35067151ec
is CC-ing stable@vger.kernel.org, so there is a good chance it might come in via the Ubuntu tree for 6.2. Also, the next Proxmox VE point release in Q4 2023 is planned to use kernel 6.5, and there should be testing versions of a 6.5 kernel package released ahead of that (likely in the coming weeks).
 
I'm a Proxmox newbie and relatively new to really using Linux as well. I did my first real fresh install of PVE a couple of days ago, and while looking around I noticed that I seemed to have the exact same issue as the original poster here (@mstefan) described: my second ZFS pool was failing to import, yet it was still showing as ONLINE!

The effect was a failed status on zfs-import@POOLNAME.service, which in turn made my overall system state (shown by running "systemctl status" with no service name) display as "degraded".
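For anyone retracing this: the degraded state and the failing unit can be confirmed as below. The systemctl/journalctl commands need a live systemd host, so they are shown as comments; POOLNAME stands in for the real pool name.

```shell
# Live-system triage (commented out since it needs systemd + ZFS):
#   systemctl is-system-running                    # prints "degraded" when any unit has failed
#   systemctl --failed                             # lists the failed units, e.g. zfs-import@POOLNAME.service
#   journalctl -b -u zfs-import@POOLNAME.service   # this boot's log for the failing unit
#
# zfs-import@.service is a systemd template unit; the pool name after "@"
# is the instance name, so each pool gets its own unit:
pool="POOLNAME"
unit="zfs-import@${pool}.service"
echo "$unit"
```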

I think I have a couple of clues as to what is going on. After reading a number of threads, articles, and documentation pages, I reached the conclusion that since the second pool is already in the cache file, and there is another service named "zfs-import-cache.service", the "zfs-import@POOLNAME.service" unit is redundant and should not exist. zfs-import-cache.service already imports every pool listed in the cache file, so when zfs-import@POOLNAME.service tries to import the second pool again, it fails because the pool has already been imported. Notice how there is no zfs-import@rpool.service for the root pool.
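A sketch of how to see the overlap between the two import paths on a live host. The zpool/systemctl commands are stock ZFS/systemd tooling; the error wording at the end is my recollection of zpool's usual message, not a log from this thread.

```shell
# Where each import path gets its pool list (live-system commands in comments):
#   zpool get cachefile POOLNAME              # "-" means the default cache file below
#   systemctl cat zfs-import-cache.service    # one unit importing *all* cached pools
#   systemctl cat zfs-import@POOLNAME.service # per-pool unit importing the same pool again
#
# The second import attempt is what fails; zpool reports something like:
#   cannot import 'POOLNAME': a pool with that name already exists
cachefile="/etc/zfs/zpool.cache"
echo "default cache file: $cachefile"
```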

This would explain why removing the second pool from the cache file, placing it in another cache file, or removing the symlink all seem to resolve the issue. In my case, I tested this with "systemctl disable zfs-import@POOLNAME.service" and a reboot. This also worked.
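The two workarounds above, written out as commands. This is a sketch only: both need root on the affected host, and POOLNAME is a placeholder for the real pool name.

```shell
pool="POOLNAME"   # placeholder; substitute the actual pool name
# Option 1 (tested in this post): disable the redundant per-pool unit.
#   systemctl disable "zfs-import@${pool}.service"
#   reboot
# Option 2 (also reported to work): keep the unit, but take the pool out of
# the shared cache file so only one import path remains:
#   zpool set cachefile=none "$pool"
echo "would disable: zfs-import@${pool}.service"
```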

Finally, I looked around the Web GUI and retraced my steps in setting up this pool. I had first gone to pve -> disks -> zfs -> create: zfs to create the pool, but then I also went to datacenter -> storage -> add -> zfs and added my new pool in there as well. I need to read the actual correct procedure for this to see if I did something wrong but I'm out of time at the moment. For now I removed my pool from the storage list and this removed the zfs-import@POOLNAME.service from the list of services.
 
Hi @axes, "datacenter -> storage -> add -> zfs" is probably meant to add a ZFS pool that is not part of an existing node, but rather an "independent" storage that can be used by any node. I did not add anything under "datacenter -> storage"; my ZFS pool is completely inside the "pve" node. However, it is still listed under "datacenter -> storage".
 
Hi @adams13, that's odd. If I hadn't added my second pool in "datacenter -> storage", I don't think it would have shown up in there. I need to read more of the documentation and articles on configuring storage in PVE. I still think that zfs-import@POOLNAME.service is redundant, though, if the pool is already in the cache file (as I believe it is by default). The pool only needs to be imported by one path: either via the cache file (by the cache import service) or by its own per-pool service, but not both.
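One way to check that premise, i.e. whether the pool really is recorded in the cache file, is to dump the cached pool configs with zdb (part of the standard ZFS tools; the path below is the stock cache file location):

```shell
# Read-only check of which pools the default cache file records:
#   zdb -C -U /etc/zfs/zpool.cache | grep "name:"
# A pool listed there that ALSO has its own zfs-import@<pool>.service gets two
# import attempts at boot; the second attempt is the failure seen in this thread.
expected_cache="/etc/zfs/zpool.cache"
echo "checking cache file: $expected_cache"
```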
 
Yes, the 4 TB Lexar NM790 does work with the new Proxmox 8.1 and its 6.5.11 kernel!
:)
 
