NVMe passthrough to TrueNAS - Failed to start zfs-import-scan.service

rtorres

Apr 3, 2024
Stockton, CA
Hello all,


I have an NVMe drive that I passed through to a TrueNAS VM as its only network storage device. After passing it through via Proxmox and configuring TrueNAS to use the drive as a NAS, I see the 'Failed to start zfs-import-scan.service - Import ZFS pools by device scanning.' error after every boot.

What can I do to resolve the error message?

Thank you in advance! :)


root@elitemini:~# systemctl status zfs-import-scan.service
× zfs-import-scan.service - Import ZFS pools by device scanning
Loaded: loaded (/lib/systemd/system/zfs-import-scan.service; enabled; preset: disabled)
Active: failed (Result: exit-code) since Sat 2024-06-01 01:47:39 PDT; 21h ago
Docs: man:zpool(8)
Process: 762 ExecStart=/sbin/zpool import -aN -d /dev/disk/by-id -o cachefile=none $ZPOOL_IMPORT_OPTS (code=exited, status=1/FAILURE)
Main PID: 762 (code=exited, status=1/FAILURE)
CPU: 27ms

Jun 01 01:47:39 elitemini systemd[1]: Starting zfs-import-scan.service - Import ZFS pools by device scanning...
Jun 01 01:47:39 elitemini zpool[762]: cannot import 'Elite Mini NAS Drive': pool was previously in use from another system.
Jun 01 01:47:39 elitemini zpool[762]: Last accessed by truenas (hostid=141d4132) at Sat Jun 1 01:46:39 2024
Jun 01 01:47:39 elitemini zpool[762]: The pool can be imported, use 'zpool import -f' to import the pool.
Jun 01 01:47:39 elitemini zpool[762]: cannot import 'boot-pool': pool was previously in use from another system.
Jun 01 01:47:39 elitemini zpool[762]: Last accessed by <unknown> (hostid=0) at Wed Dec 31 16:00:00 1969
Jun 01 01:47:39 elitemini zpool[762]: The pool can be imported, use 'zpool import -f' to import the pool.
Jun 01 01:47:39 elitemini systemd[1]: zfs-import-scan.service: Main process exited, code=exited, status=1/FAILURE
Jun 01 01:47:39 elitemini systemd[1]: zfs-import-scan.service: Failed with result 'exit-code'.
Jun 01 01:47:39 elitemini systemd[1]: Failed to start zfs-import-scan.service - Import ZFS pools by device scanning.
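The log shows zpool on the host finding two pools that were last used by another system and declining to import them; that refusal is the protective behaviour, since force-importing (-f) a pool on the host while the TrueNAS VM is also using it risks corrupting it. To pull just the affected pool names out of the journal, a minimal sketch could look like this. The helper name foreign_pools is hypothetical, and it assumes the "cannot import '<name>': ..." wording seen above.

```bash
# Hypothetical helper: print the name of every pool zpool refused to import,
# one per line, from log text read on stdin.
# Assumes the "cannot import '<name>': ..." wording shown in the log above.
foreign_pools() {
  sed -n "s/.*cannot import '\([^']*\)'.*/\1/p"
}

# Usage on the Proxmox host (current boot only):
#   journalctl -b -u zfs-import-scan.service | foreign_pools
```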
 
Did you use PCIe passthrough? If so, try hiding the NVMe from Proxmox by binding it early to vfio-pci (you might need a softdep to make sure the vfio-pci driver loads before the nvme driver). This is very similar to making sure a GPU is not touched by Proxmox before starting the VM with passthrough. (Don't blacklist nvme if you have other NVMe drives!)
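For readers using real PCIe passthrough, early binding by ID is commonly done with a modprobe.d snippet followed by a rebuilt initramfs. The minimal sketch below only prints such a snippet so it can be reviewed before being written anywhere. The helper name vfio_conf, the target file /etc/modprobe.d/vfio.conf and the ID abcd:1234 are placeholders; the real vendor:device ID of the NVMe controller is the bracketed pair shown by lspci -nn.

```bash
# Hypothetical helper: print a modprobe.d snippet that binds the given PCI
# vendor:device ID to vfio-pci and makes vfio-pci load before the nvme driver.
vfio_conf() {
  # Accept only a vendor:device pair such as "abcd:1234".
  if ! printf '%s' "$1" | grep -Eq '^[0-9a-fA-F]{4}:[0-9a-fA-F]{4}$'; then
    echo "usage: vfio_conf VENDOR:DEVICE" >&2
    return 1
  fi
  printf 'options vfio-pci ids=%s\n' "$1"
  printf 'softdep nvme pre: vfio-pci\n'
}

# Usage on the host, as root (abcd:1234 is a placeholder ID):
#   vfio_conf abcd:1234 > /etc/modprobe.d/vfio.conf
#   update-initramfs -u -k all    # then reboot
```

Note that ids= matches every device with that vendor:device ID, not a single PCI slot.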
 
Thank you for your ultra fast response!

I did do it through PCIe passthrough to the VM via this command:
qm set 102 -scsi1 /dev/disk/by-id/nvme-model-and-serial-here

and edited /etc/pve/qemu-server/102.conf to add a serial number (according to the video I followed, TrueNAS requires the drive to have a serial number)

Since I am very new to passthrough, what would be the command for early binding to vfio-pci?

I do have another NVMe in the system that Proxmox boots from. Would this also be affected?

Thanks again!
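On the question about the second (boot) NVMe: a vfio-pci ids= binding matches by vendor:device ID, so it would also capture the boot drive if both controllers report the same ID (for example, two drives of the same model). A minimal sketch for checking that from lspci -nn output follows. The helper name shared_nvme_ids is hypothetical, and it assumes the bracketed [vvvv:dddd] ID format that lspci -nn prints.

```bash
# Hypothetical helper: read `lspci -nn` output on stdin and print every
# vendor:device ID that is shared by more than one NVMe controller.
shared_nvme_ids() {
  # Keep only NVMe controller lines and extract the last [vvvv:dddd] pair.
  sed -n '/Non-Volatile memory controller/s/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)].*/\1/p' |
    sort | uniq -d
}

# Usage: lspci -nn | shared_nvme_ids
# Empty output means no two NVMe controllers share an ID.
```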
 
I did do it through PCIe passthrough to the VM via this command:
qm set 102 -scsi1 /dev/disk/by-id/nvme-model-and-serial-here
That is NOT PCIe passthrough, that is disk passthrough. Ignore my previous suggestions. I don't know how to help in this case. (PCIe passthrough comes with lots of caveats, so I don't recommend it either.)
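The difference is visible in the VM's config file. Disk passthrough (qm set ... -scsi1 /dev/disk/by-id/...) adds a scsiN: line (or sataN:, virtioN:, ideN:) that points at a host block device, so the host kernel still owns the disk and the host's zpool import scan can see its ZFS labels. PCIe passthrough adds a hostpciN: line with the controller's PCI address instead. A minimal sketch that classifies these lines; the helper name passthrough_kind is hypothetical, and it only looks at these line prefixes.

```bash
# Hypothetical helper: read a Proxmox VM config on stdin and label each
# passthrough line as "disk" (host block device) or "pcie" (hostpciN).
passthrough_kind() {
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      hostpci[0-9]*:*)
        printf 'pcie %s\n' "${line%%:*}" ;;
      scsi[0-9]*:\ /dev/*|sata[0-9]*:\ /dev/*|virtio[0-9]*:\ /dev/*|ide[0-9]*:\ /dev/*)
        printf 'disk %s\n' "${line%%:*}" ;;
    esac
  done
}

# Usage: passthrough_kind < /etc/pve/qemu-server/102.conf
```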
 
I know this is an old thread, but did anyone have a solution?
I am in the exact same situation. I followed guides on setting up TrueNAS on Proxmox with disk passthrough, and it worked fine until I rebooted the Proxmox server.
With so many guides out there all describing the same approach, I think it's weird that no one has mentioned or resolved this earlier.

When this error occurs, the whole system halts; it does not boot.
 

Technical Write-Up: Host Crash/Lockup During Early Boot with HBA Passthrough

The Problem

When passing an LSI/Broadcom HBA (e.g., SAS2008 or similar IT-mode card) through to a storage VM (like TrueNAS) on a Proxmox VE host, the host may lock up entirely, freeze, or throw a clean kernel panic during early boot.

This issue is highly elusive because it is frequently misdiagnosed as an IOMMU hardware fault, an ACS override issue, or failing HBA silicon. In reality, it is a software-driven hardware race condition triggered by OpenZFS host utilities.

The Architecture & Root Cause

The panic occurs inside a narrow timing window during the early Linux boot sequence, resulting from the interaction of three distinct layers:

  1. Deterministic Device Mapping (sdX Assignment):

    During early boot, the host kernel initializes the physical HBA and creates block devices for its attached drives in probe order. If the Proxmox boot disk is enumerated after the HBA drives (e.g., the HBA's disks become /dev/sda through /dev/sdh and the host boot drive becomes /dev/sdi), the host kernel is already holding registered block devices for every HBA target before vfio-pci takes the controller away.
  2. The OpenZFS Raw Binary Probe:

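    By default, Proxmox enables zfs-import-scan.service. This is a systemd Type=oneshot service that directly calls the zpool binary; on the Proxmox host whose status output appears earlier in this thread, its ExecStart is /sbin/zpool import -aN -d /dev/disk/by-id -o cachefile=none $ZPOOL_IMPORT_OPTS.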

    Because -a instructs a blind sweep, the zpool utility queries the kernel's raw block tracking layer (libblkid and /sys/block). It iterates through every registered storage device, including the HBA's early-mapped /dev/sda through /dev/sdh raw blocks, and probes each disk for ZFS metadata headers.
  3. The VFIO Handoff Conflict (The Deadlock):

    While the zpool import binary is actively executing its low-level hardware loop and probing the registers of the HBA-attached disks, the hypervisor's vfio-pci stub driver concurrently attempts to forcefully detach the parent HBA controller from the host OS to isolate it for the VM.
The Crash Mechanism: Proxmox's storage layer is aggressively probing raw devices at the exact millisecond the hypervisor tries to tear down the controller infrastructure. The driver state enters a split-brain deadlock, the hardware registers lock up, and the host kernel panics.
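To see which sdX nodes the host has created behind the HBA (item 1 above), each /sys/block entry can be resolved to its sysfs device path, which contains the controller's PCI address. A minimal sketch follows; the helper name disks_behind is hypothetical, 0000:01:00.0 is only the example address used in this write-up, and the helper reads already-resolved paths on stdin so it can be tried without the hardware.

```bash
# Hypothetical helper: given resolved /sys/block paths on stdin (one per line),
# print the names of block devices that sit under the PCI address in $1.
disks_behind() {
  pci=$1
  while IFS= read -r path || [ -n "$path" ]; do
    case $path in
      */"$pci"/*) printf '%s\n' "${path##*/}" ;;  # keep the last path component, e.g. sda
    esac
  done
}

# Usage on the host (the address is an example):
#   for d in /sys/block/sd*; do readlink -f "$d"; done | disks_behind 0000:01:00.0
```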

Why udev Rules and Symlink Disguises Fail

A common forum recommendation is to use a custom udev rule targeting the HBA's PCIe address (0000:01:00.0) to mask the drives or wipe out their environmental serials (ENV{ID_SERIAL}="") inside /dev/disk/by-id/.

This does not work. OpenZFS does not rely purely on the visual string paths inside /dev/disk/by-id/. The zpool import scanning routine uses low-level system libraries to map disk topology. It reads straight through user-space symlinks directly down to the underlying kernel block descriptors (sda-sdh). If the blocks exist, the binary loop will poke them.

The Ultimate Solution

OpenZFS lacks a native exclusion configuration file (such as a /etc/zfs/zimport.ignore file that reads PCIe paths or topologies). Therefore, the only way to break this race condition is to completely deny the compiled binary utility the ability to scan the host layer.

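If your host does not rely on a blind storage scan to import its primary OS pools, run the following commands on the Proxmox host to permanently disable the auto-discovery services. Be aware that zfs-import-cache.service is the unit that imports the pools listed in /etc/zfs/zpool.cache, so if the host has its own pools tracked that way (see the next section), leave it enabled and disable only zfs-import-scan.service: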

Bash:
systemctl disable --now zfs-import-scan.service
systemctl disable --now zfs-import-cache.service

How to Handle Host ZFS Pools Safely Moving Forward (I have not verified THIS part)

If you have other native host ZFS pools (separate from the passed-through HBA), do not rely on the blind scan service. Instead, make sure they are explicitly tracked in the local cache file by setting the cachefile property on each host pool:

Bash:
zpool set cachefile=/etc/zfs/zpool.cache <host-pool-name>

This forces the host to mount its own local storage strictly by reading a defined, static map file, completely preventing the binary scanner from ever waking up and probing the VM's hardware paths on boot.
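The reason a populated cache file keeps the scanner idle is a unit condition: in the OpenZFS systemd units, zfs-import-scan.service carries ConditionFileNotEmpty=!/etc/zfs/zpool.cache, while zfs-import-cache.service carries the opposite condition and imports the pools listed in that file (systemctl cat zfs-import-scan.service shows what your host actually ships). The minimal sketch below mirrors only that cache-file condition; the helper name scan_would_run is hypothetical, and it ignores the unit's other conditions.

```bash
# Hypothetical helper mirroring ConditionFileNotEmpty=!/etc/zfs/zpool.cache:
# succeeds (exit 0) when zfs-import-scan.service would be allowed to run,
# i.e. when the cache file is missing or empty.
scan_would_run() {
  cache=${1:-/etc/zfs/zpool.cache}
  [ ! -s "$cache" ]
}

# Usage:
#   if scan_would_run; then echo "zfs-import-scan.service will scan on boot"; fi
```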

Why This Surfaces Post-Upgrading to Proxmox VE 9+

This issue frequently lies dormant for months or years on older infrastructure, only to surface violently the moment an administrator upgrades a cluster node to Proxmox VE 9+.

The major release transition brings updated upstream systemd management libraries and a jump to a newer Linux kernel series (PVE 9.0 ships the 6.14 kernel by default), which significantly alters early boot optimization and initial hardware discovery timing. When the newer kernel parallelizes driver initialization, it inadvertently shifts the exact millisecond at which vfio-pci claims the HBA relative to when systemd starts the oneshot zpool import run by zfs-import-scan.service.

Furthermore, major system distribution upgrades often reset vendor presets, quietly re-enabling or unmasking legacy OpenZFS discovery services that the user had previously tampered with or assumed were dead.

The result is a sudden, perplexing "post-upgrade regression" where a perfectly stable machine running PVE 8.x suddenly enters an inescapable boot-loop or hard freeze on PVE 9+, tricking the sysadmin into believing the new kernel version simply has broken physical drivers for their LSI card.
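Because a major upgrade can quietly re-enable the discovery services, it may be worth re-checking both import units after every upgrade rather than assuming an earlier disable is still in effect. A minimal sketch using systemctl is-enabled, which prints states such as enabled, disabled or masked; the helper name check_zfs_import_units is hypothetical.

```bash
# Hypothetical helper: print the enablement state of both ZFS import units.
# `systemctl is-enabled` exits non-zero for disabled or masked units, so the
# exit status is ignored and only the printed state is reported.
check_zfs_import_units() {
  for unit in zfs-import-scan.service zfs-import-cache.service; do
    state=$(systemctl is-enabled "$unit" 2>/dev/null) || true
    printf '%s: %s\n' "$unit" "${state:-unknown}"
  done
}

# Usage (after each major upgrade): check_zfs_import_units
```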