ZFS rpool isn't imported automatically after upgrade from PVE 9.0.10 to latest PVE 9.1 (kernel 6.17) with viommu=virtio

VictorSTS
Distinguished Member · Joined Oct 7, 2019 · Spain
On one of my training labs, I have a series of training VMs running PVE with nested virtualization. These VMs have two disks in a ZFS mirror for the OS, boot via UEFI with Secure Boot disabled, and use systemd-boot (no GRUB). The VMs use machine: q35,viommu=virtio so I can demonstrate PCI passthrough and walk through the webUI. I'm testing the upgrade of these PVE VMs from PVE 9.0.10 to the latest PVE 9.1 using the no-subscription repository.
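For context, the relevant part of such a VM's config would look roughly like this (the VMID 100, storage name and disk sizes are illustrative, not taken from my lab):

```
# /etc/pve/qemu-server/100.conf (illustrative excerpt)
machine: q35,viommu=virtio
bios: ovmf
efidisk0: local-zfs:vm-100-disk-0,efitype=4m,pre-enrolled-keys=0
scsi0: local-zfs:vm-100-disk-1,size=32G
scsi1: local-zfs:vm-100-disk-2,size=32G
```

The two scsi disks form the ZFS mirror that the installer sets up as rpool.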

Current version is:
Code:
proxmox-ve: 9.0.0 (running kernel: 6.14.11-2-pve)
pve-manager: 9.0.10 (running version: 9.0.10/deb1ca707ec72a89)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.14.11-2-pve-signed: 6.14.11-2
proxmox-kernel-6.14: 6.14.11-2
proxmox-kernel-6.14.8-2-pve-signed: 6.14.8-2
amd64-microcode: 3.20250311.1
ceph-fuse: 19.2.3-pve1
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.3.1-1+pve4
ifupdown2: 3.3.0-1+pmx10
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.3
libpve-apiclient-perl: 3.4.0
libpve-cluster-api-perl: 9.0.6
libpve-cluster-perl: 9.0.6
libpve-common-perl: 9.0.10
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.4
libpve-network-perl: 1.1.8
libpve-rs-perl: 0.10.10
libpve-storage-perl: 9.0.13
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-1
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.0.15-1
proxmox-backup-file-restore: 4.0.15-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.1.2
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.2
proxmox-widget-toolkit: 5.0.5
pve-cluster: 9.0.6
pve-container: 6.0.13
pve-docs: 9.0.8
pve-edk2-firmware: 4.2025.02-4
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.3
pve-firmware: 3.16-4
pve-ha-manager: 5.0.4
pve-i18n: 3.6.0
pve-qemu-kvm: 10.0.2-4
pve-xtermjs: 5.5.0-2
qemu-server: 9.0.22
smartmontools: 7.4-pve1
spiceterm: 3.4.1
swtpm: 0.8.0+pve2
vncterm: 1.9.1
zfsutils-linux: 2.3.4-pve1

After apt update + apt dist-upgrade + reboot, the VM tries to boot with kernel 6.17.2-2-pve. It eventually drops to the initramfs shell, complaining that it cannot import ZFS pool 'rpool':

(screenshot of the initramfs import error attached)

At that point, if I simply run zpool import rpool, the pool is imported correctly. Issuing exit drops out of the emergency shell and the PVE VM boots correctly.
If I boot with kernel 6.14.11-4-pve it does boot correctly.

Checking the on-screen output, it seems as if kernel 6.17 takes too long to initialize the disk controller and/or disks:

Code:
[    1.394364] input: VirtualPS/2 VMware VMMouse as /devices/platform/i8042/serio1/input/input3
[    1.529064] raid6: avx512x4 gen() 45744 MB/s
[    1.546064] raid6: avx512x2 gen() 48837 MB/s
[    1.563065] raid6: avx512x1 gen() 44950 MB/s
[    1.580064] raid6: avx2x4   gen() 50702 MB/s
[    1.597064] raid6: avx2x2   gen() 50377 MB/s
[    1.614064] raid6: avx2x1   gen() 42054 MB/s
[    1.614259] raid6: using algorithm avx2x4 gen() 50702 MB/s
[    1.631065] raid6: .... xor() 5329 MB/s, rmw enabled
[    1.631251] raid6: using avx512x2 recovery algorithm
[    1.632638] xor: automatically using best checksumming function   avx
[    1.699734] Btrfs loaded, zoned=yes, fsverity=yes
[    1.718410] spl: loading out-of-tree module taints kernel.
[    1.745494] zfs: module license 'CDDL' taints kernel.
[    1.746085] Disabling lock debugging due to kernel taint
[    1.746380] zfs: module license taints kernel.
[    2.189144] ZFS: Loaded module v2.3.4-pve1, ZFS pool version 5000, ZFS filesystem version 5
[   11.809165] pcieport 0000:00:1c.0: deferred probe timeout, ignoring dependency
[   11.813220] pcieport 0000:00:1c.0: PME: Signaling with IRQ 24
[   11.813971] pcieport 0000:00:1c.0: AER: enabled with IRQ 24
[   11.814988] pcieport 0000:00:1c.1: deferred probe timeout, ignoring dependency
[   11.817551] pcieport 0000:00:1c.1: PME: Signaling with IRQ 25
[   11.818197] pcieport 0000:00:1c.1: AER: enabled with IRQ 25
[   11.819122] pcieport 0000:00:1c.2: deferred probe timeout, ignoring dependency
[   11.821933] pcieport 0000:00:1c.2: PME: Signaling with IRQ 26
[   11.822533] pcieport 0000:00:1c.2: AER: enabled with IRQ 26
[   11.823566] pcieport 0000:00:1c.3: deferred probe timeout, ignoring dependency
[   11.825508] pcieport 0000:00:1c.3: PME: Signaling with IRQ 27
[   11.826150] pcieport 0000:00:1c.3: AER: enabled with IRQ 27
[   11.827242] pcieport 0000:00:1e.0: deferred probe timeout, ignoring dependency
[   11.827569] pcieport 0000:05:01.0: deferred probe timeout, ignoring dependency
[   11.827904] shpchp 0000:05:01.0: deferred probe timeout, ignoring dependency

If I try to run zpool import rpool before kernel 6.17 has initialized the disks, no rpool is found.
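If the root cause really is the disks showing up late, a possible stopgap (untested on this setup, and no substitute for a proper fix) is to make the initramfs wait before attempting the import. Debian's zfs-initramfs scripts read a pre-import sleep from /etc/default/zfs; the 10-second value below is an arbitrary guess:

```
# /etc/default/zfs -- sleep (seconds) before trying to import the
# root pool from the initramfs; 10 is an assumption, tune as needed
ZFS_INITRD_PRE_MOUNTPOINT_ROOT_SLEEP='10'
```

After changing it, regenerate the initramfs (update-initramfs -u) and, on systemd-boot systems, run proxmox-boot-tool refresh so the change reaches the ESP.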

Seems there's some incompatibility between machine: q35,viommu=virtio and kernel 6.17. I've attached the full dmesg log of a failed boot.

EDIT: with machine: q35,viommu=intel the VM boots correctly and I can still use IOMMU in the nested VMs for training purposes.
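For anyone wanting to try the same workaround, the vIOMMU type can be switched from the PVE host that runs the nested VM (100 is a placeholder VMID, substitute your own):

```shell
# Placeholder VMID 100; run on the host carrying the nested PVE VM.
qm set 100 --machine q35,viommu=intel
# A full stop/start (not a reboot inside the guest) is needed for a
# machine-type change to take effect:
qm stop 100 && qm start 100
```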
 

I am having the same issue on 2 of my 3 Proxmox servers, both bare metal. This started on 6.17.2-1 and I was hoping 6.17.2-2-pve would resolve it. If I use 6.14.11-4-pve there are no issues. (screenshot attached)
 
I thought it would be related to nested virtualization / virtio vIOMMU; I haven't seen any issue on bare metal yet.
Can you manually import the pool and continue booting once the disks are detected (zpool import rpool)?
 
Ok, I think my issue is not the same as yours now that I've dug deeper into it. The rpool is not showing up because the kernel is not negotiating the SATA device correctly and is disabling the device. I still think this is a kernel bug, since rolling back to 6.14 there are no issues. Should I open a new thread for this? (screenshot attached)
 
Definitely not the same issue, even if the symptom is the same. I remember a somewhat similar issue on some Dell long ago (AFAIR it was when PVE 7.0 came out); enabling all of X2APIC, IOMMU and SR-IOV in the BIOS, plus a BIOS update, solved it at the time.
 
Just in case somebody runs across this, I was able to fix it as follows. I had made the mistake of editing /etc/default/grub in nano instead of the correct location for systemd-boot systems, /etc/kernel/cmdline. I added quiet intel_iommu=on iommu=pt:

Code:
nano /etc/kernel/cmdline
# resulting contents:
root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on iommu=pt
proxmox-boot-tool refresh

To check what is currently running, use:

Code:
cat /proc/cmdline
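That mix-up is easy to make, so here is a tiny sketch of the rule of thumb (the helper name and /tmp paths are made up for illustration): proxmox-boot-tool/systemd-boot systems keep the kernel command line in /etc/kernel/cmdline, while GRUB systems keep it in GRUB_CMDLINE_LINUX in /etc/default/grub.

```shell
# Hypothetical helper: given a root directory, print the file that
# holds the kernel command line, using the rule of thumb above.
cmdline_file() {
    if [ -f "$1/etc/kernel/cmdline" ]; then
        # systemd-boot / proxmox-boot-tool layout
        echo "$1/etc/kernel/cmdline"
    else
        # fall back to the GRUB location
        echo "$1/etc/default/grub"
    fi
}

# Demo against a throwaway tree under /tmp (illustrative path):
mkdir -p /tmp/demo-sdboot/etc/kernel
echo 'root=ZFS=rpool/ROOT/pve-1 boot=zfs' > /tmp/demo-sdboot/etc/kernel/cmdline
cmdline_file /tmp/demo-sdboot
```

On a real host, proxmox-boot-tool status will tell you which bootloader is actually in use.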
 
Mine is doing this too. I've tried the above, but no dice: it still stops at the initramfs prompt and makes me type zpool import rpool, after which an exit makes it boot perfectly. I try to fix this every now and again, but it has been doing it for a while now. Any other ideas that might fix it, or another line of debugging to try, would be gratefully received.