ZFS and NVMe issues with PVE Kernel 6.17

salvage-this

Hi, I'm not sure of the best way to submit a report like this, or whether I'm digging around in the wrong places, but here goes.

I have been running PVE for the last few years, upgrading since version 7. A few months ago, shortly after its release, I tried the upgrade to version 9. After upgrading, I found that disks had simply disappeared and ZFS was no longer functional. Figuring there was a hardware issue that only showed up after the upgrade, I worked around it temporarily and eventually reinstalled PVE 8 from scratch. Once PVE 8 was installed again, my lost disks showed back up without issue, so I figured PVE 9 just needed a bit more time to address bugs.

Today I tried the move to PVE 9 again, and the same issue happened. This is what I am seeing:

NVMe:

I have a single 1 TB NVMe disk that I believe holds an LVM-thin pool. It still shows up in the LVM configuration, and I can see the disk in lsblk, albeit reported as 0 bytes:

Bash:
root@proxmox:~# lsblk
NAME                 MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda                    8:0    0 232.9G  0 disk
├─sda1                 8:1    0  1007K  0 part
├─sda2                 8:2    0     1G  0 part /boot/efi
└─sda3                 8:3    0 231.9G  0 part
  ├─pve-swap         252:0    0     8G  0 lvm  [SWAP]
  ├─pve-root         252:1    0    68G  0 lvm  /
  ├─pve-data_tmeta   252:2    0   1.4G  0 lvm
  │ └─pve-data-tpool 252:4    0 137.1G  0 lvm
  │   └─pve-data     252:5    0 137.1G  1 lvm
  └─pve-data_tdata   252:3    0 137.1G  0 lvm
    └─pve-data-tpool 252:4    0 137.1G  0 lvm
      └─pve-data     252:5    0 137.1G  1 lvm
sdb                    8:16   0   1.5T  0 disk
├─sdb1                 8:17   0   1.5T  0 part
└─sdb9                 8:25   0     8M  0 part
sdc                    8:32   0   1.5T  0 disk
├─sdc1                 8:33   0   1.5T  0 part
└─sdc9                 8:41   0     8M  0 part
sdd                    8:48   0   1.8T  0 disk
├─sdd1                 8:49   0   1.8T  0 part
└─sdd9                 8:57   0     8M  0 part
sde                    8:64   0   1.8T  0 disk
├─sde1                 8:65   0   1.8T  0 part
└─sde9                 8:73   0     8M  0 part
nvme0n1              259:0    0     0B  0 disk
root@proxmox:~# smartctl -a /dev/nvme0n1
smartctl 7.4 2024-10-15 r5620 [x86_64-linux-6.17.4-2-pve] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Input/output error
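
In case it helps, these are the kinds of checks I'd run next for more detail (a rough sketch; I haven't kept this output since I've since pinned back to the older kernel):
Bash:
# Kernel log entries for the NVMe controller (resets, timeouts, namespace errors)
dmesg | grep -i nvme
# LVM's view of the physical volumes and the thin pool that should live on that disk
pvs
lvs -a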

On the ZFS side, all disks are present and show up correctly as block devices, but any attempt to interact with ZFS gives this error:
Bash:
root@proxmox:~# zpool list
Failed to initialize the libzfs library.

Looking through the systemd services for more information, I see that the zfs-zed service was skipped because of an unmet start condition:
Bash:
○ zfs-zed.service - ZFS Event Daemon (zed)
     Loaded: loaded (/usr/lib/systemd/system/zfs-zed.service; enabled; preset: enabled)
     Active: inactive (dead)
  Condition: start condition unmet at Sun 2026-01-11 19:22:55 CST; 7min ago
             └─ ConditionPathIsDirectory=/sys/module/zfs was not met
       Docs: man:zed(8)

Jan 11 19:22:55 proxmox systemd[1]: zfs-zed.service - ZFS Event Daemon (zed) was skipped because of an unmet condition check (ConditionPathIsDirectory=/sys/module/zfs).

My guess is that the ZFS kernel module is failing to load, or at least that its runtime parameters are not being initialized properly. I'm still fairly new at troubleshooting this kind of thing, so I might be off.
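
To test that theory, something along these lines should show whether the module actually loads and why it fails if it doesn't (just a sketch of what I'd check; the exact dmesg error would be the interesting part):
Bash:
# Is the module present at all?
ls /sys/module/zfs 2>/dev/null || echo "zfs module not loaded"
# Try loading it by hand, then check the kernel log for the reason it fails
modprobe zfs
dmesg | tail -n 50 | grep -iE 'zfs|spl'
# If it did load, this reports the module's ZFS version
cat /sys/module/zfs/version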

Since this server had just been upgraded from PVE 8 to 9, I still had an older kernel available, so I manually pinned the kernel to version 6.8.12-17-pve:
Bash:
root@proxmox:~# proxmox-boot-tool kernel pin 6.8.12-17-pve
Setting '6.8.12-17-pve' as grub default entry and running update-grub.
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.17.4-2-pve
Found initrd image: /boot/initrd.img-6.17.4-2-pve
Found linux image: /boot/vmlinuz-6.8.12-17-pve
Found initrd image: /boot/initrd.img-6.8.12-17-pve
Found linux image: /boot/vmlinuz-6.8.12-9-pve
Found initrd image: /boot/initrd.img-6.8.12-9-pve
Found memtest86+ 64bit EFI image: /boot/memtest86+x64.efi
Found memtest86+ 32bit EFI image: /boot/memtest86+ia32.efi
Found memtest86+ 64bit image: /boot/memtest86+x64.bin
Found memtest86+ 32bit image: /boot/memtest86+ia32.bin
Adding boot menu entry for UEFI Firmware Settings ...
done

After pinning the latest kernel I still have from PVE 8 and rebooting, my NVMe and ZFS storage all came back healthy without any other configuration changes.
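
For completeness, confirming which kernel actually booted and which kernels are available should be possible with something like this (sketch):
Bash:
# Confirm the running kernel
uname -r
# List the kernels proxmox-boot-tool manages
proxmox-boot-tool kernel list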

I also found this post stating that for older chipsets (Z97), moving to kernel 6.14 might yield better results: https://forum.proxmox.com/threads/t...-server-and-get-lots-of-disk-io-error.178268/ It partially does: on 6.14, ZFS works again, but the NVMe drive still fails to initialize.
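
I haven't tried it on the newer kernels yet, but one workaround that keeps coming up for NVMe drives disappearing is disabling APST/ASPM via kernel parameters. Roughly like this (the values are just the commonly suggested ones, not confirmed to help in this case, and this assumes a GRUB-booted system like mine):
Bash:
# In /etc/default/grub, append to the existing GRUB_CMDLINE_LINUX_DEFAULT line, e.g.:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet nvme_core.default_ps_max_latency_us=0 pcie_aspm=off"
# Then regenerate the GRUB config and reboot
update-grub
reboot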

What I am curious about is what I should do next. Kernel 6.8.12 is not provided in the Proxmox repositories for Trixie, and ZFS is now at zfs-2.3.4-pve1, upgraded from 2.2.3 in the Bookworm repositories.
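
For now my plan is to keep the known-good 6.8 kernel installed so it can't be autoremoved out from under the pin; I assume something like this would do it (package name taken from the pveversion output below):
Bash:
# Hold the last working PVE 8 kernel while staying on Trixie
apt-mark hold proxmox-kernel-6.8.12-17-pve-signed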

-- edit --
Adding pveversion -v output, as asked for in other threads.

Code:
# pveversion -v
proxmox-ve: 9.1.0 (running kernel: 6.8.12-17-pve)
pve-manager: 9.1.4 (running version: 9.1.4/5ac30304265fbd8e)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.17.4-2-pve-signed: 6.17.4-2
proxmox-kernel-6.17: 6.17.4-2
proxmox-kernel-6.14.11-5-pve-signed: 6.14.11-5
proxmox-kernel-6.14: 6.14.11-5
proxmox-kernel-6.8: 6.8.12-17
proxmox-kernel-6.8.12-17-pve-signed: 6.8.12-17
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
ceph-fuse: 19.2.3-pve2
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.4.1-1+pve1
ifupdown2: 3.3.0-1+pmx11
intel-microcode: 3.20251111.1~deb13u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.5
libpve-apiclient-perl: 3.4.2
libpve-cluster-api-perl: 9.0.7
libpve-cluster-perl: 9.0.7
libpve-common-perl: 9.1.4
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.5
libpve-network-perl: 1.2.4
libpve-rs-perl: 0.11.4
libpve-storage-perl: 9.1.0
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-3
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.1.1-1
proxmox-backup-file-restore: 4.1.1-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.1
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.3
proxmox-widget-toolkit: 5.1.5
pve-cluster: 9.0.7
pve-container: 6.0.18
pve-docs: 9.1.2
pve-edk2-firmware: 4.2025.05-2
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.4
pve-firmware: 3.17-2
pve-ha-manager: 5.1.0
pve-i18n: 3.6.6
pve-qemu-kvm: 10.1.2-5
pve-xtermjs: 5.5.0-3
qemu-server: 9.1.3
smartmontools: 7.4-pve1
spiceterm: 3.4.1
swtpm: 0.8.0+pve3
vncterm: 1.9.1
zfsutils-linux: 2.3.4-pve1
root@proxmox:~#
 