[SOLVED] ZFS scrubbing does not work anymore on Kernel 6.14.11 (and cannot upgrade to 6.17.X because of NVidia)

Dulcow

Hi there,

I'm facing a bit of a situation right now on my NAS running Proxmox 9.1.5:
  • The userspace tools have been upgraded to 2.4.0
  • The kernel module is still at 2.3.4 on kernel 6.14.11
  • I already upgraded the pools (I thought that was the issue)
  • The scrub command fails and tells me to align the two versions
  • I cannot upgrade to 6.17.X because of NVidia
    • 6.14.11 is currently pinned to keep GPU passthrough working for AI workloads
Code:
root@pve-nas-2:~# zpool version
zfs-2.4.0-pve1
zfs-kmod-2.3.4-pve1

Code:
root@pve-nas-2:~# zpool scrub dpool
cannot scrub dpool: the loaded zfs module does not support an option for this operation. A reboot may be required to enable this option.
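
For reference, the two version lines printed by `zpool version` above can be compared with a small check. This is only a minimal sketch: `check_zfs_versions` is a hypothetical helper (not part of ZFS), and it assumes the two-line output format shown in this post.

```shell
#!/bin/sh
# check_zfs_versions is a hypothetical helper, not a ZFS command.
# $1: the full text printed by `zpool version` (userspace line, then kmod line).
check_zfs_versions() {
    # "zfs-2.4.0-pve1" -> "2.4.0-pve1" (the kmod line does not match here,
    # because "zfs-" is followed by "k" rather than a digit)
    user=$(printf '%s\n' "$1" | sed -n 's/^zfs-\([0-9].*\)$/\1/p')
    # "zfs-kmod-2.3.4-pve1" -> "2.3.4-pve1"
    kmod=$(printf '%s\n' "$1" | sed -n 's/^zfs-kmod-\(.*\)$/\1/p')
    if [ "$user" = "$kmod" ]; then
        echo "match: $user"
    else
        echo "mismatch: userspace $user, kmod $kmod"
        return 1
    fi
}

# Sample input copied from the output above.
# On a real host this would be: check_zfs_versions "$(zpool version)"
sample='zfs-2.4.0-pve1
zfs-kmod-2.3.4-pve1'
check_zfs_versions "$sample" || echo "userspace tools and kernel module differ"
```

The function returns a non-zero status on a mismatch, so it can also gate a scrub in a script.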

What should I do in such a scenario? Should I downgrade the ZFS userspace tools? Will that work if the pools have already been upgraded?

Thanks,

D.
 
I solved it by removing the kernel pinning, removing all Nvidia drivers on both host and all LXCs to then move to 6.17 and 580 drivers.

Everything works again, including ZFS scrubbing.
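
The kernel part of this fix might look roughly like the sketch below. It assumes the pin was set with `proxmox-boot-tool`, which the post does not state. `kernel_is_series` is a hypothetical helper, and the driver removal and reinstall steps are left out because they depend on how the drivers were installed.

```shell
#!/bin/sh
# kernel_is_series is a hypothetical helper: it succeeds when a kernel release
# string (as printed by `uname -r`) belongs to the given major.minor series.
kernel_is_series() {
    case "$1" in
        "$2".*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Host-side steps, shown as comments because they change the boot configuration
# (assumption: the pin was created with proxmox-boot-tool):
#   proxmox-boot-tool kernel list     # lists installed kernels and any pinned one
#   proxmox-boot-tool kernel unpin    # removes the pin
#   reboot

# After the reboot, confirm which kernel is actually running.
if kernel_is_series "$(uname -r)" 6.17; then
    echo "running a 6.17 kernel"
else
    echo "still on $(uname -r)"
fi
```

Once the host is on the new kernel, `zpool version` should show whether the module now matches the userspace tools before `zpool scrub dpool` is retried.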
 
"... to then move to 6.17 and 580 drivers."
I have found myself in this identical situation after a recent drive failure. Can you clarify "what" you moved to 6.17? I am assuming the kernel, but you mentioned several things.

If so, curiously, I'm on kernel 6.17 and ZFS 2.4.0 and still seeing this error whenever a ZFS scrub or resilver is invoked. So I figured I'd first make sure I understood your resolution correctly.
 
I had both kernels installed, but 6.14 was pinned and I was running the 550 NVidia drivers.

When moving to 6.17, I had to upgrade the NVidia drivers to a version that supports that kernel. When I first looked into this problem during the Proxmox 9 upgrade, there were no drivers that properly supported 6.17...

Now 580 works fine. I had to uninstall all NVidia drivers and the kernel module in every LXC and on the Proxmox host before realigning everything on 580.
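
The "realigning" step boils down to the host and every container ending up on the same driver version. A minimal sketch of that comparison follows. `drivers_aligned` is a hypothetical helper, the version strings are made-up placeholders, and how each version is collected is deliberately left out.

```shell
#!/bin/sh
# drivers_aligned is a hypothetical helper: the first argument is the host
# driver version, the remaining arguments are the versions found in each LXC.
drivers_aligned() {
    host=$1
    shift
    for ct in "$@"; do
        if [ "$ct" != "$host" ]; then
            echo "mismatch: host $host, container $ct"
            return 1
        fi
    done
    echo "all on $host"
}

# Placeholder version strings for illustration only, not real driver releases.
drivers_aligned 580.0 580.0 580.0
drivers_aligned 580.0 580.0 550.0 || echo "reinstall the container driver"
```

On the host, `cat /proc/driver/nvidia/version` reports the loaded kernel module version, which is one possible source for the first argument.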

Not rocket science, just wasted time ;)