Problem with nfsd/nfs-client with kernel 6.17.4-2

DirkH

Hi,

I hit a roadblock updating one of our clusters to PVE 9.1.4.
We have a 5-node cluster where one of the machines also has additional disks and shares them via NFS for backups and images. So on this node nfs-kernel-server is installed, and the underlying filesystem is ZFS. The mounts go over a separate network.

After upgrading the NFS-server node all hosts can mount the shares just fine, except the one where the nfsd is running.
It mounts the shares but for any subdirectory I get "stale file handle".
Code:
2026-01-23T10:55:11.045452+01:00 proxffm06 pvestatd[2709]: mkdir /mnt/pve/nfs-images/images: Stale file handle at /usr/share/perl5/PVE/Storage/Plugin.pm line 1919.
2026-01-23T10:55:11.068183+01:00 proxffm06 pvestatd[2709]: mkdir /mnt/pve/backup-images/dump: Stale file handle at /usr/share/perl5/PVE/Storage/Plugin.pm line 1919.

The exports file reads
Code:
/tank/backup-images     192.168.7.0/255.255.255.0(rw,async,secure,subtree_check,no_root_squash)
/tank/nfs-images        192.168.7.0/255.255.255.0(rw,async,secure,subtree_check,no_root_squash)

while the storage.cfg mounts are
Code:
nfs: backup-images
        export /tank/backup-images
        path /mnt/pve/backup-images
        server proxffmnfs
        content backup
        prune-backups keep-last=1
        options vers=3

nfs: nfs-images
        export /tank/nfs-images
        path /mnt/pve/nfs-images
        server proxffmnfs
        content images,iso
        prune-backups keep-last=1
        options vers=3

Trying with NFS version 4 didn't change the outcome.

As written, the problem is present only on the host that is the NFS server itself. All other cluster hosts (PVE 9 and PVE 8) mount the shares without problems.

This behavior is observed under Kernel 6.17.4-2.
Booting into the old 6.8.12-16-pve fixes the problem.

I scanned some of the newer patch notes for 6.17.x but didn't get a clear hit.

Any suggestions?
 
hi, did you find a solution?

I did an upgrade yesterday and now I am on pve 9.1.5 - kernel 6.17.9-1
(upgrade from latest pve 8, nfs-server was working without headache ;-))

when I try to mount an NFS export from the same node OR from another node I get stale file handles - it looks like this:

in the gui:
Code:
create storage failed: mkdir /mnt/pve/z440nfs/template: Stale file handle at /usr/share/perl5/PVE/Storage/Plugin.pm line 1919. (500)

on cli:
Code:
root@z440:/mnt/pve/z440nfs# ls -al
ls: cannot access 'dump': Stale file handle
ls: cannot access 'template': Stale file handle
total 1
drwxr-xr-x 4 root root 4 11. Feb 13:51 .
drwxr-xr-x 4 root root 4 11. Feb 13:53 ..
?????????? ? ?    ?    ?             ? dump
?????????? ? ?    ?    ?             ? template

the exported dir looks OK:

Code:
root@z440:/z440nfsvz# ls -al
total 10
drwxr-xr-x  4 root root  4 11. Feb 13:51 .
drwxr-xr-x 19 root root 23 11. Feb 13:34 ..
drwxr-xr-x  2 root root  2 11. Feb 13:34 dump
drwxr-xr-x  2 root root  2 11. Feb 13:51 template

did umount, reboot, new exports, new dirs -> nothing helped
 
Nope, no luck so far.
The node with the NFS server runs the old kernel for now.
Strange thing is, I didn't get the stale file handle on the other nodes, only on the one that is itself exporting the shares.
 
hi, for me it works now with export options (rw,sync,no_root_squash,no_subtree_check) - before I had (rw,no_root_squash,subtree_check)

I already tried that yesterday, but maybe I didn't restart nfs-server afterwards, or the old mount was still active.
so before trying to mount with the new options, be sure to check with "mount" whether there is still an entry in the list -> if yes, remove it with umount /mnt/pve/xxx
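
The order of steps described above could be sketched roughly like this. It is only an illustration: /mnt/pve/xxx is the placeholder path from the post, the `is_mounted` helper and the "apply" argument are made up for this sketch, and it assumes a root shell on the exporting node with the usual `exportfs` and `systemctl` tools.

```shell
# Sketch only: /mnt/pve/xxx is the placeholder path from the post above.
MOUNTPOINT=/mnt/pve/xxx

# Hypothetical helper: succeed if the given path is listed as a mount point.
# The optional second argument is the mount table to read (default /proc/mounts).
is_mounted() {
    awk -v p="$1" '$2 == p { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Only touch the system when explicitly called with "apply".
if [ "${1:-}" = "apply" ]; then
    if is_mounted "$MOUNTPOINT"; then
        umount "$MOUNTPOINT"         # remove the old (stale) mount first
    fi
    exportfs -ra                     # re-read /etc/exports
    systemctl restart nfs-server     # restart the NFS server, as mentioned above
    exportfs -v                      # show the export options now in effect
fi
```

After that, the storage can be mounted again and `exportfs -v` should list the new options for the export.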
 
Thanks for sharing!
After I could test it some more, it seems to be subtree_check that is causing the problem. I still run async, but with no_subtree_check the mounts work with the new kernel.
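
Applied to the exports from the first post, that change would presumably mean swapping only subtree_check for no_subtree_check and leaving the other options alone. The sketch below shows one old-style and one changed line as sample data, plus a small helper for spotting export lines that still carry plain subtree_check. The function name `find_subtree_check` is made up for this example and is not part of any NFS tooling.

```shell
# Hypothetical helper: list lines of an exports file that still use
# plain subtree_check (no_subtree_check is not matched).
find_subtree_check() {
    grep -nE '[(,]subtree_check[,)]' "$1"
}

# Sample data: the two exports from the first post, with the second one
# already switched to no_subtree_check.
sample=$(mktemp)
cat > "$sample" <<'EOF'
/tank/backup-images     192.168.7.0/255.255.255.0(rw,async,secure,subtree_check,no_root_squash)
/tank/nfs-images        192.168.7.0/255.255.255.0(rw,async,secure,no_subtree_check,no_root_squash)
EOF

# Prints line 1 (the /tank/backup-images export) with its line number.
find_subtree_check "$sample"
rm -f "$sample"
```

On a real host the argument would be /etc/exports.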
 