Proxmox 8 to 9 - NFS on ZFS not working anymore

Jannoke

Renowned Member
Jul 13, 2016
Have a 4-node setup (1 node is just a mini PC for management).
One node has a ZFS volume with a folder shared over NFS that the other nodes use.
After the upgrade the other nodes cannot access the NFS share anymore. They kind of mount it and show files with "?" at the beginning, but I can't read or write on the volume.

GUI gives:
Code:
mkdir /mnt/pve/BackupData35: File exists at /usr/share/perl5/PVE/Storage/Plugin.pm line 2479. (500)

I'm guessing it has something to do with ZFS and exports.
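For reference, that "File exists" error from the GUI often just points at a stale mount still sitting under /mnt/pve. A minimal sketch for clearing it by hand, using the storage path from the error above:

Bash:
# lazy/force-unmount the stale NFS mount so the storage layer can mount it again
umount -f -l /mnt/pve/BackupData35
# verify nothing is mounted there anymore
mount | grep BackupData35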
 

I may be having a similar issue as you. Do you know if the nfs-server.service is running on the host node of the shared folder?
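A quick way to check that on the exporting node, sketched with plain systemd/NFS tooling (nothing Proxmox-specific):

Bash:
# is the kernel NFS server up on the node that owns the ZFS volume?
systemctl status nfs-server.service
# and what is it actually exporting right now?
exportfs -v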
 
Hi,
anything interesting in the system journal of the node doing the export or the client node? Did you disable the storage and unmount the NFS on the client nodes before doing the upgrade for the NFS server node?
 
I did not unmount the NFS for the upgrade (I didn't see that in the guide). One node (host2) had no issues, while the other node (host3) does have an issue.

journalctl -xeu nfs-server
Code:
A stop job for unit nfs-server.service has finished.

The job identifier is 5195 and the job result is done.
Sep 17 18:37:10 host3 systemd[1]: Starting nfs-server.service - NFS server and services...
Subject: A start job for unit nfs-server.service has begun execution
Defined-By: systemd
Support: https://www.debian.org/support

A start job for unit nfs-server.service has begun execution.

The job identifier is 5968.
Sep 17 18:37:10 host3 sh[336590]: nfsdctl: lockd configuration failure
Sep 17 18:37:10 host3 sh[336591]: rpc.nfsd: unable to bind AF_INET TCP socket: errno 98 (Address already in use)
Sep 17 18:37:10 host3 sh[336591]: rpc.nfsd: unable to set any sockets for nfsd
Sep 17 18:37:10 host3 systemd[1]: nfs-server.service: Main process exited, code=exited, status=1/FAILURE
Subject: Unit process exited
Defined-By: systemd
Support: https://www.debian.org/support

An ExecStart= process belonging to unit nfs-server.service has exited.

The process' exit code is 'exited' and its exit status is 1.
Sep 17 18:37:10 host3 systemd[1]: nfs-server.service: Failed with result 'exit-code'.
Subject: Unit failed
Defined-By: systemd
Support: https://www.debian.org/support

The unit nfs-server.service has entered the 'failed' state with result 'exit-code'.
Sep 17 18:37:10 host3 systemd[1]: Stopped nfs-server.service - NFS server and services.
Subject: A stop job for unit nfs-server.service has finished
Defined-By: systemd
Support: https://www.debian.org/support

A stop job for unit nfs-server.service has finished.

The job identifier is 6205 and the job result is done.
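The "Address already in use" line suggests something other than the kernel nfsd is already listening on the NFS port. A minimal sketch to see what currently owns port 2049 and which RPC services are registered:

Bash:
# which process is bound to the NFS port?
ss -tlnp | grep 2049
# which RPC services are registered with rpcbind?
rpcinfo -p localhost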
 
I have now tested some things.
1) All my nodes have been restarted.
2) It mounts, but it shows all contents with ???:

Bash:
root@blake:/mnt/pve/BackupData35# ls -lah
ls: cannot access 'template': Stale file handle
ls: cannot access 'longtime': Stale file handle
ls: cannot access 'backup': Stale file handle
ls: cannot access 'dump': Stale file handle


??????????  ? ?    ?       ?            ? template
??????????  ? ?    ?       ?            ? longtime
??????????  ? ?    ?       ?            ? backup
??????????  ? ?    ?       ?            ? dump

So, as this mount is on ZFS, I understand that it needs permissions on the dataset. The problem was that the shared folder was not a dataset root, but a subdirectory on a dataset. So the solution was to either make this subfolder a dataset or export the root of the ZFS volume. And to make this automatic, I could just set "zfs set sharenfs=on" on the dataset and it would then create the export automatically, and it's fixed.
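A minimal sketch of the two options described above, with hypothetical pool/dataset names (tank/backup with a shared subfolder):

Bash:
# option 1: turn the shared subfolder into its own dataset
# (the existing directory contents won't move into the new dataset by themselves,
#  so move the data aside first and copy it back afterwards)
zfs create tank/backup/shared
# option 2: export the root of the dataset instead of a plain subfolder
# in either case, sharenfs=on lets ZFS create/remove the export automatically
zfs set sharenfs=on tank/backup/shared
zfs get sharenfs tank/backup/shared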

Why now:
It seems Proxmox 9 now uses NFSv4 by default, which doesn't support using subdirectories as share exports, since things like "fsid" are tied only to the root folder of the media. So things need to be on /. The fix is to convert this subfolder to a dataset; then it exports and works correctly, because the shared folder is then at the root of the dataset.
On v3 this is not recommended, but it can work without fsid. So forcing the connection to v3 on the server side would have made my setup work in "legacy" mode. I understand this is slower (especially on small IO) and overall worse, and it only works because of less strict rules.
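For anyone who prefers to keep the subdirectory export and stay on the "legacy" behaviour, the client-side equivalent is to pin the mount to v3: the NFS storage definition in /etc/pve/storage.cfg accepts mount options. A sketch with hypothetical server and export values:

Code:
nfs: BackupData35
        server 192.168.1.35
        export /tank/backup/shared
        path /mnt/pve/BackupData35
        content backup
        options vers=3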

So for me all is good with the mount problem. I can leave the thread unsolved for some time until @aldariz gets their issue resolved. If it's not resolved in a reasonable time, I will mark it as solved.
 
I tried unmounting all the mount points on the nodes; however, that made no change.

ss -taupen (removed IP addresses)
Code:
Netid         State          Recv-Q         Send-Q                 Local Address:Port                  Peer Address:Port         Process                                                                                                                                                                                                                                                           
udp           UNCONN         0              0                            0.0.0.0:5353                       0.0.0.0:*             users:(("avahi-daemon",pid=1183,fd=12))                                                                                          uid:112 ino:12627 sk:1 cgroup:/system.slice/avahi-daemon.service <->                                                            
udp           UNCONN         0              0                         0.0.0.0:5405                       0.0.0.0:*             users:(("corosync",pid=1503,fd=28))                                                                                              ino:12818 sk:2 cgroup:/system.slice/corosync.service <->                                                                        
udp           UNCONN         0              0                            0.0.0.0:48477                      0.0.0.0:*             users:(("rpc.statd",pid=1298,fd=8))                                                                                              uid:109 ino:8090 sk:3 cgroup:/system.slice/nfs-ganesha-lock.service <->                                                         
udp           UNCONN         0              0                            0.0.0.0:111                        0.0.0.0:*             users:(("rpcbind",pid=1014,fd=5),("systemd",pid=1,fd=44))                                                                        ino:6286 sk:4 cgroup:/system.slice/rpcbind.socket <->                                                                           
udp           UNCONN         0              0                          0.0.0.0:323                        0.0.0.0:*             users:(("chronyd",pid=1271,fd=5))                                                                                                ino:14389 sk:5 cgroup:/system.slice/chrony.service <->                                                                          
udp           UNCONN         0              0                          0.0.0.0:626                        0.0.0.0:*             users:(("rpc.statd",pid=1298,fd=5))                                                                                              ino:12649 sk:6 cgroup:/system.slice/nfs-ganesha-lock.service <->                                                                
udp           UNCONN         0              0                            0.0.0.0:33895                      0.0.0.0:*             users:(("avahi-daemon",pid=1183,fd=14))                                                                                          uid:112 ino:12629 sk:7 cgroup:/system.slice/avahi-daemon.service <->                                                            
udp           UNCONN         0              0                                  *:2049                             *:*             users:(("ganesha.nfsd",pid=1333,fd=11))                                                                                          ino:17508 sk:8 cgroup:/system.slice/nfs-ganesha.service v6only:0 <->                                                            
udp           UNCONN         0              0                               [::]:5353                          [::]:*             users:(("avahi-daemon",pid=1183,fd=13))                                                                                          uid:112 ino:12628 sk:9 cgroup:/system.slice/avahi-daemon.service v6only:1 <->                                                   
udp           UNCONN         0              0                               [::]:46928                         [::]:*             users:(("rpc.statd",pid=1298,fd=10))                                                                                             uid:109 ino:12016 sk:a cgroup:/system.slice/nfs-ganesha-lock.service v6only:1 <->                                               
udp           UNCONN         0              0                                  *:32774                            *:*             users:(("ganesha.nfsd",pid=1333,fd=15))                                                                                          ino:17512 sk:b cgroup:/system.slice/nfs-ganesha.service v6only:0 <->                                                            
udp           UNCONN         0              0                               [::]:111                           [::]:*             users:(("rpcbind",pid=1014,fd=7),("systemd",pid=1,fd=46))                                                                        ino:3487 sk:c cgroup:/system.slice/rpcbind.socket v6only:1 <->                                                                  
udp           UNCONN         0              0                              [::1]:323                           [::]:*             users:(("chronyd",pid=1271,fd=6))                                                                                                ino:14390 sk:d cgroup:/system.slice/chrony.service v6only:1 <->                                                                 
udp           UNCONN         0              0                               [::]:57968                         [::]:*             users:(("avahi-daemon",pid=1183,fd=15))                                                                                          uid:112 ino:12630 sk:e cgroup:/system.slice/avahi-daemon.service v6only:1 <->                                                   
udp           UNCONN         0              0                                  *:33622                            *:*             users:(("ganesha.nfsd",pid=1333,fd=13))                                                                                          ino:17510 sk:f cgroup:/system.slice/nfs-ganesha.service v6only:0 <->                                                            
udp           UNCONN         0              0                                  *:875                              *:*             users:(("ganesha.nfsd",pid=1333,fd=17))                                                                                          ino:17514 sk:10 cgroup:/system.slice/nfs-ganesha.service v6only:0 <->                                                           
tcp           LISTEN         0              4096                         0.0.0.0:36441                      0.0.0.0:*             users:(("rpc.statd",pid=1298,fd=9))                                                                                              uid:109 ino:8094 sk:11 cgroup:/system.slice/nfs-ganesha-lock.service <->                                                        
tcp           LISTEN         0              4096                         0.0.0.0:111                        0.0.0.0:*             users:(("rpcbind",pid=1014,fd=4),("systemd",pid=1,fd=43))                                                                        ino:992 sk:12 cgroup:/system.slice/rpcbind.socket <->                                                                           
tcp           LISTEN         0              128                          0.0.0.0:22                         0.0.0.0:*             users:(("sshd",pid=1331,fd=6))                                                                                                   ino:15398 sk:13 cgroup:/system.slice/ssh.service <->                                                                            
tcp           LISTEN         0              100                        0.0.0.0:25                         0.0.0.0:*             users:(("master",pid=1487,fd=13))                                                                                                ino:15432 sk:14 cgroup:/system.slice/postfix.service <->                                                                        
tcp           LISTEN         0              4096                       0.0.0.0:85                         0.0.0.0:*             users:(("pvedaemon worke",pid=1623,fd=6),("pvedaemon worke",pid=1622,fd=6),("pvedaemon worke",pid=1621,fd=6),("pvedaemon",pid=1620,fd=6)) ino:12075 sk:15 cgroup:/system.slice/pvedaemon.service <->                                                                      
tcp           ESTAB          0              28                        0.0.0.0:22                      0.0.0.0:40374         users:(("sshd-session",pid=2552,fd=7),("sshd-session",pid=2517,fd=7))                                                            timer:(on,218ms,0) ino:23611 sk:16 cgroup:/system.slice/ssh.service <->                                                         
tcp           LISTEN         0              4096                               *:8006                             *:*             users:(("pveproxy worker",pid=1667,fd=6),("pveproxy worker",pid=1666,fd=6),("pveproxy worker",pid=1665,fd=6),("pveproxy",pid=1664,fd=6)) uid:33 ino:12933 sk:17 cgroup:/system.slice/pveproxy.service v6only:0 <->                                                       
tcp           LISTEN         0              100                            [::1]:25                            [::]:*             users:(("master",pid=1487,fd=14))                                                                                                ino:15433 sk:18 cgroup:/system.slice/postfix.service v6only:1 <->                                                               
tcp           LISTEN         0              4096                               *:37183                            *:*             users:(("ganesha.nfsd",pid=1333,fd=14))                                                                                          ino:17511 sk:19 cgroup:/system.slice/nfs-ganesha.service v6only:0 <->                                                           
tcp           LISTEN         0              4096                               *:45873                            *:*             users:(("ganesha.nfsd",pid=1333,fd=16))                                                                                          ino:17513 sk:1a cgroup:/system.slice/nfs-ganesha.service v6only:0 <->                                                           
tcp           LISTEN         0              4096                               *:2049                             *:*             users:(("ganesha.nfsd",pid=1333,fd=12))                                                                                          ino:17509 sk:1b cgroup:/system.slice/nfs-ganesha.service v6only:0 <->                                                           
tcp           LISTEN         0              4096                            [::]:59637                         [::]:*             users:(("rpc.statd",pid=1298,fd=11))                                                                                             uid:109 ino:12020 sk:1c cgroup:/system.slice/nfs-ganesha-lock.service v6only:1 <->                                              
tcp           LISTEN         0              4096                               *:3128                             *:*             users:(("spiceproxy work",pid=1674,fd=6),("spiceproxy",pid=1673,fd=6))                                                           uid:33 ino:13911 sk:1d cgroup:/system.slice/spiceproxy.service v6only:0 <->                                                     
tcp           LISTEN         0              4096                            [::]:111                           [::]:*             users:(("rpcbind",pid=1014,fd=6),("systemd",pid=1,fd=45))                                                                        ino:3486 sk:1e cgroup:/system.slice/rpcbind.socket v6only:1 <->                                                                 
tcp           LISTEN         0              128                             [::]:22                            [::]:*             users:(("sshd",pid=1331,fd=7))                                                                                                   ino:15400 sk:1f cgroup:/system.slice/ssh.service v6only:1 <->                                                                   
tcp           LISTEN         0              4096                               *:875                              *:*             users:(("ganesha.nfsd",pid=1333,fd=18))                                                                                          ino:17515 sk:20 cgroup:/system.slice/nfs-ganesha.service v6only:0 <->

I tried to see if stopping/killing rpc.statd would work, but Proxmox just restarts it immediately. It seems to be linked to nfs-ganesha-lock.service, as in the logs that service deactivates and then restarts.

Code:
Sep 18 19:25:05 host3 systemd[1]: nfs-ganesha-lock.service: Deactivated successfully.
Sep 18 19:25:21 host3 systemd[1]: auth-rpcgss-module.service - Kernel Module supporting RPCSEC_GSS was skipped because of an unmet condition check (ConditionPathExists=/etc/krb5.keytab).
Sep 18 19:25:21 host3 systemd[1]: Starting nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking....
Sep 18 19:25:21 host3 systemd[1]: Starting nfs-idmapd.service - NFSv4 ID-name mapping service...
Sep 18 19:25:21 host3 systemd[1]: Starting nfs-mountd.service - NFS Mount Daemon...
Sep 18 19:25:21 host3 systemd[1]: rpc-gssd.service - RPC security service for NFS client and server was skipped because of an unmet condition check (ConditionPathExists=/etc/krb5.keytab).
Sep 18 19:25:21 host3 systemd[1]: Starting rpc-statd.service - NFS status monitor for NFSv2/3 locking....
 
So I was able to stop nfs-ganesha-lock.service and start nfs-server.service, and the NFS mounts are running correctly now. I just need to find out why nfs-ganesha-lock.service is running as the default.
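In case it helps anyone else hitting the same port conflict: assuming nfs-ganesha isn't actually needed on the node, a minimal sketch of getting the kernel NFS server back is:

Bash:
# stop the Ganesha units holding port 2049 and keep them from starting at boot
systemctl disable --now nfs-ganesha.service nfs-ganesha-lock.service
# bring the kernel NFS server back up and confirm the exports
systemctl restart nfs-server.service
exportfs -v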
 
It might be that the service was installed/enabled during the upgrade? You could check /var/log/apt/term.log (or .log.1.gz, .log.2.gz, etc. if already rotated) to see if you find a reference to the service.
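A quick way to search those apt logs for the package (term.log as mentioned above, plus history.log, which records the installed package lists):

Bash:
# look for ganesha in the current and rotated apt logs
grep -i ganesha /var/log/apt/term.log /var/log/apt/history.log
zgrep -i ganesha /var/log/apt/term.log.*.gz /var/log/apt/history.log.*.gz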