Can no longer mount shared NFS storage from external device.

akulbe

I have a NAS running TrueNAS Scale. I was doing some maintenance on it, and when I booted it back up, the NFS shares I previously had working with Proxmox no longer work.

Code:
root@pve1:~# mount  | grep nfs
172.20.252.1:/mnt/Vault/VM-Windows on /mnt/pve/VM-Windows type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.20.252.4,local_lock=none,addr=172.20.252.1)
172.20.252.1:/mnt/Vault/VM-Linux on /mnt/pve/VM-Linux type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.20.252.4,local_lock=none,addr=172.20.252.1)

The shares show as mounted, both on the host and in the PVE console.

But when I attempt to start any VM on that storage, I get:

Code:
TASK ERROR: unable to activate storage 'VM-Linux' - directory '/mnt/pve/VM-Linux' does not exist or is unreachable

This doesn't make sense to me, as I can connect from my PVE hosts to the NAS on port 2049.


Code:
root@pve1:~# telnet 172.20.252.1 2049
Trying 172.20.252.1...
Connected to 172.20.252.1.
Escape character is '^]'.
 
Code:
root@pve1:~# pvesm status
got timeout
unable to activate storage 'VM-Linux' - directory '/mnt/pve/VM-Linux' does not exist or is unreachable
got timeout
unable to activate storage 'VM-Windows' - directory '/mnt/pve/VM-Windows' does not exist or is unreachable
Name              Type     Status           Total            Used       Available        %
PBS                pbs     active     22561092608      2094993408     20466099200    9.29%
VM-Linux           nfs   inactive               0               0               0    0.00%
VM-Windows         nfs   inactive               0               0               0    0.00%
local              dir     active      1723664000       280648192      1443015808   16.28%
local-zfs      zfspool     active      1598644472       155628652      1443015820    9.74%


The kernel log shows the NFS connection to the NAS flapping:

Code:
[  208.786718] nfs: server 172.20.252.1 not responding, still trying
[  232.338462] nfs: server 172.20.252.1 not responding, still trying
[  757.645020] nfs: server 172.20.252.1 OK
[  757.645107] nfs: server 172.20.252.1 OK
[  937.864829] nfs: server 172.20.252.1 not responding, still trying
[  964.488513] nfs: server 172.20.252.1 not responding, still trying
[ 1494.913873] nfs: server 172.20.252.1 not responding, still trying
[ 1765.247937] nfs: server 172.20.252.1 OK
[ 1765.247961] nfs: server 172.20.252.1 OK
[ 1765.248012] nfs: server 172.20.252.1 OK
[ 1945.468125] nfs: server 172.20.252.1 not responding, still trying
[ 1945.468129] nfs: server 172.20.252.1 not responding, still trying
[ 2756.467309] nfs: server 172.20.252.1 OK
[ 2756.467371] nfs: server 172.20.252.1 OK
[ 2936.687450] nfs: server 172.20.252.1 not responding, still trying
[ 2936.687450] nfs: server 172.20.252.1 not responding, still trying
[ 3719.014706] nfs: server 172.20.252.1 OK
[ 3719.014791] nfs: server 172.20.252.1 OK
[ 3719.014991] nfs: server 172.20.252.1 OK
[ 3899.235144] nfs: server 172.20.252.1 not responding, still trying
[ 4710.234251] nfs: server 172.20.252.1 OK
[ 4890.454639] nfs: server 172.20.252.1 not responding, still trying
[ 4890.454644] nfs: server 172.20.252.1 not responding, still trying

I can see /mnt/pve, and an "ls" there shows the VM-Linux and VM-Windows directories, but when I try to "ls" the contents of either of them, the command just hangs.
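A side note: wrapping the access in timeout(1) is a way to probe a possibly-hung NFS mount without wedging your shell. A minimal sketch, using the same path as above:

Code:
# Probe the mount, but give up after 5 seconds instead of hanging forever
timeout 5 ls /mnt/pve/VM-Linux || echo "mount did not respond within 5s"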
 
My first impression from the information you provided: you have a network issue, perhaps an MTU mismatch.
Note that the PVE health check for NFS consists of RPC probing (showmount, rpcinfo). Those probes often use UDP.

My next step would be to run those commands directly and troubleshoot any failures. For reference, here is the health check from PVE's NFS storage plugin:

Code:
sub check_connection {
    my ($class, $storeid, $scfg) = @_;

    my $server = $scfg->{server};
    my $opts = $scfg->{options};
    
    my $cmd;

    my $is_v4 = defined($opts) && $opts =~ /vers=4.*/;
    if ($is_v4) {
        my $ip = PVE::JSONSchema::pve_verify_ip($server, 1);
        if (!defined($ip)) {
            $ip = PVE::Network::get_ip_from_hostname($server);
        }
    
        my $transport = PVE::JSONSchema::pve_verify_ipv4($ip, 1) ? 'tcp' : 'tcp6';

        # nfsv4 uses a pseudo-filesystem always beginning with /
        # no exports are listed
        $cmd = ['/usr/sbin/rpcinfo', '-T', $transport, $ip, 'nfs', '4'];
    } else {
        $cmd = ['/sbin/showmount', '--no-headers', '--exports', $server];
    }
    
    eval {
        run_command($cmd, timeout => 10, outfunc => sub { }, errfunc => sub { });
    };
    if (my $err = $@) {
        if ($is_v4) {
            my $port = 2049;
            $port = $1 if defined($opts) && $opts =~ /port=(\d+)/;

            # rpcinfo is expected to work when the port is 0 (see 'man 5 nfs') and tcp_ping()
            # defaults to port 7 when passing in 0.
            return 0 if $port == 0;

            return PVE::Network::tcp_ping($server, $port, 2);
        }
        return 0;
    }

    return 1;
}
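The two $cmd = lines above boil down to something like the following for your setup (assuming NFSv4.2 against 172.20.252.1, as your mount output shows):

Code:
# NFSv4 probe: ask the server's RPC layer, over TCP, whether NFS v4 is registered
/usr/sbin/rpcinfo -T tcp 172.20.252.1 nfs 4

# NFSv3-style probe: list the server's exports via mountd
/sbin/showmount --no-headers --exports 172.20.252.1

If either of these hangs or errors out, that is the failure 'pvesm status' is reporting as 'got timeout'.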



Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
I'm not sure what to do with your reply... is that Perl that I should be putting in a file and running, to check?
It is Perl; it is what PVE runs internally to health-check NFS, so you don't need to run the file yourself. Look at the two $cmd = lines and run those commands manually (see the examples above) to see whether they fail. Then you can troubleshoot why they fail.
MTU is 9000 on both ends, as far as I can tell.
You should verify that jumbo frames actually work end to end by sending a large ICMP ping with fragmentation disabled; see the example below.
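A minimal sketch for a 9000-byte MTU (8972 = 9000 minus 20 bytes of IP header and 8 bytes of ICMP header):

Code:
# Jumbo-sized ping with the Don't Fragment bit set.
# If this fails while a plain ping works, something in the path
# is not actually passing 9000-byte frames.
ping -M do -s 8972 -c 4 172.20.252.1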


 

It looks like you are using a Proxmox cluster; if so, I believe you also need NFSv3 enabled on the NAS. See this thread.
(Possibly this setting changed during your update of TrueNAS SCALE?)

Please note: I don't use TrueNAS SCALE, so I don't have personal experience with this.
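As a follow-up, one way to see which NFS version PVE is requesting, and to pin it explicitly while testing, is through the storage configuration. A sketch, assuming the storage names from this thread:

Code:
# Show the NFS storage definitions, including any 'options vers=...' line
cat /etc/pve/storage.cfg

# Pin a storage to NFSv3 for testing (or vers=4.2 to force v4.2)
pvesm set VM-Linux --options vers=3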