Can no longer mount shared NFS storage from external device.

akulbe

I have a NAS running TrueNAS Scale. I was doing some maintenance on it, and when I booted it back up, the NFS shares I previously had working with Proxmox no longer work.

Code:
root@pve1:~# mount  | grep nfs
172.20.252.1:/mnt/Vault/VM-Windows on /mnt/pve/VM-Windows type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.20.252.4,local_lock=none,addr=172.20.252.1)
172.20.252.1:/mnt/Vault/VM-Linux on /mnt/pve/VM-Linux type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.20.252.4,local_lock=none,addr=172.20.252.1)

The shares show as mounted, both on the host and in the PVE console.

But when I attempt to start any VM on that storage, I get:

Code:
TASK ERROR: unable to activate storage 'VM-Linux' - directory '/mnt/pve/VM-Linux' does not exist or is unreachable

This doesn't make sense to me, as I can connect from my PVE hosts to the NAS on port 2049.


Code:
root@pve1:~# telnet 172.20.252.1 2049
Trying 172.20.252.1...
Connected to 172.20.252.1.
Escape character is '^]'.
 
Code:
root@pve1:~# pvesm status
got timeout
unable to activate storage 'VM-Linux' - directory '/mnt/pve/VM-Linux' does not exist or is unreachable
got timeout
unable to activate storage 'VM-Windows' - directory '/mnt/pve/VM-Windows' does not exist or is unreachable
Name              Type     Status           Total            Used       Available        %
PBS                pbs     active     22561092608      2094993408     20466099200    9.29%
VM-Linux           nfs   inactive               0               0               0    0.00%
VM-Windows         nfs   inactive               0               0               0    0.00%
local              dir     active      1723664000       280648192      1443015808   16.28%
local-zfs      zfspool     active      1598644472       155628652      1443015820    9.74%


The kernel log shows the NFS connection to the NAS flapping:

Code:
[  208.786718] nfs: server 172.20.252.1 not responding, still trying
[  232.338462] nfs: server 172.20.252.1 not responding, still trying
[  757.645020] nfs: server 172.20.252.1 OK
[  757.645107] nfs: server 172.20.252.1 OK
[  937.864829] nfs: server 172.20.252.1 not responding, still trying
[  964.488513] nfs: server 172.20.252.1 not responding, still trying
[ 1494.913873] nfs: server 172.20.252.1 not responding, still trying
[ 1765.247937] nfs: server 172.20.252.1 OK
[ 1765.247961] nfs: server 172.20.252.1 OK
[ 1765.248012] nfs: server 172.20.252.1 OK
[ 1945.468125] nfs: server 172.20.252.1 not responding, still trying
[ 1945.468129] nfs: server 172.20.252.1 not responding, still trying
[ 2756.467309] nfs: server 172.20.252.1 OK
[ 2756.467371] nfs: server 172.20.252.1 OK
[ 2936.687450] nfs: server 172.20.252.1 not responding, still trying
[ 2936.687450] nfs: server 172.20.252.1 not responding, still trying
[ 3719.014706] nfs: server 172.20.252.1 OK
[ 3719.014791] nfs: server 172.20.252.1 OK
[ 3719.014991] nfs: server 172.20.252.1 OK
[ 3899.235144] nfs: server 172.20.252.1 not responding, still trying
[ 4710.234251] nfs: server 172.20.252.1 OK
[ 4890.454639] nfs: server 172.20.252.1 not responding, still trying
[ 4890.454644] nfs: server 172.20.252.1 not responding, still trying

I can see /mnt/pve, and an "ls" there shows the VM-Linux and VM-Windows directories, but when I try to "ls" the contents of either of them, the command just hangs.
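A side note: wrapping the access in timeout(1) is a way to probe a possibly-hung NFS mount without wedging your shell. A minimal sketch, using the same path as above:

Code:
# Probe the mount, but give up after 5 seconds instead of hanging forever
timeout 5 ls /mnt/pve/VM-Linux || echo "mount did not respond within 5s"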
 
My first impression from the information you provided: you have a network issue, perhaps an MTU mismatch.
Note that the PVE health check for NFS consists of RPC probing (showmount, rpcinfo). Those probes often use UDP.

My next step would be to run those commands directly and troubleshoot any failures. For reference, here is the health check from PVE's NFS storage plugin:

Code:
sub check_connection {
    my ($class, $storeid, $scfg) = @_;

    my $server = $scfg->{server};
    my $opts = $scfg->{options};
    
    my $cmd;

    my $is_v4 = defined($opts) && $opts =~ /vers=4.*/;
    if ($is_v4) {
        my $ip = PVE::JSONSchema::pve_verify_ip($server, 1);
        if (!defined($ip)) {
            $ip = PVE::Network::get_ip_from_hostname($server);
        }
    
        my $transport = PVE::JSONSchema::pve_verify_ipv4($ip, 1) ? 'tcp' : 'tcp6';

        # nfsv4 uses a pseudo-filesystem always beginning with /
        # no exports are listed
        $cmd = ['/usr/sbin/rpcinfo', '-T', $transport, $ip, 'nfs', '4'];
    } else {
        $cmd = ['/sbin/showmount', '--no-headers', '--exports', $server];
    }
    
    eval {
        run_command($cmd, timeout => 10, outfunc => sub { }, errfunc => sub { });
    };
    if (my $err = $@) {
        if ($is_v4) {
            my $port = 2049;
            $port = $1 if defined($opts) && $opts =~ /port=(\d+)/;

            # rpcinfo is expected to work when the port is 0 (see 'man 5 nfs') and tcp_ping()
            # defaults to port 7 when passing in 0.
            return 0 if $port == 0;

            return PVE::Network::tcp_ping($server, $port, 2);
        }
        return 0;
    }

    return 1;
}
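The two $cmd = lines above boil down to something like the following for your setup (assuming NFSv4.2 against 172.20.252.1, as your mount output shows):

Code:
# NFSv4 probe: ask the server's RPC layer, over TCP, whether NFS v4 is registered
/usr/sbin/rpcinfo -T tcp 172.20.252.1 nfs 4

# NFSv3-style probe: list the server's exports via mountd
/sbin/showmount --no-headers --exports 172.20.252.1

If either of these hangs or errors out, that is the failure 'pvesm status' is reporting as 'got timeout'.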



Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
I'm not sure what to do with your reply... is that Perl that I should be putting in a file and running, to check?
It is Perl; it is what PVE runs internally to health-check NFS, so you don't need to run the file yourself. Look at the two $cmd = lines and run those commands manually (see the examples above) to see whether they fail. Then you can troubleshoot why they fail.
MTU is 9000 on both ends, as far as I can tell.
You should verify that jumbo frames actually work end to end by sending a large ICMP ping with fragmentation disabled; see the example below.
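A minimal sketch for a 9000-byte MTU (8972 = 9000 minus 20 bytes of IP header and 8 bytes of ICMP header):

Code:
# Jumbo-sized ping with the Don't Fragment bit set.
# If this fails while a plain ping works, something in the path
# is not actually passing 9000-byte frames.
ping -M do -s 8972 -c 4 172.20.252.1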


 

It looks like you are using a Proxmox cluster; if so, I believe you also need NFSv3 enabled on the NAS. See this thread.
(Possibly this setting changed during your update of TrueNAS SCALE?)

Please note: I don't use TrueNAS SCALE, so I don't have personal experience with this.
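As a follow-up, one way to see which NFS version PVE is requesting, and to pin it explicitly while testing, is through the storage configuration. A sketch, assuming the storage names from this thread:

Code:
# Show the NFS storage definitions, including any 'options vers=...' line
cat /etc/pve/storage.cfg

# Pin a storage to NFSv3 for testing (or vers=4.2 to force v4.2)
pvesm set VM-Linux --options vers=3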