NFS storage is not online (500)

borahshadow
Feb 29, 2016
Hi,

I have a 2-node Proxmox VE cluster running 4.1. I have multiple storage targets pointing to my NAS, which runs FreeNAS and serves them via NFS.

I keep running into issues where the web interface reports Storage 'NAME' is not online (500). I know my NFS storage is accessible, though. For starters, the running VMs stored on the share are still responsive. Secondly, I've tried mounting the same NFS share from another box on the network and I can access it just fine.

I've searched and searched for solutions to this and haven't been able to find anything that helps. The only thread I could find was https://forum.proxmox.com/threads/nfs-shares-storage-is-not-online-500.11121/, which didn't help much; not to mention that the versions of Proxmox VE discussed there are all outdated now.

Currently, rpcinfo -p 10.10.40.105 produces the following output. I want to say (but can't remember for certain) that the last time I dug into this issue I was getting timeouts from rpcinfo, but I can't fully trust my memory on that point. All I know is what it's doing right now.


Code:
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100000    4     7    111  portmapper
    100000    3     7    111  portmapper
    100000    2     7    111  portmapper
    100005    1   udp    935  mountd
    100005    3   udp    935  mountd
    100005    1   tcp    935  mountd
    100005    3   tcp    935  mountd
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100024    1   udp    875  status
    100024    1   tcp    875  status
    100021    0   udp    897  nlockmgr
    100021    0   tcp    667  nlockmgr
    100021    1   udp    897  nlockmgr
    100021    1   tcp    667  nlockmgr
    100021    3   udp    897  nlockmgr
    100021    3   tcp    667  nlockmgr
    100021    4   udp    897  nlockmgr
    100021    4   tcp    667  nlockmgr

I should also mention that the problem comes and goes. I have a nightly backup routine: some nights the backup completes without error, other nights only one or two VMs fail to back up due to a storage offline error, and other nights every single VM fails to back up. Using the web interface seems to produce the error most frequently.

Your help is appreciated.
 
Hi,
I also have a problem with an NFS share.
I have a cluster of two nodes with a NAS attached via an NFS share, and it has always run smoothly. Last week I added a blade with the same version of Proxmox (3.4-11) and configured the same NFS storage.
The problem is that the new blade does not see the added NFS storage and reports the error "storage 'QNAP-NFS' is not online (500)".

Here is some information about the configuration:
Code:
root@proxmox03:~# pveversion -v

proxmox-ve-2.6.32: not correctly installed (running kernel: 2.6.32-32-pve)
pve-manager: 3.4-11 (running version: 3.4-11/6502936f)
pve-kernel-2.6.32-32-pve: 2.6.32-136
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-3
pve-cluster: 3.0-19
qemu-server: 3.4-6
pve-firmware: 1.1-5
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-34
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-17
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

Storage.cfg
Code:
root@proxmox03:~# cat /etc/pve/storage.cfg

dir: local
        path /var/lib/vz
        content images,iso,vztmpl,rootdir
        maxfiles 0

lvm: vd1-raid5
        vgname vd1-raid5
        content images
        shared

lvm: vd2-raid5
        vgname vd2-raid5
        content images
        shared

nfs: QNAP-NFS
        path /mnt/pve/QNAP-NFS
        server 192.168.1.238
        export /backup-nfs
        options vers=3
        content images,iso,vztmpl,rootdir,backup
        maxfiles 3
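
As a sanity check, the same export can be mounted by hand with the options from the storage.cfg above (a sketch; /mnt/nfstest is an arbitrary test mountpoint, not part of the original config):

```shell
# Mount the export manually with the same NFS option PVE would use (vers=3).
# /mnt/nfstest is an arbitrary test mountpoint -- not from the PVE config.
mkdir -p /mnt/nfstest
mount -t nfs -o vers=3 192.168.1.238:/backup-nfs /mnt/nfstest
df -h /mnt/nfstest    # confirm the export is actually mounted and reachable
umount /mnt/nfstest
```

If the manual mount works while PVE still reports the storage offline, the problem is in the online check rather than in NFS itself.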

rpcinfo
Code:
root@proxmox03:~# rpcinfo -p 192.168.1.238
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100011    1   udp  30002  rquotad
    100011    2   udp  30002  rquotad
    100011    1   tcp  30002  rquotad
    100011    2   tcp  30002  rquotad
    100005    1   udp  30000  mountd
    100005    1   tcp  30000  mountd
    100005    2   udp  30000  mountd
    100005    2   tcp  30000  mountd
    100005    3   udp  30000  mountd
    100005    3   tcp  30000  mountd
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100227    2   tcp   2049
    100227    3   tcp   2049
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100227    2   udp   2049
    100227    3   udp   2049
    100021    1   udp  40500  nlockmgr
    100021    3   udp  40500  nlockmgr
    100021    4   udp  40500  nlockmgr
    100021    1   tcp  41459  nlockmgr
    100021    3   tcp  41459  nlockmgr
    100021    4   tcp  41459  nlockmgr
    100024    1   udp  30001  status
    100024    1   tcp  30001  status

Any idea?

Thanks,
Lorenzo
 
I am also having the same issue.

Proxmox version 4.1 in a three-node cluster.

Mounting against FreeNAS-9.10

The NFS mounts don't actually stay offline (I'm not even sure they ever go offline), since I can traverse the mount from the CLI, but the web GUI reports it as offline.

Also, VMs running from the share, including on other nodes, continue to work.

Syslog shows the errors regularly.

Code:
root@pve3:/etc# rpcinfo -p 10.x.x.20
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100000    4     7    111  portmapper
    100000    3     7    111  portmapper
    100000    2     7    111  portmapper
    100005    1   udp    990  mountd
    100005    3   udp    990  mountd
    100005    1   tcp    990  mountd
    100005    3   tcp    990  mountd
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100024    1   udp    634  status
    100024    1   tcp    634  status
    100021    0   udp    622  nlockmgr
    100021    0   tcp    679  nlockmgr
    100021    1   udp    622  nlockmgr
    100021    1   tcp    679  nlockmgr
    100021    3   udp    622  nlockmgr
    100021    3   tcp    679  nlockmgr
    100021    4   udp    622  nlockmgr
    100021    4   tcp    679  nlockmgr

storage.cfg
Code:
nfs: pveVolume1
        path /mnt/pve/pveVolume1
        server 10.x.x.20
        export /mnt/pveVolume1
        content images,rootdir,backup,iso,vztmpl
        maxfiles 10
        options vers=3

Another symptom, which I suppose is all related: once the system hits these errors, the I/O wait states go through the roof and the machine needs to be rebooted. But as I mentioned, I can still browse and write to the share from the CLI while all this is going on.

Any help would be greatly appreciated.
 
Hello,

I have the exact same problem with similar rpcinfo output. Proxmox says the storage is not online, but I can browse the share on the command line in /mnt/pve/mount.point. The problem is intermittent: it can work for a week and then one day it says the storage is not online...

My storage is also hosted by FreeNAS 9.10.
 
PVE uses the following to check if the NFS server is online:
/sbin/showmount --no-headers --exports <my_server>

Does this command return properly in less than two seconds when your server is marked as offline?
 
It's been a while since I originally reported this issue and last dug into it. It's still present for me, but I've just worked around it when necessary.

It's still giving me the "Storage <name of storage> is not online (500)" error. I ran the command you suggested and it returned the list of exports on my NFS server (FreeNAS); the VM_Storage export is the one configured in Proxmox. The other two exports are mounted directly inside a couple of VMs, not in Proxmox.
Code:
root@host:~# /sbin/showmount --no-headers --exports <IP Address of NFS server>
/mnt/path/data   <IP Address restriction>
/mnt/path/VM_Storage (everyone)
/mnt/path/media      (everyone)

I also checked that rpcbind is accessible on the server by running nmap -p 111 -sU <address of NFS server>, which returned:

Code:
 nmap -p 111 -sU <IP Address>

Starting Nmap 6.47 ( ) at 2017-05-24 10:30 MDT
Nmap scan report for <IP Address>
Host is up (0.00023s latency).
PORT    STATE SERVICE
111/udp open  rpcbind
MAC Address: 00:25:90:49:E0:2D (Super Micro Computer)

Nmap done: 1 IP address (1 host up) scanned in 13.55 seconds
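
Since the problem comes and goes, one way to catch it in the act is to log how long that same showmount check takes over time (a sketch, assuming bash and GNU coreutils `timeout`; the server address and log path are placeholders to adjust):

```shell
#!/bin/bash
# Log the latency of the showmount online check once a minute.
# SERVER and LOG are placeholders -- adjust for your setup.
SERVER="<IP Address of NFS server>"
LOG="/var/log/nfs-check.log"
while true; do
    start=$(date +%s%N)                  # nanoseconds since the epoch
    if timeout 2 /sbin/showmount --no-headers --exports "$SERVER" >/dev/null 2>&1; then
        status="ok"
    else
        status="FAIL"
    fi
    elapsed_ms=$(( ( $(date +%s%N) - start ) / 1000000 ))
    echo "$(date -Is) $status ${elapsed_ms}ms" >> "$LOG"
    sleep 60
done
```

Correlating the FAIL entries with the times the GUI reports the storage offline (or the nightly backups fail) would show whether the check itself is what times out.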
 
Hello guys,
Has this issue been solved?
I have the same problem on my second node, whereas the other node is working perfectly fine!
Can somebody help?
 
I eventually ditched FreeNAS over inconsistent ZFS send/receive issues and decided to use Proxmox as my storage server. I use ZFS-backed storage with raidz1 or raidz2, which lets me use the machine both as a place to store snapshots and as an export point for NFS mounts and SMB shares.

I keep all the machines at the same version and patch level. So far no issues.
 
