NFS share didn't remount automatically

ewuewu

Hello all,
we are running a three-node cluster (Proxmox version 3.1-3). The VM images are stored on two shares provided by a failover storage server (Ubuntu 12.04, DRBD, Corosync/Pacemaker).

Last night, while a backup was running, the cluster lost its connection to the storage for a very short moment. (As expected, the backup stopped with an error; so far no problem.)

All cluster members were able to reconnect share1 but unfortunately not share2.

In the morning, when I discovered this unhappy state, I tried to connect to share2 on the storage servers from another Linux box as a test. This worked immediately and I could access everything on share2 from this box. Afterwards I unmounted the share there.

I checked the logs on the storage servers but couldn't find any messages that indicate unexpected behavior.
But all three Proxmox cluster members were reporting:
pvestatd[4397]: WARNING: unable to activate storage 'san2-VMs-nfs' - directory '/mnt/pve/san2-VMs-nfs' does not exist
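
A quick check on each node is whether the kernel still lists the share as an NFS mount. The following is only a sketch: the function name is_nfs_mounted and its optional second argument (a file in /proc/mounts format) are made up for illustration.

```shell
# Return 0 if MOUNTPOINT is listed as an NFS mount in a /proc/mounts-style file.
# is_nfs_mounted is a hypothetical helper; the second argument defaults to /proc/mounts.
is_nfs_mounted() {
    awk -v mp="$1" '
        $2 == mp && $3 ~ /^nfs/ { found = 1 }   # field 2: mount point, field 3: fs type
        END { exit (found ? 0 : 1) }
    ' "${2:-/proc/mounts}"
}

# Example:
#   is_nfs_mounted /mnt/pve/san2-VMs-nfs || echo "share2 is not mounted"
```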

Afterwards I tried to unmount share2 manually from the command line on all cluster members, hoping that the hosts would reconnect automatically after a short while. Unmounting worked without problems on two of the hosts. The third node refused with a 'share busy' message. The share was locked by two VMs. After stopping these VMs I was able to unmount the share there as well.

Unfortunately the hosts didn't reconnect automatically. Only after I changed the share's settings in the Proxmox GUI (additionally allowing ISOs to be stored) did the hosts reestablish the connection. Later I changed this setting back so that only images are stored on this share.
Afterwards everything was running as before.

All shares are connected the same way:
192.168.33.50:/nfs-share-lun1 on /mnt/pve/san1-VMs-nfs type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.33.50,mountvers=3,mountport=54612,mountproto=udp,local_lock=none,addr=192.168.33.50)
192.168.33.51:/nfs-share-lun2 on /mnt/pve/san2-VMs-nfs type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.33.51,mountvers=3,mountport=54612,mountproto=udp,local_lock=none,addr=192.168.33.51)
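
To verify that two shares really are mounted with the same options, the option lists can be normalized and compared. This is a rough sketch that assumes the `mount` output format shown above; nfs_mount_opts is a hypothetical helper name, and the address fields are dropped because they differ by design.

```shell
# Print the options of one line of `mount` output, one per line and sorted,
# leaving out the per-server address fields (addr=, mountaddr=).
# nfs_mount_opts is a hypothetical helper name.
nfs_mount_opts() {
    printf '%s\n' "$1" |
        sed -n 's/.*(\(.*\))$/\1/p' |   # keep only the text inside the parentheses
        tr ',' '\n' |
        grep -v -e '^addr=' -e '^mountaddr=' |
        sort
}

# Example:
#   a=$(mount | grep ' /mnt/pve/san1-VMs-nfs ')
#   b=$(mount | grep ' /mnt/pve/san2-VMs-nfs ')
#   [ "$(nfs_mount_opts "$a")" = "$(nfs_mount_opts "$b")" ] && echo "same options"
```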

Does anyone have an idea why Proxmox was able to reconnect automatically to share1 but not to share2?

Are there any additional options for NFS shares that make reconnecting a share more reliable?

Any hint is appreciated.
 
By default NFS will lock NFS connections indefinitely, so if the connection to an NFS share is broken you will need to restart all NFS-related services on the storage node and then kill the stale NFS sessions on the client side. lsof /mnt/pve/nfs-node will show the stale sessions. After you have terminated these sessions, all NFS-related services need to be restarted.
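
Before listing stale sessions it can help to test whether the mount still answers at all, without letting the shell hang on it. The sketch below assumes GNU coreutils (`timeout`, `stat -t`); mount_responds is a hypothetical helper name and the 5 second default is an arbitrary choice.

```shell
# Return 0 if PATH answers a stat call within SECONDS (default 5).
# Returns non-zero on a timeout and also when the path does not exist.
# mount_responds is a hypothetical helper name.
mount_responds() {
    timeout -s KILL "${2:-5}" stat -t -- "$1" >/dev/null 2>&1
}

# Example (lsof may itself block on an unresponsive mount, so treat this as a sketch):
#   mount_responds /mnt/pve/nfs-node || lsof /mnt/pve/nfs-node
```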
 
Hello mir,

thanks for your quick reply.

Are you really sure? If you are right, an NFS-based failover storage wouldn't make sense, because on each failover the connections are dropped for a short while.

But there are a lot of vendors offering such solutions.

And in the case I described above: why did share1 reconnect but not share2?
 
Hello, I know this is an old topic, but has this changed in newer versions of Proxmox? I have a similar situation where my remote node (accessed over VPN) loses its connection to our internal network (where the NFS server lives) and becomes locked up until I reboot that node. Can these settings be modified so that the node does not get locked up when the connection to the NFS share is lost? Or, at the very least, is there a service I could restart on the node instead of rebooting the node entirely? Thanks in advance.
 
Which NFS version? I have NFS 4.x with PVE 4.4 and it works, with a delay of some minutes (if the share has been offline for a longer time) after the share is online again (probably timers in systemd mount units?).
 
Thanks for your reply,

I'd have to look into that. I'm using a QNAP TVS-671 device to serve the NFS share, so I'd have to check which version it's using. Also, on the client side (the Proxmox node), should I just use the web GUI to add the NFS share to the datacenter? And for NFS4, do I need to change anything in /etc/pve/storage.cfg?
 
I don't know how Proxmox uses systemd to mount NFS, but there is no way to set mount options in the web GUI yet, so I modified storage.cfg directly to configure NFS and added the line "options=vers=4.0,..".
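
To see which mount options a storage definition currently carries, the file can be read with a few lines of awk. This is a sketch resting on an assumption: as far as I know, properties in storage.cfg are written as indented `key value` lines under a `type: name` header, so the option would appear as `options vers=4.0`. The helper name storage_nfs_options and the sample values in the comment are illustrative only.

```shell
# Print the mount options configured for one NFS storage in a storage.cfg-style file.
# storage_nfs_options is a hypothetical helper; the file defaults to /etc/pve/storage.cfg.
#
# Assumed file layout (values are illustrative):
#   nfs: san2-VMs-nfs
#           path /mnt/pve/san2-VMs-nfs
#           server 192.168.33.51
#           export /nfs-share-lun2
#           options vers=4.0
#           content images
storage_nfs_options() {
    awk -v id="$1" '
        /^[^ \t#]/ { in_sec = ($1 == "nfs:" && $2 == id) }  # unindented line starts a section
        in_sec && $1 == "options" { print $2 }
    ' "${2:-/etc/pve/storage.cfg}"
}

# Example:
#   storage_nfs_options san2-VMs-nfs
```

If nothing is printed, the storage has no explicit options line and the client's default mount options apply.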
 
