NFS Storage and DNS

deflin

Dec 17, 2017
I have an issue where my NFS storage pool stops working when the primary DNS server goes offline. I would expect the next nameserver entry in /etc/resolv.conf to be used, but this apparently isn't happening.

Am I missing something fundamental here? If not, then there appears to be a DNS resolution bug somewhere in the storage stack.

Setup on nodes

Code:
PrimaryDNS:      192.168.1.20
SecondaryDNS:    192.168.1.21
NFS Host:        192.168.1.10
Proxmox nodes:   192.168.1.1[2-4]

/etc/resolv.conf
Code:
root@vmhost04:~# cat /etc/resolv.conf
nameserver 192.168.1.20
nameserver 192.168.1.21
nameserver 8.8.8.8
search <my-domain>

/etc/pve/storage.cfg
Code:
root@vmhost04:~# cat /etc/pve/storage.cfg
...
nfs: pvestorage
        export /srv/data/pvestorage
        path /mnt/pve/pvestorage
        server vmhost01
        content iso,backup,images,vztmpl,rootdir
        maxfiles 1
        options vers=3

pvesm status
Code:
root@vmhost04:~# pvesm status
Name              Type     Status           Total            Used       Available        %
local              dir     active        20511312         1836172        17610180    8.95%
local-lvm      lvmthin     active       448278528        39314026       408964501    8.77%
pvestorage         nfs     active       154687488        24112128       124037120   15.59%

mount
Code:
root@vmhost04:~# mount
...
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
/dev/fuse on /etc/pve type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=3281408k,mode=700)
vmhost01:/srv/data/pvestorage on /mnt/pve/pvestorage type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.1.10,mountvers=3,mountport=36981,mountproto=udp,local_lock=none,addr=192.168.1.10)


Bring Down Primary DNS (192.168.1.20)

When the primary DNS is brought down, pvestorage goes offline and does not come back until the primary DNS is brought back online.

Code:
root@vmhost04:~# pvesm status
storage 'pvestorage' is not online
Name              Type     Status           Total            Used       Available        %
local              dir     active        20511312         1836304        17610048    8.95%
local-lvm      lvmthin     active       448278528        39314026       408964501    8.77%
pvestorage         nfs   inactive               0               0               0    0.00%

The primary DNS is no longer responding. The secondary DNS does respond to dig, both when the target DNS server is specified explicitly and when dig is left to pick the server itself.

Code:
root@vmhost04:~# dig +search +noall +answer vmhost01
vmhost01.<domain>. 3600 IN    A       192.168.1.10
root@vmhost04:~# dig +search +noall +answer @192.168.1.20 vmhost01
;; connection timed out; no servers could be reached
root@vmhost04:~# dig +search +noall +answer @192.168.1.21 vmhost01
vmhost01.<domain>. 3600 IN    A       192.168.1.10
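
One caveat on the dig test: dig does its own DNS queries and does not go through the system resolver (nsswitch.conf plus the libc resolver) that most other programs use, so a working dig does not prove that ordinary lookups are fast or even succeed. Below is a rough sketch for comparing the two paths. The function names are made up for illustration, and it assumes dig and getent are installed on the node.

```shell
# Print the nameserver addresses from a resolv.conf-style file, in order.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Hypothetical helper: ask each nameserver directly for HOST,
# with a single 2-second try per server.
# dig exits non-zero (status 9) when the server does not reply.
probe_nameservers() {
    host=$1
    for ns in $(list_nameservers "${2:-/etc/resolv.conf}"); do
        if dig +search +time=2 +tries=1 "@$ns" "$host" >/dev/null 2>&1; then
            echo "$ns: replied"
        else
            echo "$ns: no reply"
        fi
    done
}

# Hypothetical helper: time a lookup through the system resolver (NSS),
# which is the path most programs take, unlike dig.
time_system_lookup() {
    start=$(date +%s)
    getent hosts "$1"
    status=$?
    end=$(date +%s)
    echo "lookup took $((end - start))s (exit status $status)"
    return "$status"
}
```

Running `probe_nameservers vmhost01` and then `time_system_lookup vmhost01` with the primary DNS down should show whether the system lookup still succeeds and how long it takes. The resolver's documented defaults in resolv.conf(5) are a 5-second timeout per server and 2 attempts, so a dead first nameserver can add several seconds to every lookup. A delay of that size could be enough for a storage check with a short timeout to give up, though that is a guess and not something confirmed in this thread.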
 
Linux has a default behaviour for falling back to the next DNS server when the primary does not respond, and by default it's quite poor. You can customise it with options in resolv.conf (such as timeout, attempts and rotate). However, in your case I would be inclined just to add your NFS server to the /etc/hosts file. With the usual nsswitch.conf order, that file is checked first, so the lookup is then independent of DNS server resolution. In fact, that's how I have mine set up: I've put all my nodes and the NFS server into the hosts file on each node.
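
A minimal sketch of both suggestions, assuming the addresses from the first post. The helper name add_hosts_entry is made up for illustration, and the resolv.conf option values in the trailing comment are only examples, not tuned recommendations.

```shell
# Hypothetical helper: append "IP NAME" to a hosts file
# unless NAME is already listed on a non-comment line.
add_hosts_entry() {
    ip=$1 name=$2 file=${3:-/etc/hosts}
    if awk -v n="$name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file"; then
        echo "$name already present in $file"
    else
        printf '%s\t%s\n' "$ip" "$name" >> "$file"
    fi
}

# Example resolv.conf line (illustrative values; see resolv.conf(5)):
#   options timeout:1 attempts:2 rotate
```

For the setup in this thread the call would be `add_hosts_entry 192.168.1.10 vmhost01`, run as root on each node. Calling it a second time leaves the file unchanged.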
 
/etc/hosts is definitely one workaround.

I'm still interested in finding the root cause of the resolution issue, and in whether anyone else sees this behavior.
 