NFS Share status unknown on 2 of 5 nodes

Mar 28, 2024
Hi, I am trying to track down an issue. About a month ago, 2 of my 5 nodes started showing the NFS share with a status of unknown. The NFS share is set up at the datacenter level, and all 5 nodes have access. It's an Unraid file server with NFS enabled using the default values. When the problem occurs, I cannot shut down or stop the LXC that has access to the share. It will not reboot or shut down, so I must manually restart the server. Once restarted, it comes back online for a week or two. The other nodes work without issue. I am a novice. Can someone point me in the right direction?


All 5 nodes run Proxmox Virtual Environment 9.1.6 on identical hardware.

Code:
nfs: fileserver
        export /mnt/user/Media
        path /mnt/pve/fileserver
        server 192.168.1.200
        content images
        prune-backups keep-all=1


Code:
May 12 20:35:32 Zeus pvestatd[2484083]: unable to activate storage 'fileserver' - directory '/mnt/pve/fileserver' does not exist or is unre>
May 12 20:35:43 Zeus pvestatd[2484083]: got timeout
May 12 20:35:43 Zeus pvestatd[2484083]: unable to activate storage 'fileserver' - directory '/mnt/pve/fileserver' does not exist or is unre>
May 12 20:35:52 Zeus pvestatd[2484083]: got timeout


Code:
pvesm status
got timeout
unable to activate storage 'fileserver' - directory '/mnt/pve/fileserver' does not exist or is unreachable
fileserver nfs inactive
 
Can you manually mount the NFS share on the problematic nodes while the issue is present?
Are you sure there is no duplicate IP on the network at that time? Have you compared all package versions, including the kernel, across the nodes?
Without logs and additional debug output captured while the issue is present, it is hard to hypothesize.
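
For the manual mount test, a sketch like the one below keeps each step from waiting forever. The helper name run_bounded, the /mnt/nfs-test mount point, and the mount options are assumptions for illustration; the server address and export are taken from the storage config above. A process stuck in uninterruptible I/O may not react to the time limit, so treat this as a starting point only.

```shell
# Run a command with a time limit and print a one-line verdict.
# The command's own output goes to stderr so the verdict stays easy to read.
run_bounded() {
    local limit="$1"; shift
    local rc=0
    timeout "$limit" "$@" >&2 || rc=$?
    case "$rc" in
        0)   echo "OK: $*" ;;
        124) echo "TIMEOUT after ${limit}s: $*" ;;   # 124 is what timeout(1) returns on expiry
        *)   echo "FAILED (exit $rc): $*" ;;
    esac
    return "$rc"
}

# Example use on an affected node (run as root; /mnt/nfs-test is a hypothetical path):
#   mkdir -p /mnt/nfs-test
#   run_bounded 15 showmount -e 192.168.1.200
#   run_bounded 15 rpcinfo -p 192.168.1.200
#   run_bounded 30 mount -t nfs -o ro,soft,timeo=50,retrans=2 \
#       192.168.1.200:/mnt/user/Media /mnt/nfs-test
```

Which step times out first (export listing, RPC query, or the mount itself) is useful information to post back.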

Carefully check journalctl for any errors around the time the issue starts.
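
To narrow the journal down, a small filter along these lines may help. The pattern list is only a guess at what NFS trouble tends to look like in the logs, and nfs_log_filter is a made-up helper name, so widen or narrow it as needed.

```shell
# Keep only journal lines that look related to NFS or storage activation.
# Reads log text on stdin, writes matching lines to stdout.
nfs_log_filter() {
    grep -Ei 'nfs|rpc|pvestatd|not responding|timed out|got timeout'
}

# Example use on an affected node:
#   journalctl --since "1 hour ago" | nfs_log_filter
#   journalctl -k -b | nfs_log_filter      # kernel messages from the current boot
```

The first matching line before the pvestatd timeouts begin is usually the most interesting one.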


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
I tried to mount it manually, and I believe I did it correctly. The command just sits there and hangs.

I am confident there are no duplicate IP addresses on the network.

I quickly compared a working node with one that isn't working, and the package versions look the same. I applied the updates to all nodes at the same time. There are a few pending updates, so I may install those to see if they fix it.

When I try pvesm status and the IP address, it does show the shared folders.

The problem is that I am not sure when it happens. It is random, and I don't think I catch it quickly. I'll try to keep a closer eye on it and see if I can catch it close to when the issue occurs.

I have thought about just wiping the system and setting it up again if I can't trace the issue.
 
PVE uses the native Linux NFS client, i.e. an Ubuntu-based kernel and a Debian-based userland. These are used on millions of nodes worldwide.

There are many possibilities here, for example, a bad NIC/cable that is affected by solar flares, a firmware leak, a NAS issue, etc.

You need to methodically work through basic Linux NFS troubleshooting at the time of the problem. Keep good records of what you did and the results. If you don't come to any conclusions, post the output here.
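
One way to keep those records is a small helper that appends each command and its output to a log file. This is only a sketch: the helper name record and the /root/nfs-debug directory are assumptions, and the list of commands is a suggestion rather than a complete checklist.

```shell
# Append a command, its output, and its exit status to <dir>/<name>.log.
record() {
    local dir="$1" name="$2" rc=0
    shift 2
    mkdir -p "$dir" || return 1
    {
        echo "### $(date '+%F %T') \$ $*"
        "$@" 2>&1 || rc=$?
        echo "### exit status: $rc"
    } >> "$dir/$name.log"
}

# Example use while the share shows as unknown (hypothetical log directory):
#   d=/root/nfs-debug
#   record "$d" mounts   grep nfs /proc/mounts
#   record "$d" exports  timeout 15 showmount -e 192.168.1.200
#   record "$d" rpc      timeout 15 rpcinfo -p 192.168.1.200
#   record "$d" versions pveversion -v
#   record "$d" kernel   uname -r
```

Running the same set on a healthy node gives a baseline to compare against, including package and kernel versions.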

Given the timeout errors, a network trace may be illuminating.
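
As a rough sketch of such a trace, the helper below only prints a tcpdump command so it can be reviewed before running it as root. The helper name nfs_trace_cmd, the default output path, and the 256-byte snapshot length are assumptions; the capture simply keeps all traffic to and from the NFS server.

```shell
# Print a tcpdump command that captures traffic to/from the NFS server.
#   $1 = server address, $2 = output file (optional, hypothetical default)
nfs_trace_cmd() {
    local server="$1"
    local outfile="${2:-/root/nfs-trace.pcap}"
    # -n: no name resolution, -i any: all interfaces,
    # -s 256: keep only the first 256 bytes of each packet to limit file size
    printf 'tcpdump -ni any -s 256 -w %s host %s\n' "$outfile" "$server"
}

# Example use:
#   nfs_trace_cmd 192.168.1.200
# Then paste the printed command, reproduce the hang, and stop the capture with Ctrl-C.
```

The capture file grows for as long as it runs, so it is best started shortly before a manual mount attempt.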

Cheers

