[SOLVED] NFS connection drops suddenly

  • Thread starter: Deleted member 33567 (Guest)

Recently, one of the Proxmox servers in our cluster lost its connection to the NFS backup server on the OVH side.
This backup storage comes free with every server, and this is the first time a server has lost the connection out of nowhere.

Any idea what to look into to solve this? The same share works fine on the other servers. Nothing changed in the IP allocation, so all I have to go on are the console errors, and those don't say much:

Code:
Nov 29 13:17:26 n03-sxb-pve01 kernel: nfs: server ftpback-rbx4-118.ovh.net not responding, timed out
Nov 29 13:17:31 n03-sxb-pve01 kernel: nfs: server ftpback-rbx4-118.ovh.net not responding, timed out
Nov 29 13:17:34 n03-sxb-pve01 pvestatd[1156999]: got timeout
Nov 29 13:17:34 n03-sxb-pve01 pvestatd[1156999]: unable to activate storage 'ovh-nfs-no03' - directory '/mnt/pve/ovh-nfs-no03' does not exist or is unreachable
Nov 29 13:17:36 n03-sxb-pve01 kernel: nfs: server ftpback-rbx4-118.ovh.net not responding, timed out
Nov 29 13:17:41 n03-sxb-pve01 kernel: nfs: server ftpback-rbx4-118.ovh.net not responding, timed out
Nov 29 13:17:44 n03-sxb-pve01 pvestatd[1156999]: got timeout
Nov 29 13:17:44 n03-sxb-pve01 pvestatd[1156999]: unable to activate storage 'ovh-nfs-no03' - directory '/mnt/pve/ovh-nfs-no03' does not exist or is unreachable
Nov 29 13:17:47 n03-sxb-pve01 kernel: nfs: server ftpback-rbx4-118.ovh.net not responding, timed out
Nov 29 13:17:52 n03-sxb-pve01 kernel: nfs: server ftpback-rbx4-118.ovh.net not responding, timed out
Nov 29 13:17:54 n03-sxb-pve01 pvestatd[1156999]: got timeout
Nov 29 13:17:54 n03-sxb-pve01 pvestatd[1156999]: unable to activate storage 'ovh-nfs-no03' - directory '/mnt/pve/ovh-nfs-no03' does not exist or is unreachable
Nov 29 13:18:00 n03-sxb-pve01 systemd[1]: Starting Proxmox VE replication runner...
Nov 29 13:18:00 n03-sxb-pve01 systemd[1]: pvesr.service: Succeeded.
Nov 29 13:18:00 n03-sxb-pve01 systemd[1]: Started Proxmox VE replication runner.
Nov 29 13:18:02 n03-sxb-pve01 kernel: nfs: server ftpback-rbx4-118.ovh.net not responding, timed out
Nov 29 13:18:04 n03-sxb-pve01 pvestatd[1156999]: got timeout

The next error shown in the console comes from the systemd time service:

Code:
Nov 29 13:18:42 n03-sxb-pve01 systemd[1]: systemd-timesyncd.service: Start operation timed out. Terminating.
Nov 29 13:18:43 n03-sxb-pve01 systemd[1]: systemd-timesyncd.service: Main process exited, code=killed, status=15/TERM
Nov 29 13:18:43 n03-sxb-pve01 systemd[1]: systemd-timesyncd.service: Failed with result 'timeout'.
Nov 29 13:18:43 n03-sxb-pve01 systemd[1]: Failed to start Network Time Synchronization
Nov 29 13:18:43 n03-sxb-pve01 systemd[1]: systemd-timesyncd.service: Service has no hold-off time (RestartSec=0), scheduling restart
Nov 29 13:18:43 n03-sxb-pve01 systemd[1]: systemd-timesyncd.service: Scheduled restart job, restart counter is at 32.
Nov 29 13:18:43 n03-sxb-pve01 systemd[1]: Stopped Network Time Synchronization.
Nov 29 13:18:43 n03-sxb-pve01 systemd[1]: Starting Network Time Synchronization...
 
* Check the IP connection from the Proxmox host (e.g. try pinging the NFS server).
* If that works, try mounting the share manually, in addition to the storage definition (the mount point must already exist), e.g.:
Code:
mount.nfs ftpback-rbx4-118.ovh.net:/sharename /mnt/tempshare

If this works but the Proxmox storage is still unavailable, it is a Proxmox-specific problem, which can be worked around by using the manual mount.
If the manual mount does not work either, the problem is on the server side.
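
The two checks above could be combined into a small helper along these lines. This is only a sketch: the function name check_nfs, its arguments, and the messages are made up for illustration, and it uses `mount -t nfs` in place of calling mount.nfs directly. Keep in mind that a failed ping can also mean ICMP is filtered, and that the mount attempt may block for a while if the server is unresponsive.

```shell
#!/bin/sh
# Hypothetical helper: check reachability first, then try a manual NFS mount.
# Usage (as root): check_nfs SERVER EXPORT_PATH MOUNTPOINT

check_nfs() {
    server=$1       # NFS server hostname
    export_path=$2  # exported path on the server, e.g. /sharename
    mountpoint=$3   # temporary local mount point

    # Step 1: basic IP reachability (3 probes, 2 s timeout each).
    if ! ping -c 3 -W 2 "$server" >/dev/null 2>&1; then
        echo "network: $server does not answer ping"
        return 1
    fi

    # Step 2: manual mount, independent of the Proxmox storage definition.
    mkdir -p "$mountpoint" || return 2
    if mount -t nfs "$server:$export_path" "$mountpoint" >/dev/null 2>&1; then
        echo "manual mount OK: look at the Proxmox storage definition"
        umount "$mountpoint"
        return 0
    fi

    echo "manual mount failed: look at the NFS server side"
    return 3
}
```

For the share in this thread, a call would look like `check_nfs ftpback-rbx4-118.ovh.net /sharename /mnt/tempshare`, where /sharename is still a placeholder for the real export path.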
 
It seems the connection was blocked due to an inconsistency in kernel versions between two servers in the cluster.
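
For anyone wanting to check for the same kind of mismatch, a rough way to compare the running kernels is to collect `uname -r` from each node and list the distinct values. The helper name distinct_kernels and the "node version" input format are assumptions for this sketch, and the node names in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: read "node kernel-version" lines on stdin and
# print each distinct kernel version once. More than one output line
# means the nodes are not all running the same kernel.
distinct_kernels() {
    awk 'NF >= 2 { print $2 }' | sort -u
}

# Example usage (node names are placeholders, requires SSH access):
#   for n in node1 node2; do
#       echo "$n $(ssh "root@$n" uname -r)"
#   done | distinct_kernels
```

If this prints a single line, all queried nodes report the same kernel; a pending kernel update usually only takes effect after a reboot, so `uname -r` is what matters here rather than the installed package version.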
 