Hi,
we noticed a strange issue on our Proxmox nodes.
On 30/08 we added a new node to our Proxmox cluster (3 nodes before, 4 after); this is the load average graph of the cluster since that date. As you can see, the load has been increasing steadily, day by day, since then. It was increasing only on the 3 "old" nodes, not on the new one (the new node is the lighter green band at the bottom of the graph). We only noticed the situation today, because it wasn't causing any particular trouble.
Today a trigger on our monitoring system alerted us that the number of processes on the first 3 nodes was over the threshold. This is the graph of the processes: today the count was over 600 per node. You can clearly see, in the bottom part of the graph, that node 4 is not affected. For some reason, at a precise time of day, about 10 processes per day are added on each of the affected nodes.
I also discovered that this process is created every morning on the first 3 nodes:
nobody 3193922 3193921 0 06:25 ? 00:00:00 /usr/bin/find / -ignore_readdir_race ( -fstype NFS -o -fstype nfs -o -fstype nfs4 -o -fstype afs -o -fstype binfmt_misc -o -fstype proc -o -fstype smbfs -o -fstype autofs -o -fstype iso9660 -o -fstype ncpfs -o -fstype coda -o -fstype devpts -o -fstype ftpfs -o -fstype devfs -o -fstype mfs -o -fstype shfs -o -fstype sysfs -o -fstype cifs -o -fstype lustre_lite -o -fstype tmpfs -o -fstype usbfs -o -fstype udf -o -fstype ocfs2 -o -type d -regex \(^/tmp$\)\|\(^/usr/tmp$\)\|\(^/var/tmp$\)\|\(^/afs$\)\|\(^/amd$\)\|\(^/alex$\)\|\(^/var/spool$\)\|\(^/sfs$\)\|\(^/media$\)\|\(^/var/lib/schroot/mount$\) ) -prune -o -print0
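To me that find invocation looks like the nightly updatedb run from the locate package (which would explain the fixed time every morning), but that is only my assumption; walking up the parent chain of one of the hanging processes should confirm what actually starts them. A minimal check, reusing the PIDs from the listing above:

ps -o pid=,ppid=,cmd= -p 3193921   # the find's parent process: likely a daily job started from cron
ls /etc/cron.daily/                # look for a locate/updatedb style daily script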
Something told me to check the filesystem mounts... and I discovered this:
10.50.0.160:/var/nfs/general on /mnt/pve/nfs_backup type nfs (rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.50.0.160,mountvers=3,mountport=55917,mountproto=udp,local_lock=none,addr=10.50.0.160)
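For anyone who wants to check their own nodes, this is how I would list any leftover NFS mounts (nothing Proxmox-specific, just standard tools already on the nodes):

findmnt -t nfs,nfs4        # show all currently mounted NFS shares
grep nfs /proc/mounts      # alternative view straight from the kernel mount table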
This mount points to an old backup storage that we removed on 30/08 because we decommissioned the physical machine hosting the NFS server (before decommissioning it and shutting down the NFS service on the old backup storage, we removed the storage from the GUI).
As I said, we did this from the Proxmox GUI, and I would have expected Proxmox to also unmount the NFS folder; I was wrong. The mount is still there, and every morning something tries to access this folder to perform a backup task that had already been removed, which is what makes the number of processes (and consequently the load) increase. For some reason the processes are never killed; they just stay there, hanging.
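To be clear, the storage removal itself presumably went through on the Proxmox side: only the kernel mount was left behind. A quick way to check both sides (a sketch; nfs_backup is the storage ID taken from the mount output above):

grep -A3 nfs_backup /etc/pve/storage.cfg   # storage definition: should return nothing after removal
findmnt /mnt/pve/nfs_backup                # kernel mount: still listed until unmounted or rebooted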
Trying to manually unmount the folder doesn't work either... At this point we will try a reboot during the next maintenance window. Node 4 is not affected because it was rebooted several times after being added to the cluster and after the NFS storage had been removed from the cluster.
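In case it helps someone hitting the same thing, these are the unmount variants to try before resorting to a reboot (a sketch, using the mount point from above; with a hard NFS mount and the server gone, the plain umount tends to hang, and even the force/lazy variants are not guaranteed to succeed):

umount /mnt/pve/nfs_backup        # plain unmount: hangs or reports the target is busy
umount -f /mnt/pve/nfs_backup     # force unmount, intended for unreachable NFS servers
umount -l /mnt/pve/nfs_backup     # lazy unmount: detach from the tree now, clean up when no longer busy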
I think a reboot will fix this, but I wanted to report it: maybe someone else is in the same situation.