[BUG] Backup NFS umount not working

Oct 2, 2018
Hi,
we noticed this strange issue happening on our Proxmox nodes.

On 30/08 we added a new node to our Proxmox cluster (3 nodes before, 4 after). This is the load average graph of the cluster since that date. As you can see, the load has been increasing steadily, day by day, ever since. It increased only on the 3 "old" nodes, not on the new one (the new node is the lighter green band at the bottom of the graph). We only noticed this today, because it was not causing any particular trouble.

[Attachment: Schermata da 2019-09-19 18-09-36.png (cluster load average graph)]

Today a trigger in our monitoring system alerted us that the number of processes on the first 3 nodes was over the threshold. This is the process graph: today there were more than 600 processes per node. Node 4 is clearly unaffected (bottom of the graph). For some reason, about 10 processes are added per day, always at the same time of day.

[Attachment: Schermata da 2019-09-19 18-09-43.png (process count graph)]

I also discovered that this process was created every morning on the first 3 nodes:

nobody 3193922 3193921 0 06:25 ? 00:00:00 /usr/bin/find / -ignore_readdir_race ( -fstype NFS -o -fstype nfs -o -fstype nfs4 -o -fstype afs -o -fstype binfmt_misc -o -fstype proc -o -fstype smbfs -o -fstype autofs -o -fstype iso9660 -o -fstype ncpfs -o -fstype coda -o -fstype devpts -o -fstype ftpfs -o -fstype devfs -o -fstype mfs -o -fstype shfs -o -fstype sysfs -o -fstype cifs -o -fstype lustre_lite -o -fstype tmpfs -o -fstype usbfs -o -fstype udf -o -fstype ocfs2 -o -type d -regex \(^/tmp$\)\|\(^/usr/tmp$\)\|\(^/var/tmp$\)\|\(^/afs$\)\|\(^/amd$\)\|\(^/alex$\)\|\(^/var/spool$\)\|\(^/sfs$\)\|\(^/media$\)\|\(^/var/lib/schroot/mount$\) ) -prune -o -print0

Something told me to check the filesystem mounts, and I found this:
10.50.0.160:/var/nfs/general on /mnt/pve/nfs_backup type nfs (rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.50.0.160,mountvers=3,mountport=55917,mountproto=udp,local_lock=none,addr=10.50.0.160)
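
For anyone who wants to check their own nodes for leftover mounts like this one, here is a minimal sketch, assuming GNU coreutils (`timeout`, `stat`) and `awk` are available. The function names `list_nfs_mounts` and `check_nfs_mounts` are made up for illustration. It probes each NFS mount point with a time-limited `stat`, so the check itself should not hang the way an ordinary `ls` or `df` would on a dead hard mount:

```shell
# List NFS mount points from a mounts table (defaults to /proc/self/mounts).
# Note: mount points containing spaces appear escaped (\040) in that file.
list_nfs_mounts() {
    awk '$3 ~ /^nfs4?$/ { print $2 }' "${1:-/proc/self/mounts}"
}

# Probe each mount point with a stat that is killed after 5 seconds.
# A mount whose server is gone typically runs into the timeout.
check_nfs_mounts() {
    list_nfs_mounts "$@" | while IFS= read -r mp; do
        if timeout -s KILL 5 stat -t "$mp" >/dev/null 2>&1; then
            echo "ok    $mp"
        else
            echo "STALE $mp"
        fi
    done
}
```

Running `check_nfs_mounts` with no arguments reads the live mount table. A mount reported as STALE is only a hint that the server is not answering; it does not say why.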

This mount is an old backup storage that we removed on 30/08 because we decommissioned the physical machine hosting the NFS server (we removed the storage from the GUI before shutting down the NFS service and decommissioning the machine).

As I said, we did this from the Proxmox GUI, and I expected Proxmox to unmount the NFS share. I was wrong: the mount is still there, and every morning something tries to access this folder to perform a backup task that was already removed, so the number of processes (and consequently the load) keeps increasing. For some reason the processes are never killed; they just stay there, hanging.
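
Processes blocked on an unreachable hard-mounted NFS share usually sit in uninterruptible sleep (state `D` in `ps`), and those also count toward the Linux load average. A minimal sketch for listing them, assuming procps `ps` and `awk`; `filter_dstate` and `list_stuck` are hypothetical helper names:

```shell
# Keep only rows whose second column (process state) starts with D,
# i.e. processes in uninterruptible sleep.
filter_dstate() {
    awk '$2 ~ /^D/'
}

# Print PID, state, elapsed time and command line of all D-state processes.
list_stuck() {
    ps -eo pid=,stat=,etime=,args= | filter_dstate
}
```

If `list_stuck` shows a growing pile of old `find` processes with elapsed times of several days, that would match the pattern described above.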

Manually unmounting the folder does not work either. We will try a reboot during the next maintenance window. Node 4 is not affected because it was rebooted several times after being added to the cluster and after the NFS storage was removed.

I think a reboot will solve this, but I wanted to report it in case someone else ends up in the same situation.
 
This is typical NFS behavior if the volume was hard-mounted. See the NFS man page for the hard/soft mount options. For some use-cases the use of automount or autofs will mitigate this by only mounting the volume when it is being used.
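
To see which behavior a given mount has, the mount options can be inspected. A minimal sketch, assuming `awk` and the usual `/proc/self/mounts` layout; `nfs_mount_mode` is a hypothetical helper name. It treats a mount as hard unless `soft` is listed, since hard is the NFS default:

```shell
# Print "<mount point> hard|soft" for every NFS mount in a mounts table
# (defaults to /proc/self/mounts).
nfs_mount_mode() {
    awk '$3 ~ /^nfs4?$/ {
        mode = "hard"                        # NFS default when "soft" is absent
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "soft") mode = "soft"
        print $2, mode
    }' "${1:-/proc/self/mounts}"
}

# Example of a soft mount (server, export and mount point are placeholders):
#   mount -t nfs -o vers=3,soft,timeo=100,retrans=3 server:/export /mnt/point
```

Keep in mind that, per the nfs(5) man page, soft mounts can return I/O errors to applications and risk silent data corruption on writes, so they are a trade-off rather than a universal fix.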
 
The lazy unmount failed; the command was totally unresponsive.
In the end, rebooting the nodes solved the problem.

I think this issue must be addressed by the Proxmox team: the consequences in terms of load and open files could be serious in a similar situation if it goes unnoticed.
 
Bummer. :/

File a bug through their actual bug reporter. That way more of their team will see it, and it can be better tracked.



 
