Proxmox 7 kernel 5.15.35-2-pve does not support NFS over UDP mounts?

Mar 29, 2021
The title says it all really, I upgraded the first host on our cluster yesterday and discovered that NFS mounts were not available.
The NFS server is happily humming along, but the mounts were marked as offline.
Trying to mount by hand resulted in:
Code:
# mount -v -t nfs 172.16.10.1:/tank/stack_secondary /mnt/pve/skafiv3_secondary -o hard,vers=3,proto=udp
mount.nfs: timeout set for Tue Jun 14 13:39:37 2022
mount.nfs: trying text-based options 'hard,vers=3,proto=udp,mountproto=udp,addr=172.16.10.1'
mount.nfs: prog 100003, trying vers=3, prot=17
mount.nfs: trying 172.16.10.1 prog 100003 vers 3 prot UDP port 2049
mount.nfs: prog 100005, trying vers=3, prot=17
mount.nfs: trying 172.16.10.1 prog 100005 vers 3 prot UDP port 20048
mount.nfs: mount(2): Invalid argument
mount.nfs: an incorrect mount option was specified
Omitting the proto=udp option works as expected, albeit over TCP.
Rebooting the host with the latest Proxmox 6 kernel 5.4.178-1-pve also works, over UDP this time with everything else left intact.
So, it appears that sometime between those two kernel versions, NFS over UDP was removed.
Are there any plans to reinstate NFS over UDP?
Is it deliberate, and do we have to rethink shared storage? The NFS server is clustered with DRBD for the shared storage, and UDP offers very fast failover times, while TCP requires some manual intervention when a failover happens.
 
OK, things being that way, I guess we have the following options:
1. Change to NFS over TCP
2. Download the debian kernel sources and recompile without NFS_DISABLE_UDP_SUPPORT (which of course entails maintaining the kernel package locally and tracking changes)
3. Keep running a kernel version < 5.6 (ie 5.4.189-2-pve)
4. Something else ?

I'd guess that 1. is the most supported option from your side and probably the least desirable from ours.
2. sounds reasonable but entails some work on our part, assuming you're OK with supporting that. The upstream commit seems trivial, and its purpose appears to be to discourage users from shooting themselves in the foot.
3. seems a very middle-of-the-road temporary solution, but I'd like to know if there are any obvious risks with it.

So it would appear that we have to go with 2., unless of course you happen to have a kernel variant like that handy and willing to support that or there is some other solution I couldn't think of.
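For option 2, a rough sketch of what the rebuild could look like. The repository URL is from the question above; the config-override file name below is hypothetical, and the exact build steps may differ between releases, so verify against the pve-kernel repo's own documentation:

```shell
# Assumed workflow; check the pve-kernel README for the current build steps.
git clone git://git.proxmox.com/git/pve-kernel.git
cd pve-kernel

# The upstream Kconfig symbol is NFS_DISABLE_UDP_SUPPORT; setting it back
# to =n should restore proto=udp mounts. Where exactly config overrides
# live in this repo is an assumption -- adjust to match the tree.
echo 'CONFIG_NFS_DISABLE_UDP_SUPPORT=n' >> local-config-override  # hypothetical file

# Build the .deb packages (requires the usual kernel build dependencies).
make deb
```

The resulting packages would then need to be pinned so a routine `apt upgrade` does not pull the stock kernel back in.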
 
> 1. Change to NFS over TCP

man nfs:
Code:
TCP is the default transport protocol used for all modern NFS implementations. It performs well in almost every conceivable network environment and provides excellent guarantees against data corruption caused by network unreliability. TCP is often a requirement for mounting a server through a network firewall.
> 2. Download the debian kernel sources and recompile without NFS_DISABLE_UDP_SUPPORT (which of course entails maintaining the kernel package locally and tracking changes)
> 3. Keep running a kernel version < 5.6 (ie 5.4.189-2-pve)
> 4. Something else ?

I can't think of a good reason why your failover should be slower with TCP vs UDP, except a non-optimized configuration for your specific environment.
It should tell you something that UDP is being disabled by default by major vendors.

Have you experimented with NFS and TCP timeouts to see if it helps reduce your failover? If you are confident in your network and on-the-wire timeout is an indication that client should give up, then set it to do so.

I would experiment with:
Code:
timeo=n    The time in deciseconds (tenths of a second) the NFS client waits for a response before it retries an NFS request.

           For NFS over TCP the default timeo value is 600 (60 seconds). The NFS client performs linear backoff: after each retransmission the timeout is increased by timeo up to the maximum of 600 seconds.
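Concretely, something like the following could be tried against the mount from the first post. The values are illustrative, not a recommendation; with a hard mount the client still retries indefinitely, it just redrives requests sooner after a failover:

```shell
# timeo=50 -> retry after 5 seconds (timeo is in deciseconds);
# retrans=2 -> two retransmissions per major timeout before the
# client reports "server not responding" and starts over.
mount -t nfs -o hard,vers=3,proto=tcp,timeo=50,retrans=2 \
    172.16.10.1:/tank/stack_secondary /mnt/pve/skafiv3_secondary
```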

And also look at sysctl values.
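On the sysctl side, TCP retransmission behaviour is what usually stretches out a failover. One knob worth looking at (note: it affects every TCP connection on the host, so test carefully):

```shell
# Number of retransmissions on an established connection before the
# kernel gives up (default 15, which works out to roughly 15-30
# minutes of retries with exponential backoff).
sysctl net.ipv4.tcp_retries2

# Lowering it shortens how long a connection to a dead NFS server
# lingers before the client tears it down and reconnects.
sysctl -w net.ipv4.tcp_retries2=8
```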


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
> I cant think of a good reason why your failover should be slower with TCP vs UDP except non-optimized configuration for your specific
> environment.

This describes the situation quite accurately and mirrors our own experience closely. It is not really possible to transparently fail over TCP connections.
The only time I've seen something like this work was with ipvsadm and its connection synchronization daemon. Unfortunately, it does not quite fit the current topology.

> It should tell you something that UDP is being disabled by default by major vendors.

Yes, the problem is known and explained very well under "Using NFS over UDP on high-speed links" in the nfs(5) man page.
If one avoids fragmentation entirely, using a sufficiently small rsize and wsize to fit inside a single (jumbo) frame, and the network is reliable, there is no trouble.
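For illustration, on a 9000-byte jumbo-frame network that would mean keeping each RPC inside one datagram, e.g. (sizes are an example; nfs(5) has the full discussion):

```shell
# rsize/wsize of 8192 plus the RPC/UDP/IP headers fits within a single
# 9000-byte jumbo frame, so no IP fragmentation occurs on the wire.
mount -t nfs -o hard,vers=3,proto=udp,rsize=8192,wsize=8192 \
    172.16.10.1:/tank/stack_secondary /mnt/pve/skafiv3_secondary
```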

Anyhow, forking https://github.com/proxmox/pve-kernel shouldn't be too difficult, no ?

Thanks for trying to help
 
> This describes the situation quite accurately and mirrors our own experience closely. It is not quite possible to transparently failover TCP connections.

Alright, so it sounds like you are dealing with server-side issues rather than client-side ones. Every self-respecting NAS vendor (NetApp, Dell/EMC, others) solved this problem 25 years ago.

> Anyhow, forking https://github.com/proxmox/pve-kernel shouldn't be too difficult, no ?

It depends on your skillset, but it should be doable. Make sure to document everything well, so that the guy who manages this environment after you doesn't accidentally upgrade to a stock kernel...


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Patching it should be trivial - it's just a Kconfig switch (for now - support might obviously be dropped completely upstream at some point ;)).
 
