Seeing as how neither NFS session trunking nor pNFS is supported in Proxmox, I was faced with an interesting question: how do I provide multiple parallel links to my NFS storage for both load distribution and path redundancy?
From experience, switch-level port aggregation, with or without LACP, works okay for failover with NFS but is not ideal for load distribution, especially when multiple switches are used as uplinks for redundancy.
Being a network guy by trade, I set aside my "server glasses," put on my "network glasses," and looked at the issue a little differently. I came up with a solution that has worked very well for me to date, so I thought I would share.
OSPF will, by default, load balance traffic on a per-flow basis across up to four equal-cost links, and more if you reconfigure it (on Linux, the actual per-flow hashing is done by the kernel once the routing daemon installs the equal-cost routes). So this is exactly what I did. I installed Quagga on my four Proxmox hosts and on my NFS host, created a loopback address on the NFS host, and advertised it into OSPF. The NFS server and the four Proxmox hosts each have four NICs in the storage network, each connected to a different physical switch and each in a different "throw-away" subnet. On the Proxmox hosts I mount the exported NFS share by the NFS server's loopback IP address, which is learned via OSPF, so there are always up to four equal-cost paths to it. I have also enabled OSPF fast hellos in Quagga so that sub-second convergence happens in case of a link failure. The relevant configuration is sketched below.
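For anyone who wants to try the same thing, here is roughly what it looks like. All addresses, interface names, and the area number below are illustrative rather than my actual values, and zebra must be running alongside ospfd (an essentially empty zebra.conf is enough). First, the loopback address on the NFS server, a /32 that lives in no physical subnet:

    # /etc/network/interfaces on the NFS server (example address)
    auto lo:1
    iface lo:1 inet static
        address 10.99.99.1
        netmask 255.255.255.255

Then a minimal /etc/quagga/ospfd.conf on the NFS server, advertising the loopback plus the four storage subnets and enabling fast hellos on each storage NIC; the Proxmox hosts get the same treatment, minus the loopback:

    ! /etc/quagga/ospfd.conf on the NFS server (example subnets)
    router ospf
     ospf router-id 10.99.99.1
     network 10.99.99.1/32 area 0
     network 10.0.1.0/24 area 0
     network 10.0.2.0/24 area 0
     network 10.0.3.0/24 area 0
     network 10.0.4.0/24 area 0
    !
    interface eth1
     ! dead interval of 1 second, hellos every 250 ms
     ip ospf dead-interval minimal hello-multiplier 4
    ! ...repeat the interface stanza for eth2 through eth4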
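Once the adjacencies are up, every Proxmox host learns the loopback /32 over all four links as an equal-cost route, and the share is added against that address like any other NFS storage. The storage ID and export path here are made up for illustration:

    # On a Proxmox host (storage ID and export path are examples)
    pvesm add nfs vmstore --server 10.99.99.1 \
        --export /srv/nfs/vmstore --content images,rootdir

    # Sanity check: the loopback route should list four nexthops
    ip route show 10.99.99.1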
Link failure detection and selection of a new data path happen within a couple hundred milliseconds, and I have tried everything I can think of to break it but have failed every time. I run about 20 VMs across the four hosts with this setup, but of course YMMV.