Hello,
I've finally been able to get my hands on a few servers to test Proxmox for potential future deployments. I'm currently working with two HPE DL360 Gen10s, and a Gen11 will become available in a week. So far I'm not seeing any hardware issues on the Gen10s (which is great). I'm using the P408i RAID controller with LVM-thin locally, and I'm looking at NFS for shared storage.
I'd like to use NFS because it can cope with a less stable storage network, meaning we can deploy on standard L2/L3 switches. I do, however, want redundant network paths (two switches), since a single switch could take the cluster down when it has an issue or during maintenance. Doing this 'properly' would require LACP towards two switches, with MLAG between them; that way a switch can fail completely and everything keeps working.
However, the MLAG requirement means that 'deploy on standard L2/L3 switches' no longer holds. So instead I want to use NFS session trunking: two separate IP networks between the PVE hosts and the NAS/SAN, which allows one path (= one switch) to fail while the storage stays online. I'm aware that this requires a manual mount and the use of PVE's 'directory' storage type.
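For context, this is roughly what I have in mind for the PVE side. It's only a sketch; the mountpoint /mnt/mpio_test and the storage name nfs-trunked are just examples:
Bash:
# manual mount (fstab or a systemd mount unit), then registered as a directory storage;
# is_mountpoint should keep PVE from writing into the empty directory if the mount is missing
pvesm add dir nfs-trunked --path /mnt/mpio_test --content images,rootdir --is_mountpoint yes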
My issue is getting session trunking to work. I was initially testing with NFS 4.1 on a Synology, but replaced that with a Debian 12 server that supports NFS 4.2 exports. Proxmox being Debian-based, this should be the perfect combination, right?
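The export on the Debian box is nothing special, roughly the sketch below (covering both storage subnets, details further down). As far as I understand, trunking also needs the client to see the same server identity on both addresses, which a single knfsd instance should provide by itself:
Bash:
# on the NFS server: one export, reachable via both storage subnets
cat /etc/exports
/var/nfs 10.200.0.0/24(rw,sync,no_subtree_check) 10.202.0.0/24(rw,sync,no_subtree_check)
exportfs -ra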
I've tried various things. The NAS has the IPs 10.200.0.200/24 and 10.202.0.200/24, while the PVE host has 10.200.0.101 and 10.202.0.101.
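A quick per-path sanity check looks something like this (nothing fancy, just forcing the source address so each subnet/switch is exercised on its own):
Bash:
# each path over its own subnet/switch
ping -c 2 -I 10.200.0.101 10.200.0.200
ping -c 2 -I 10.202.0.101 10.202.0.200
# export visible via both addresses
showmount -e 10.200.0.200
showmount -e 10.202.0.200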
Double mount over TCP, which was the recommended approach:
Bash:
# mount -v -t nfs4 10.200.0.200:/var/nfs /mnt/mpio_test -o "nfsvers=4,minorversion=2,hard,proto=tcp,timeo=50,retrans=1,sec=sys,clientaddr=0.0.0.0,max_connect=8"
== OK
# mount -v -t nfs4 10.202.0.200:/var/nfs /mnt/mpio_test -o "nfsvers=4,minorversion=2,hard,proto=tcp,timeo=50,retrans=1,sec=sys,clientaddr=0.0.0.0,max_connect=8"
mount.nfs4: mount(2): Device or resource busy
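What I can't tell from that error alone is whether the second mount still added a transport to the existing client. I believe the per-mount transport list shows up in mountstats (recent kernels should list one xprt line per connection, though I'm not 100% sure of that), so something like:
Bash:
# one "device ..." block per NFS mount; the xprt lines list the transports in use
grep -E "^device|xprt:" /proc/self/mountstats
# raw TCP view, same information as netstat
ss -tn '( dport = :2049 )'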
Using trunkdiscovery:
Bash:
$ mount -v -v -v -t nfs4 10.202.0.200:/volume1/Proxmox_mpio /mnt/mpio_test -o "nfsvers=4,minorversion=1,hard,proto=tcp,timeo=50,retrans=1,sec=sys,clientaddr=0.0.0.0,nconnect=2,max_connect=8,trunkdiscovery"
==OK
# mount -v -v -v -t nfs4 10.200.0.200:/volume1/Proxmox_mpio /mnt/mpio_test -o "nfsvers=4,minorversion=1,hard,proto=tcp,timeo=50,retrans=1,sec=sys,clientaddr=0.0.0.0,nconnect=2,max_connect=8"
==OK
# nfsstat -m
/mnt/mpio_test from 10.202.0.200:/volume1/Proxmox_mpio
Flags: rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,nconnect=2,timeo=50,retrans=1,sec=sys,clientaddr=0.0.0.0,local_lock=none,addr=10.202.0.200
/mnt/mpio_test from 10.200.0.200:/volume1/Proxmox_mpio
Flags: rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,nconnect=2,timeo=50,retrans=1,sec=sys,clientaddr=0.0.0.0,local_lock=none,addr=10.202.0.200
# netstat -an | grep 2049
tcp 0 180 10.202.0.101:823 10.202.0.200:2049 ESTABLISHED
tcp 0 0 10.200.0.101:671 10.200.0.200:2049 ESTABLISHED
tcp 0 124 10.202.0.101:946 10.202.0.200:2049 ESTABLISHED
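For what it's worth, this is how I'd check whether those connections hang off a single NFS client (one clientid/session) or two separate superblocks; the files should exist on a stock Debian/PVE kernel, but treat it as a sketch:
Bash:
# one line per nfs_client known to the kernel (a single entry would suggest one shared client)
cat /proc/fs/nfsfs/servers
# one line per mounted volume/superblock
cat /proc/fs/nfsfs/volumes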
Having the trunkdiscovery option included on the second mount gives the resource-busy error; omitting that option on the second mount, however, gives ...some result: the share is now mounted via two networks, and netstat shows the corresponding established TCP connections. Note, though, that the effective addr flag/option is overridden to the .202 address for both mounts. Taking down the .202 network does not cause a switchover to the .200 network, meaning the redundancy concept does not work: the mount just hangs until connectivity is restored. So this doesn't seem to be the way either.
Searching on this topic returns very little useful information. There are various articles by NAS vendors (e.g. NetApp) that explain how their client does this. In a desperate attempt I also tried VMware, and that does seem to work (at least the mount is created, and one path can be down without impacting the service). I'm at the point where I'm tempted to start taking network captures to understand what's happening at a lower level... but I'd rather not.
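In case anyone wants to reproduce the failover test, it is roughly this (the interface name ens19 is just a placeholder for the NIC on the .202 subnet):
Bash:
# take the .202 path away
ip link set ens19 down
# in a second shell: generate I/O on the mount and watch whether it moves to the .200 connection
dd if=/dev/zero of=/mnt/mpio_test/failover_test bs=1M count=100 oflag=direct
ss -tn '( dport = :2049 )'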
Does anyone have any recent experience with session trunking for increased redundancy (and potentially higher transfer speeds)?
Thanks!