Redundant network links to storage

liepumartins

New Member
Jul 17, 2023
Proxmox cluster consisting of 3+ nodes and TrueNAS storage, with an NFS share for VM disks and CT volumes.
Storage access runs over a dedicated network, connected via a 10G fiber switch and interfaces.

We experienced a failure when the link between a node and the storage went down. The Linux guests kept running, but their root filesystems were remounted read-only. Windows guests also started misbehaving. The only fix was to reboot each guest.

If we connected all nodes and the TrueNAS to the storage network via a secondary interface as well, could that compensate for the primary fiber link going down?

The question is about the network path/links; we know that the TrueNAS itself still remains a single point of failure.

Or is the idea wrong and this should be solved differently?
 
You should have a setup with two switches running MLAG. This should solve your problem.
The next SPOF is your NIC. Maybe you do LACP with two cards, i.e. two separate NICs per server. Use one port for storage traffic and one for payload traffic on each card.

TrueNAS is not enterprise-grade storage and, as far as I know, does not offer HA. You can try to eliminate all SPOFs, but your storage is still your biggest problem.
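
For reference, an LACP bond on a Proxmox node is defined in /etc/network/interfaces, roughly like this (a sketch only: NIC names and the address are examples, and the switch/MLAG side needs a matching LACP port-channel):

Code:
auto bond0
iface bond0 inet static
    address 10.10.10.11/24
    # example interface names - use your two 10G ports, one per card
    bond-slaves enp1s0f0 enp2s0f0
    bond-mode 802.3ad
    bond-miimon 100
    bond-xmit-hash-policy layer3+4

With MLAG, one slave goes to each switch, so either a port or a whole switch can fail without the storage path going down.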
 
As @hec mentioned, there are a few options for network redundancy:
- dual-port NIC or dual NICs with a single switch, using LACP, a Linux bond, or multipath (iSCSI and NFSv4)
- dual-port NIC or dual NICs with two independent switches, using a Linux bond or multipath (iSCSI and NFSv4)
- dual-port NIC or dual NICs with two inter-connected switches (either MLAG or Virtual Chassis), using LACP, a Linux bond, or multipath (iSCSI and NFSv4)

This would apply to both client and server sides.
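
Whichever option you pick, the Proxmox side still just mounts the NFS export at whatever address the TrueNAS presents on the storage network. Purely for illustration (the storage ID, IP, and paths below are made up), the entry in /etc/pve/storage.cfg would look something like:

Code:
nfs: truenas-storage
    server 10.10.10.100
    export /mnt/tank/proxmox
    path /mnt/pve/truenas-storage
    content images,rootdir
    options vers=4.1,hard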


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Thanks!

For clarity: the NICs are separate; that is why we wanted the 10G to be primary and the 1G as failover.
Did not know that pNFS is a thing!

Did a little more research.
Looks like TrueNAS SCALE or another ZFS + GlusterFS setup would suit a small operation like ours. CephFS, as I understand it, would be slow(er) unless a lot of hardware is dedicated to it.
GlusterFS should eliminate the network link as well as the storage SPOFs, and we could avoid configuring MLAG or other bonding.

Correct me if I am wrong.
 
For clarity: the NICs are separate; that is why we wanted the 10G to be primary and the 1G as failover.
If you have dissimilar NICs, then you should not use LACP.
You may be able to use an active/standby Linux bond, or maybe multipath with special configuration. Or, even better, get another 10G NIC... Keep in mind that when failover does happen, you are going to experience a variety of hard-to-troubleshoot performance issues.
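
If you do go with a bond of dissimilar NICs, an active/standby bond with the 10G set as primary would look roughly like this on the Proxmox side (ifupdown2 syntax, example interface names; treat it as an untested sketch):

Code:
auto bond0
iface bond0 inet static
    address 10.10.10.11/24
    # enp1s0f0 = 10G fiber (primary), eno1 = 1G copper (standby)
    bond-slaves enp1s0f0 eno1
    bond-mode active-backup
    bond-primary enp1s0f0
    bond-miimon 100

No switch-side configuration is needed for active-backup, but remember the warning above: the moment traffic falls back to the 1G port, your storage throughput drops by roughly a factor of ten.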

GlusterFS should eliminate the network link as well as the storage SPOFs
GlusterFS needs the network to communicate between the nodes and keep data in sync.
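
Just to illustrate: a replicated Gluster volume is built from bricks addressed by each node's storage-network hostname, so every write is replicated across those same links (hostnames and paths below are invented):

Code:
# peer the nodes over their storage-network hostnames (run once, from node1)
gluster peer probe node2-storage
gluster peer probe node3-storage

# one brick per node, replicated three ways
gluster volume create vmdata replica 3 \
    node1-storage:/tank/gluster/brick \
    node2-storage:/tank/gluster/brick \
    node3-storage:/tank/gluster/brick
gluster volume start vmdata

A dead link still takes that node's brick offline; Gluster tolerates it, but it does not remove the need for a redundant network path.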
Correct me if I am wrong.
The best way to learn is to try, but if you are looking for performance, high availability, and reliability, you need to invest both time and money.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Your setup will depend on your needs for IOPS, latency, and capacity.

Please do not try to mix 10G and 1G. Even with multipath, you do not want to know how much trouble and how many hours such a setup will cost you.
Invest some money in a second 10G NIC.
 
