Hardware suggestion

Just a footnote, for what it's worth (maybe?). My experience with redundant network config requirements - ie, the strict need for multi-path networking across your servers and switches in order to be fault-tolerant - is that, generally speaking, your risk of a server failure due to a power supply fault or an atypical RAID controller death is higher than the risk of a NIC death or a switch death.

ie, generally in the most common use cases I've deployed, such as a 3-node Proxmox cluster.

The need for a redundant NIC:switch config is effectively ~zero, because the reliability of those parts of the stack is so much higher than that of other components that it is irrelevant, and the added build complexity means it isn't worth the effort. You are more likely to suffer an outage due to human error and a configuration 'oops' during admin work (ie, the risk arguably gets worse with the added complexity and the extra moving parts in your stack).

I agree. But having a redundant network would double the available bandwidth and allow me to do planned work on a network segment without service interruption.

In other words: I have redundant networks (multiple switches and so on), but they all hang off a single NIC.
 
Hi, so I guess I can't comment from experience at this point, because I have never compared copper vs SFP+ latency for its impact on Proxmox performance, either around shared-nothing migration or for shared storage performance.

My "educated guess", weighing the added flexibility and bandwidth of 4 x 10gig copper (more bandwidth, more flexible config scenarios) against 2 x SFP+ 10gig with lower latency, is that for most use case scenarios I can envision I would never go for the SFP+ config. I would go for the 4 x copper port build, keep my platform options more versatile, and also maximize my bandwidth between nodes.

I guess it will depend, in your case, on how intensive your storage needs are and how sensitive your workload is to IO latency vs IO bandwidth.

Obviously, if it were possible to benchmark your workload on the two different 10gig storage fabrics, that would be ideal, so you could make an informed decision (ie, buy 2 x nodes which have both copper and SFP+ based 10gig; do one build with Gluster on copper and a second build with Gluster on fibre; bench test your workload consistently and see what the impact of the different storage fabric is).

But I'm guessing that step isn't an option, and seeking input from the forum is the easier alternative. In which case it will be interesting to see whether anyone has actually done real-world bench tests such as this and can give 'specific real world examples' of the benefits of lower-latency SFP+ 10gig in this deployment config (ie, Gluster shared storage for Proxmox).


Tim
 
I agree with you.
The 4x10Gb copper ports are much more flexible: they allow me to start with a 2x10Gb cluster network (or even a 4x10Gb cluster network until I use 2x10Gb for the storage), allowing very fast live migration.

When I add shared storage, I can simply remove two of these 4 bonded NICs from the bond, use those two for the storage, and stop using shared-nothing live migration (which would be useless with shared storage).
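As a rough sketch of that end state in /etc/network/interfaces (NIC names, bond mode and addresses are placeholders only, not a tested config):

# sketch only - NIC names, bond mode and addresses are placeholders
auto bond0
iface bond0 inet manual
        bond-slaves eth0 eth1            # the two ports kept for cluster/VM traffic
        bond-mode 802.3ad
        bond-miimon 100

auto vmbr0
iface vmbr0 inet static
        address 192.168.10.11/24
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0

auto bond1
iface bond1 inet static
        address 10.10.10.11/24           # dedicated storage network
        bond-slaves eth2 eth3            # the two ports pulled out of the original 4-port bond
        bond-mode 802.3ad
        bond-miimon 100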

The only advantage of the SFP+ configuration is the lower cost of used switches. Some used Quanta switches are sold at about 500 euros; at this price I can buy 3 or 4 switches for redundancy and spare parts while paying less than for a single 10GBase-T switch.
 
Sounds good! I will be interested to hear (if you don't mind) how your LACP trunking scales out performance once your config is stress-tested. I've found that in some cases it fails to give the desired (ie, linear) scale-out, but it might work here, hence it would be nice to see. Worst case, 10gig is pretty decent bandwidth even if it isn't trunked for scale-out capacity, beyond giving better capacity for multiple parallel tasks.

Tim
 
LACP doesn't scale performance, and it isn't supposed to. LACP is used to aggregate ports, allowing greater total bandwidth and thus multiple "sessions", but each session still transfers at single-port speed.

If you want to scale up performance, you have to use Linux with bonding mode 5 or, better, mode 6. In this case a single computer is able to transfer at 2Gbit.
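A minimal /etc/network/interfaces sketch for mode 6 (balance-alb), assuming two placeholder NICs eth0/eth1 and an example address - one advantage over LACP is that it needs no special switch configuration:

# sketch only - eth0/eth1 and the address are placeholders
auto bond0
iface bond0 inet static
        address 192.168.1.10/24
        bond-slaves eth0 eth1
        bond-mode balance-alb            # mode 6: adaptive load balancing (tx + rx)
        bond-miimon 100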
 
Sorry, I was being sloppy (ie, brief). What I was sort-of trying to say is: assuming there are multiple concurrent connections between hosts generating traffic (ie, equal to or in excess of the number of trunk port members), then trunking (mode 4, 5 or 6 - I am interested to hear whether you think 6 is better than 5 is better than 4; my general concise go-to summary for the modes is http://www.linuxhorizon.ro/bonding.html ) will permit better aggregate throughput when considering the total traffic passed for all sessions/interfaces. And this (more or less) depends on your workload, the allocation of VMs, which VMs are IO intensive (or not), etc.
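For reference, the mode 4 (802.3ad/LACP) config I have in mind would look roughly like this - again a sketch only, with placeholder NIC names and address, and the switch ports would need a matching LACP group; the hash policy is what decides how individual sessions are spread over the members:

# sketch only - placeholder NIC names/address; switch side needs a matching LACP LAG
auto bond0
iface bond0 inet static
        address 192.168.2.10/24
        bond-slaves eth0 eth1 eth2 eth3
        bond-mode 802.3ad                # mode 4: LACP
        bond-miimon 100
        bond-xmit-hash-policy layer3+4   # hash per IP:port flow, so different sessions can land on different members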
 
Bond mode 6 balances traffic in both directions, so a single session is able to transfer at a speed higher than a single port.
With LACP, for example with 4 gigabit ports bonded together via LACP, you won't be able to transfer a single session at a speed higher than 1Gbit.