Choosing a switch for my cluster

lince

Member
Apr 10, 2015
Hi there,

I'm building my first cluster and now I have to buy a switch. I want it to be 1Gbps with multicast support for the cluster, but what other things should I look for in a switch?

- Link aggregation?
- VLANs?
- Jumbo frames, 9K or 16K?

I'm considering buying the D-Link DGS-1100-08.
 
Since you are looking at an 8-port switch, I am assuming your cluster is very small with no plans for expansion in the near future. I am a fan of Netgear. In my personal experience, the Netgear interface is much more polished and the reliability much higher. For 8 ports with the features you are looking for, I think this one from Netgear would be just fine:
GS108T-200NAS

Not sure if 16K jumbo frames are a must-have requirement or not, but unless you are spending an insane amount of money on a single switch, I doubt you will find anything with more than 9K jumbo frames.
 
Thanks for your reply. It's true that I'm building a small environment. I'm just building a home lab for myself. It will have three nodes, with NFS on one of the nodes, and I would like to try Ceph at some point. I don't think it will grow much more than that, maybe one or two more nodes.

I'm running on a Linux router with one node and local storage at the moment, but the router only has 100Mbps, so I need a 1Gbps network to set up the NFS storage and get some decent performance.

Apart from choosing D-Link or Netgear, are there any specific features I should look for in the switch?
 
You wrote "link aggregation" and mentioned trying out "ceph", so some questions come to mind:

- How many NICs does each of your nodes have / do you plan to use?
- Do you plan to do QoS via separate NICs/bonds or via the switch?
- Even though it is a "homelab", do you plan to use it for "home production" services?

Side note: get a switch that allows enough LACP groups and has enough backplane speed for your total links. After that it's just about which interface you find more pleasing.

Personally I do not bother with <24-port switches (1G), but that is just me.
Personally I also think >9K jumbo frames on 1G switches are not worth spending money on. If you were talking about 10G/40G switches ... maybe.
 
I understand that for a home environment you don't necessarily want to invest a great deal of money, which is probably why you are going with an 8-port switch. But this won't give you a proper learning experience, as you will be limited in a few ways. For example, a proper Ceph setup wants a second switch for the Ceph cluster sync network. You can mitigate that cost by using an unmanaged dumb switch, because you won't need VLANs/LACP/multicast on that network.
To answer your original question: no, besides VLANs, multicast and maybe QoS as Q-wulf mentioned earlier, there are really no other features you need to look for. I agree with Q-wulf; I also don't really use smart switches with fewer than 24 ports, just for the convenience of plugging in more things or creating more LAGs. The only place I use small 8-port switches is as the Internet entry point for firewalls. This is also for my home lab environment.
Even if you are going with just one 8-port switch, I do recommend that you use multiple VLANs to keep each network separate, so that what you learn is as close to the real thing as possible. For example, a Proxmox cluster with Ceph may have 3 networks:
1. 192.168.1.0/24 = VLAN 1 = Proxmox cluster network
2. 192.168.2.0/24 = VLAN 2 = Ceph public network. This could also be a storage network such as ZFS, NFS etc.
3. 192.168.3.0/24 = VLAN 3 = Ceph cluster sync network
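
If you want to sanity-check a plan like this before typing it into the switch, a few lines of Python will do. This is only a rough sketch: the `VLAN_PLAN` dictionary and the `overlapping_vlans` helper are names made up for illustration, and the subnets are just the three example networks above.

```python
import ipaddress
from itertools import combinations

# Hypothetical plan mirroring the three example networks above:
# VLAN ID -> (description, subnet in CIDR notation)
VLAN_PLAN = {
    1: ("Proxmox cluster network", "192.168.1.0/24"),
    2: ("Ceph public / storage network", "192.168.2.0/24"),
    3: ("Ceph cluster sync network", "192.168.3.0/24"),
}


def overlapping_vlans(plan):
    """Return pairs of VLAN IDs whose subnets overlap (should be empty)."""
    nets = {vid: ipaddress.ip_network(cidr) for vid, (_, cidr) in plan.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]


if __name__ == "__main__":
    for vid, (name, cidr) in sorted(VLAN_PLAN.items()):
        net = ipaddress.ip_network(cidr)
        # num_addresses includes the network and broadcast addresses
        print(f"VLAN {vid}: {name:32s} {net} ({net.num_addresses - 2} usable hosts)")
    print("overlapping VLAN pairs:", overlapping_vlans(VLAN_PLAN))
```

An empty list of overlapping pairs means each VLAN really has its own subnet, which is the whole point of separating the networks.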
 
Thanks for your replies :D


I have one NIC per node, but the node that is going to have the storage has a couple of free slots, so I can add a second network card if I need it. That's the plan: one per node, and the storage node will have one or two. The other two nodes are mini computers (MSI Cubi), so I don't think I will add an extra network card unless I get a USB card, which Proxmox may not even recognise.

The QoS question is interesting, what could I use QoS for?

It will probably have some production services, but it will be mainly for personal use so I don't mind if it fails every now and then.

Good to know that 9K jumbo frames are good enough. The switch I'm considering supports 2 LAG groups of 2 to 4 ports each. What is the backplane speed? Both switches are meant to support 16Gbps.


Well, I just checked on Amazon and the price difference between the 8-port switch and the 16-port switch (D-Link DGS-1100) is insignificant; there is even a cheaper 16-port option lol. But I also want to keep it small (that's also why I bought the MSI Cubi nodes).

I like your suggestion about the VLANs, I will create them. For now I guess I can make the 1st for Proxmox, the 2nd for storage and the 3rd for the VMs. What home lab have you got?
 
[...]
I have one nic per node but the node that is going to have the storage has a couple of free slots so I can add a second network card if I need it. That's the plan, one per node and the storage node will have one or two. There other two nodes are mini computers (msi cubi) so I don't think I will add an extra network card unless I get a usb card which proxmox may not even recognise.

The QOS questions is interesting, what could I use QOS for ?
[...]

Oversimplified, you use QoS to make sure that each protocol gets the bandwidth it needs, but can also use more if it's available.
The easiest way is to do VLAN segregation, as symmcon and just about every wiki/guide suggests. The poor man's way is to do QoS via VLANs at the NIC level, by using different NIC(s)/vmbrX(s) for different VLANs. That requires multiple NICs (at least 2, better 3 if doing Ceph). Or you can let the switching side take care of QoS, but that requires switches that support such features.

[...]
It will probably have some production services but It will be mainly for personal use so I don't mind if it fails every now and then.
[...]
If that is the case and you can live with occasional saturation of your single 1G links, then you are fine with single 1G links. How many OSDs are you planning on running with Ceph (HDDs dedicated to Ceph storage)?
Not sure if you were planning to install Ceph on all 3 nodes or just on the one which can be upgraded to multiple NICs, but Ceph has that pesky habit of saturating 1G links during backfill, heavy usage or scrub operations. Why is that an issue? Because Corosync seems to be susceptible to jitter, causing your cluster to have "red node" issues, which then require manual intervention. There are also other issues, but they all stem from saturating the available bandwidth.
Compare https://forum.proxmox.com/threads/nodes-going-red.24671/ for reference.

Unless of course you use QoS, or very restrictive Ceph settings/limits, to smooth this over.


[...]
What is the backplane speed ? both switches are meant to support 16Gbps.[...]

Backplane speed basically means how much bandwidth the switch can handle concurrently (ingress + egress). If you use 1G + 1G + 2x 1G on your hosts and that is never going to increase, you need a backplane speed of at least 8 Gbit, since Gigabit Ethernet is full duplex (4 links x 1 Gbit x 2 directions). If it's not full duplex, you want to start screaming at the NIC/switch manufacturer or cable patcher immediately :)
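
To make the arithmetic explicit, here is a tiny sketch. The function name is made up for illustration, and it assumes every link is full duplex and can be saturated in both directions at once.

```python
def required_switching_capacity_gbps(link_speeds_gbps):
    """Sum of all link speeds, doubled because full duplex
    carries traffic in both directions at the same time."""
    return 2 * sum(link_speeds_gbps)


# Two small nodes with one 1G link each, storage node with two 1G links
host_links = [1, 1, 1, 1]
print(required_switching_capacity_gbps(host_links))  # 8

# Every port of an 8-port gigabit switch in use
print(required_switching_capacity_gbps([1] * 8))  # 16
```

The second figure is why a 16Gbps spec on an 8-port gigabit switch means it can run all ports at line rate.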



[...]
Well, in amazon I just checked that the price difference for the 8 port switch and the 16 port switch (d-link dgs-1100) is insignificant, there is even a cheaper 16 port option lol.
You probably want to have a look at a proper price search engine like skinflint.co.uk / geizhals.eu (Hardware > Wired Network > Switches), if only to compare products and get a feel for what "is out there", and then buy those elsewhere :)



A generally great read is also this post: https://forum.proxmox.com/threads/what-10g-switches-to-use-how-to-do-qos-ovs-ovn-sdn.25125/ from a guy who runs a "home lab" using Proxmox + Ceph and sub-par network equipment. Better yet is his original post here: https://www.reddit.com/r/networking/comments/3w62gt/been_assigned_to_completely_redo_the_enterprise/
It is a great read, not just for network gear, configs and Ceph usage, but also for all the software that gets mentioned (that one can run on one's homelab). Definitely a good place to educate yourself on "stuff to avoid".
 
I know what QoS is, I just didn't know how I could use it. It's true that if I have a single network card with multiple VLANs, I can use it to limit the traffic of each VLAN so that none of them can saturate the whole link :)

For the Ceph setup, I'm planning on using two mSATA disks, one in each MSI node. If that's not possible I can use GlusterFS or some other setup. What I want to do, and I don't know if it's possible, is have each node use its local storage and have this storage replicated between the two/three nodes. This way I would get ~5Gbps for storage access with mSATA. But I don't know if syncing the disks will cause me a problem with 1Gbps links.

I'm choosing between these two switches, both look very similar:

http://www.downloads.netgear.com/files/GDC/datasheet/en/GS108Tv2.pdf
http://www.dlink.com/-/media/Business_Products/DGS/DGS 1100/Datasheet/DGS_1100_Series_Datasheet_EN_EU.pdf

Will I be able to use QoS per VLAN on a single port with these switches to avoid saturation?
D-Link supports static trunks and Netgear supports IEEE 802.3ad manual and LACP. I guess Netgear is better on this one, right?
D-Link supports 128 VLANs and Netgear 64, but I guess 64 would be more than enough, right?
D-Link lists quite a few VLAN features and Netgear doesn't, but maybe they are also included?
D-Link supports an 8K MAC address table and Netgear 4K. Again, I guess 4K would be enough?
D-Link has a 2Mbits packet buffer and Netgear 512KB of buffer memory. Not sure if it's the same thing, and not sure how important this is?
MTBF for D-Link is 500.000 and for Netgear is 275.000.
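
On the buffer figures: one is quoted in megabits and the other in kilobytes, so they need converting before they can be compared. A quick sketch, assuming the numbers above are copied correctly and that the datasheets use binary prefixes (the helper names are made up; pass `binary=False` for decimal prefixes):

```python
def mbit_to_bytes(mbit, binary=True):
    """Convert megabits to bytes (8 bits per byte)."""
    bits_per_mbit = 1024 * 1024 if binary else 1_000_000
    return mbit * bits_per_mbit // 8


def kb_to_bytes(kb, binary=True):
    """Convert kilobytes to bytes."""
    return kb * (1024 if binary else 1000)


dlink_buffer = mbit_to_bytes(2)      # quoted as 2 Mbits
netgear_buffer = kb_to_bytes(512)    # quoted as 512 KB

print(f"D-Link:  {dlink_buffer} bytes")
print(f"Netgear: {netgear_buffer} bytes")
print(f"Netgear / D-Link: {netgear_buffer / dlink_buffer:.1f}x")
```

So if the units are as quoted, the Netgear buffer would be roughly twice the size of the D-Link one, not a quarter of it.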

Is there anything else in the specs that would make you choose one or the other?

Thanks :)
 
uhmmmm

I'm considering buying the 24 port switch D-Link DGS-1100-24 and getting some usb network cards for the nodes. The switch specs are almost the same as for the one with 8 port:

http://www.dlink.com/-/media/Business_Products/DGS/DGS 1100/Datasheet/DGS_1100_Series_Datasheet_EN_EU.pdf


I'd stay away from the "P" variants (last page of the PDF), those specs look weird as hell.
The "non-P" models seem to have what you need:
IGMP snooping (if you go openvswitch, which you should)
QoS features
SNMP access (so you can do proper monitoring using Observium/LibreNMS or others and get better insight into your network)

All of those switches seem to have enough backplane speed to allow you to fully saturate your links. What I do not see, however, is how many LACP groups/devices the "-24" switch supports. You are probably okay there, since with 3 nodes you will most likely not go beyond 6 LACP groups (2 bonds per Proxmox node), and they seem to start at 8.
 
  • Like
Reactions: lince
Thanks Q-wulf. The ones with a P at the end come with PoE support.

The one with 24 ports supports 12 groups for link aggregation and 8 ports per group, so I guess it will be enough for me :)

• Link Aggregation
• 1 group, 2-4 ports per group (DGS-1100-05)
• 2 groups, 2-4 ports per group (DGS-1100-08/08P)
• 8 groups, 8 ports per group (DGS-1100-16)
• 12 groups, 8 ports per group (DGS-1100-24/24P)
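
As a quick check of these limits against the "2 bonds per node" estimate, here is a small sketch. The `fits` helper is made up for illustration, the group limits are copied from the list above, and the port counts are simply taken from the model names.

```python
# model -> (total ports, LAG groups, max ports per group)
# Group limits are from the datasheet list above; port counts are
# inferred from the model names.
DGS_1100 = {
    "DGS-1100-05": (5, 1, 4),
    "DGS-1100-08": (8, 2, 4),
    "DGS-1100-16": (16, 8, 8),
    "DGS-1100-24": (24, 12, 8),
}


def fits(model, nodes, bonds_per_node, ports_per_bond):
    """True if the switch has enough LAG groups, wide enough groups,
    and enough physical ports for the planned bonds."""
    total_ports, groups, max_ports_per_group = DGS_1100[model]
    needed_groups = nodes * bonds_per_node
    needed_ports = needed_groups * ports_per_bond
    return (
        needed_groups <= groups
        and ports_per_bond <= max_ports_per_group
        and needed_ports <= total_ports
    )


# 3 nodes, 2 bonds per node, 2 ports per bond = 6 groups, 12 ports
for model in DGS_1100:
    print(model, fits(model, nodes=3, bonds_per_node=2, ports_per_bond=2))
```

With those assumptions only the 16 and 24 port models pass, and it ignores any ports needed for uplinks or other devices.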

I also checked that Proxmox 4 accepts the USB network card; at least it does have the kernel driver ax88179_178a.ko, so I guess it will work fine. I will buy one and try it out to check the performance.
 
12 LACP groups with 8 ports per group is plenty. This would basically let you use different bonds for e.g. Ceph backend, Ceph frontend, Corosync and VM clients. Total overkill if you ask me :P


I do not have any experience with USB NICs and Proxmox. Personally I would never touch them with a 10-foot pole, but I have enough expansion slots on my lab, homelab and work nodes that I can just use 4x1G PCIe NICs, so the need never arose.
 
There wasn't a big difference between the one with 16 ports and the one with 24 so that way I won't have to worry about the switch for a few years :)

I already ordered one of those usb gigabit cards (usb 3.0) so I'll test it with iperf and see how it goes. There is no other option for the msi cubi nodes so it's worth trying. They have a chip that takes care of the networking and usb 3 has a bandwidth of 5Gbps so it should be enough for a 1Gbps network adapter.

What nodes do you use in your home lab ?
 
Switches:
  • 1x ZyXEL XS1920-12
  • 1x ZyXEL XGS3700-24
  • assorted old-switches that i picked up at work for cheap


I have a single-Node home lab running ceph.
  • Supermicro X10DRi-T4+
    • 2x Intel Xeon E5-2620 v3 2x6x2.6Ghz + HT
    • 24x 8GB DDR4-2133
    • 4x10Gbase-T onboard in a single Bond using openvswitch
  • OSDs split by HDD/SSD/NVME - no journals - Cache tier + EC pools all the way
    • 2x M.2 NVMe using PCIe x4 adapter
    • 10x 32-256 GB SSD Sata 3 (what ever i have/had laying around - onboard)
      • 1 for OS (as of yesterday) previously 2 for OS using ZFS-Raid-1
    • 24x 0.5-8 TB Spinners (what ever i have/had laying around - 2x HBA + Backplane(s))
  • Frankenstein'd case
I also have 3-Node Proxmox-cluster (no ceph)
  • configured to be able to run with 1, 2 or 3 nodes online
  • dual 1G nics
A gaming desktop currently trying to run Proxmox and Ceph on SSDs
  • 1x 64 GB SSD for OS
    • Used to be 2x in ZFS-Raid-1
  • 7 x SSD OSD
    • (used to be 6)
  • ATI HD 290X 8GB passthrough to Windows 10 VM (working on it)
  • Nvidia GTX 750 passthrough to OSX VM (planned)
    • testbed for a Frankenstein'd MacBook Pro powered by Proxmox (running an OSX VM and passing everything essential through)
 
Wow!! That's an impressive lab you've got there. You're telling me that my switch is overkill and you've got a single node with 192GB of RAM :p Tell me that you are not using all that hardware just to have a few VMs for yourself.

The Supermicro board looks incredible. Where did you mount this board, did you use a home-made box? I also considered buying some boards on their own before I got the two MSI Cubi, but couldn't find a box to put them in.

I guess your 3-node cluster might be more similar to my setup. What hardware are you using there? And what storage?
 
For home usage,
I use whatever hardware I can gobble up cheaply. It is old gaming machines / big towers I got my hands on and plugged additional disks / NICs into.

It does not have to be pretty.

The single-node Ceph "HomeLab" sits in one of the prototype Storagepods we initially built for work usage, with the exception that a lot of holes have been cut into the case, a lot of Plexiglas airflow guides have since been glued in (JB-Weld ftw), a lot more fans have been added internally, and the disks are mostly held together with zip ties. That's why it looks less like a prototype copy of a Backblaze Storagepod and more like a Frankenstein case befitting someone still doing BTC mining on GPUs :p

Never said it had to look pretty :p
 
I was checking out the Backblaze storage pods and they look pretty cool. They remind me of the ones I used to build for the IBM zSeries mainframes. They were like huge drawers full of disks and network cards :D

Going back to the switch for Proxmox, one more question. I still have to design the network configuration and the VLANs, but I'm wondering: will I be able to route between different VLANs with the switch, or will I need an additional router for that?
 
As already mentioned, I use openvswitch and the ZyXEL switches, so if the following does not answer your question I can only say in my defence that "I am not a network guy".

I segregate everything.

I do it this way (examples):
  • Every "service" gets its own VLAN tag and subnet (/16 or /24)
    • Service - vlan-Tag X - 10.vLan.Server/client.NodeNumber
    • Corosync - Tag 2 - 10.2.Server/client.NodeNumber
    • Ceph public network - Tag 31 - 10.31.Server/client.NodeNumber
    • Ceph cluster network - Tag 32 - 10.32.Server/client.NodeNumber
    • NFS service - Tag 41 - 10.41.Server/client.NodeNumber
    • ...
Then in Proxmox I add an OVS_IntPort according to this scheme:

  • Proxmox-Node1
    • Management vLan=1 10.1.1.1/16
    • Corosync vLan=2 10.2.1.1/24
    • Ceph public Network vLan=31 10.31.0.1/16
    • Ceph Cluster Network vLan=32 10.32.0.1/16
    • NFS-Access vLan=41 10.41.1.1/16
    • Net Access vlan=200 10.200.1.1/16 Gateway 10.200.0.1
  • Proxmox-Node2
    • Management vLan=1 10.1.1.2/24
    • Corosync vLan=2 10.2.1.2/24
    • Ceph public Network vLan=31 10.31.1.2/16
    • Ceph Cluster Network vLan=32 10.32.1.2/16
    • NFS-Access vLan=41 10.41.1.2/16
    • Net Access vlan=200 10.200.1.2/16 Gateway 10.200.0.1
  • Proxmox-Node3
    • Management vLan=1 10.1.1.3/24
    • Corosync vLan=2 10.2.1.3/24
    • Ceph public Network vLan=31 10.31.1.3/16
    • Ceph Cluster Network vLan=32 10.32.1.3/16
    • NFS-Access vLan=41 10.41.1.3/16
    • Net Access vlan=200 10.200.1.3/16 Gateway 10.200.0.1
Let's say I am addressing an NFS server, e.g. openMediaVault or TrueNAS, running on top of Ceph on Proxmox-Node2:
  • NAS1
    • Management vLan=1 10.1.41.1/16
    • NFS-Provider vLan=41 10.41.0.1/16
    • Net Access vlan=200 10.200.41.1/16 Gateway 10.200.0.1
  • NAS2
    • Management vLan=1 10.1.41.2/16
    • NFS-Provider vLan=41 10.41.0.2/16
    • Net Access vlan=200 10.200.41.2/16 Gateway 10.200.0.1
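
The 10.vLan.Server/client.NodeNumber scheme is mechanical enough that the addresses can be generated instead of typed by hand. A minimal sketch, where `service_address` and `role_octet` are names invented for illustration (the third octet stands for the "Server/client" part of the scheme):

```python
import ipaddress


def service_address(vlan_tag, role_octet, node_number, prefix=16):
    """Build 10.<vlan_tag>.<role_octet>.<node_number>/<prefix>.

    Each part must fit into one octet, so this scheme only works
    for VLAN tags up to 255.
    """
    for name, value in (
        ("vlan_tag", vlan_tag),
        ("role_octet", role_octet),
        ("node_number", node_number),
    ):
        if not 0 <= value <= 255:
            raise ValueError(f"{name}={value} does not fit into one octet")
    return ipaddress.ip_interface(
        f"10.{vlan_tag}.{role_octet}.{node_number}/{prefix}"
    )


# NFS access (tag 41) and Corosync (tag 2, /24) for three nodes
for node in (1, 2, 3):
    print(node, service_address(41, 1, node), service_address(2, 1, node, prefix=24))
```

Because the VLAN tag is the second octet, you can tell which service an address belongs to just by looking at it, which is most of the appeal of the scheme.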

I use the Proxmox firewall to make sure that only SSH coming in on 10.1.0.X/16 is accepted; that's where I have my admin laptop (10.1.0.101/16).


Some notes:
- I do QOS via the switches or via a SDN-Controller (depending on what i am testing) handling openvswitch.
- I provide (Inter)Net-access via OpenVPN from Pfsense sitting in 2 VM's using Carp.
- Most likely overkill and related to OCD.
 
Nice example Q-wulf :) Another thing to read about. I already checked a couple of websites about openvswitch today and it looks interesting.

Also, I already got the d-link switch so I finally have a cluster in a 1Gbps network :) Now I have to finish setting up everything, migrating the vms to the cluster, testing the usb network and thinking how I will configure the vlans/network. I will use your example as reference.
 
I already tested the USB network cards with iperf3. The result gave a maximum speed of 940Mbps. This is the same max speed I get with the onboard network cards, so it's a very good result and I will be able to increase my cluster's bandwidth with them :)
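
For reference, ~940Mbps is about as much as TCP can carry over gigabit Ethernet with the standard 1500 byte MTU, because every frame also carries headers. A back-of-the-envelope sketch, assuming IPv4, untagged frames and the TCP timestamp option enabled (the constant and function names are made up):

```python
# Per-frame bytes on the wire that are not part of the IP packet:
# preamble + SFD (8), Ethernet header (14), FCS (4), inter-frame gap (12)
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12

# Bytes inside the IP packet that are not payload:
# IPv4 header (20), TCP header (20), TCP timestamp option (12, assumed on)
IP_TCP_HEADERS = 20 + 20 + 12


def max_tcp_goodput_mbps(link_mbps=1000, mtu=1500):
    """Upper bound on TCP payload throughput for a given link speed and MTU."""
    payload = mtu - IP_TCP_HEADERS
    bytes_on_wire = mtu + ETHERNET_OVERHEAD
    return link_mbps * payload / bytes_on_wire


print(f"MTU 1500: {max_tcp_goodput_mbps(mtu=1500):.1f} Mbps")
print(f"MTU 9000: {max_tcp_goodput_mbps(mtu=9000):.1f} Mbps")
```

This gives roughly 941 Mbps for MTU 1500 and about 990 Mbps with 9K jumbo frames, so the USB cards are effectively running at line rate.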

What I saw is that they are more CPU-intensive in terms of context switching and interrupts, although the load average didn't really increase and the CPU was mostly idle.

Also happy new year to everybody.