Open vSwitch

Feb 29, 2016
Hello


I am becoming increasingly confused with Proxmox and am hoping that someone here will be able to help me. I have emailed Proxmox but they don't seem to answer, or if they do then they don't understand what I am asking and just refer me here.

I have 3 servers with 14 x 1GbE NICs in them. I have to make these into a cluster that also uses Ceph for replicated storage. I am trying to use Proxmox 4.2.

I have VLANs set up on my Cisco SG300-52 switch as follows:

VLAN2 - Proxmox Management
VLAN3 - Proxmox Cluster Communication
VLAN4 - Ceph

Rather than use Linux Bonds and Bridges I want to use Open vSwitch because this is supposed to be easier for VLANs.

I have added my subscription key to the server and went to install Open vSwitch using the details at this link:

https://pve.proxmox.com/wiki/Open_vSwitch#Example_2:_Bond_.2B_Bridge_.2B_Internal_Ports

I note that other people here in the forums say "You must install Open vSwitch from the Proxmox repository", but when I ran the commands it looked like it installed from the Debian repositories. How can I tell if I have the correct version, please?
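Is checking the package origin and running version the right way to tell? As a sketch of what I have in mind (just standard apt and OVS commands):

    apt-cache policy openvswitch-switch    # shows the installed version and which repository it was pulled from
    ovs-vsctl --version                    # shows the version of the Open vSwitch tools actually in use

I just don't know what the "correct" Proxmox version is supposed to look like in that output.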

I have configured 2 x LAGs on my Cisco SG300-52 using LACP as well.

In Proxmox I have started to configure the first bridge, OVS-Bridge1. It uses an OVS bond, OVS-Bond1, that contains all the physical NICs from one of the LAGs.

For example:

OVS-Bond uses eth0 eth1 eth2 eth3 eth4 eth5 eth6 eth7

The first LAG, LAG1, is a trunk and has VLAN2 tagged on it; it will also carry any extra VLANs needed for my VMs in the future. VLAN2 will be used for the web management of the Proxmox nodes.

The second LAG, LAG2, is also a trunk and has VLAN3 and VLAN4 tagged to it. This will be used for the separate cluster network for Proxmox, and for the Ceph network.

I will plug the first set of ports from LAG1 into the ports for OVS-Bond1.

But that is where I get completely stuck. The Open vSwitch link above doesn't show anything to do with the GUI at all, only the config files. But other people on the forum say you shouldn't use the config files or Open vSwitch terminal commands and that you must use the GUI or it won't work. Can anyone confirm which is the correct way, please?

From there I want the Cluster network to be able to use 25% of OVS-Bond1 and the Ceph network to use the other 75% of the bandwidth.

Do I just create one connection to OVS-Bridge1 for each network, or is each one similar to a 1GbE port? In which case, do I have to create multiples of them and then bond them?

How do I specify the percentage of the OVS bridge that I want each VLAN to be able to use? In theory, at 25% the Cluster network would have 2 x 1GbE links' worth of bandwidth available while all 8 physical NICs are working. But if 4 physical NICs failed and I only had 4 left working, then at 25% the Cluster network would have 1 x 1GbE link's worth of bandwidth available.

I hope all of this makes sense. I have set this sort of thing up many times before on VMware vSphere and on Microsoft Hyper-V and it is very straightforward. I am also very competent with Cisco Switches, it is just the Proxmox side of things that is confusing me.

Any help would be greatly appreciated
 
I have never known of an option to specify percentages of network bandwidth... As for using the GUI vs the terminal config file for OVS, I would recommend using the GUI. Just play around with it, assuming this is not production :D Also, if you want one VLAN to have 25% of the bandwidth offered via 4 links, why not give that single VLAN a single GbE link instead?
 
Hello

Thanks for the reply.

I have been playing around with the OVS GUI to try and make it work and am just running into brick walls all of the time. The trouble is the complete lack of any documentation around configuring this via the GUI. So, to be brutally honest, I haven't got a clue where to start with creating the ports for the VMs to connect to, or how they work.

1) If I create a port, is that the equivalent of a 1GbE ethernet port, or can I specify the speed?
2) If I can't do bandwidth management, then what happens if I lose physical NICs from the incoming bond? Will things just stop working properly? For example, if Ceph has 6 of the 8 NICs and the cluster management has 2 of the 8 NICs, let's say I lose 4 NICs because a switch in the logical stack goes down. Will Ceph then swamp the 4 remaining NICs and stop the Proxmox cluster traffic getting through? If so, the cluster will fail as well.

These servers are supposed to be going into a production environment when configured, to provide us with HA and failover. It all reads beautifully in the Proxmox documents, but actually making it work is proving near impossible. And the complete lack of any help at all so far from Proxmox themselves leaves me very worried as to what would happen if I did put it in production and had a real problem.

Right now if I can't make any serious headway then sadly my suggestion to my company is that we need to throw Proxmox in the bin and get something else.
 
LACP bonding probably doesn't work the way you think it does anyhow. Typically switches will use a destination hash to determine what physical port to send traffic out on. So if you have 8 physical links and 3 servers, you'll use at most 2 of those physical links for communicating between those 3 servers, but if you're unlucky (that's the problem with hashing!), you'll only use 1 ... all the rest will just act as fail-over links. This mostly applies to your host<->host communication, so Ceph and Proxmox cluster communication. For VMs, since you'd have a bunch of different MAC addresses, they'd more properly balance across the links (most likely anyhow).
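If you want to see how that hashing actually plays out, Open vSwitch will show you which flows landed on which member link; something along these lines (assuming your bond port ends up named bond1):

    ovs-appctl bond/show bond1                                    # bond mode, LACP state, and which hashes map to which member NIC
    ovs-vsctl set port bond1 bond_mode=balance-tcp lacp=active    # hash on L3/L4 so separate flows between the same hosts can spread out

balance-tcp gives you the best chance of spreading host<->host traffic, but the switch still hashes the return traffic its own way.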

I'd really not recommend adding 8 channels to a single bond; it's highly unlikely it will do what you expect it to do. Typically there is no benefit after 4 channels. If you really need the extra bandwidth, you really need to be considering other technologies such as 10GbE or InfiniBand. I've never heard of someone trying to do bandwidth shaping per VLAN using one link from the Proxmox side. If you really want to do that, I'd recommend trying to do it in your networking gear ... otherwise you could look at using something like 'tc' in Proxmox to do it, but I'm not sure if it has the kind of capability you seek.
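The closest simple thing on the OVS side that I know of is a hard rate cap rather than a percentage share. For example, if your cluster VLAN has an internal port called vlan3, something like this (rate is in kbps, numbers purely illustrative) would cap what the host pushes out on it at roughly 2Gbps:

    ovs-vsctl set interface vlan3 ingress_policing_rate=2000000 ingress_policing_burst=200000

But that's a static ceiling, not a guaranteed 25%, and it doesn't hand the bandwidth back when the other VLAN is idle. Real minimum guarantees would mean linux-htb QoS queues plus flow rules to steer traffic into them, which gets complicated quickly.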

FWIW, I always configure my host's Open vSwitch config via /etc/network/interfaces. The only place I use the GUI is for setting up the VM guest's VLAN tag.
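For the kind of bond + management VLAN setup you describe, the /etc/network/interfaces stanzas end up looking roughly like Example 2 on the wiki page you linked; a trimmed-down sketch (NIC names, tags and addresses are placeholders, substitute your own):

    allow-vmbr0 bond1
    iface bond1 inet manual
        ovs_bridge vmbr0
        ovs_type OVSBond
        ovs_bonds eth0 eth1 eth2 eth3
        ovs_options bond_mode=balance-tcp lacp=active other_config:lacp-time=fast

    auto vmbr0
    iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports bond1 vlan2

    # internal port the host itself uses on the management VLAN
    allow-vmbr0 vlan2
    iface vlan2 inet static
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=2
        address 192.168.2.11
        netmask 255.255.255.0
        gateway 192.168.2.1

A reboot of the node is the simplest way to make sure everything comes up cleanly after editing it.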
 
Hello brad_mssw

Thank you for your reply.

You are correct about the bonding. There are various different ways of doing it, like active/passive etc. I am aware that even with 8 NICs in the bond I will never actually get one connection that runs faster than 1GbE, but hopefully if I configure it right I can get multiple connections running nicely at 1GbE speeds.

Again you are so right about using either 10GbE or InfiniBand, but I am stuck with what I have sadly, and the cost of the switches for those is still very high. I will be wanting some decent switches that can run stacked as one logical switch so that I can have resilience in case of a switch failure. At present I am hoping for 2 x Cisco 3850s. This at least gives the option of adding the 4 x 10GbE module to each of them at a later date.

The real issue here as well is I am trying to do a proper proof of concept on the Proxmox/Cluster/Ceph scenario. This requires the separate networks again for the bits I have mentioned: Management/Cluster/Ceph/VM VLANs. So I'd need 4 LACP LAGs/EtherChannels per server, 3 x 4 = 12 total. But the Cisco SG300-52 is limited to 8 max as it is a small-business switch, "Sigh!". I knew about the max 8 active members per LAG when setting them up, but didn't see anything about the SG300 only supporting a max of 8 actual LAGs until it was too late. The 3850s support 128 of them! So it has never been an issue in the past for me when using other, "proper", Cisco switches. Hence why I am now trying to "Heath Robinson" something together using Open vSwitch and 8 ports.

However, the servers I am using each have 6 x 1TB SSDs in them for use with Ceph, so they will easily fill the 4 x 1GbE of bandwidth that you mentioned in this case. But spinners certainly wouldn't, and what you mentioned would be very true for them.

Hyper-V does what I was alluding to, with the internal vSwitch giving bandwidth percentages to certain networks. It is also amazingly easy to do. I think it came in with Hyper-V 2012 if I remember rightly. I'm finding most of the Proxmox stuff like wading through treacle, as so many bits on the wiki just don't show decent walkthroughs or steps, and some even completely contradict each other. Then of course Ceph is a completely different animal altogether. A quite spectacular animal nevertheless.
 
Even multiple connections between 2 separate hosts will only use 1 physical link in most instances, so even if you have 100 connections, you'll be limited to 1Gbps. Linux can do balance-rr for outbound, which would alleviate this concern, but the switch will do source or destination hashing, thus limiting incoming packets to the other server to 1 link, so it buys you nothing. I've never found a commercial switch vendor that supports something like Linux's balance-rr.

We used to be a Cisco shop, but we moved everything to Juniper and found it to be much more cost effective and, in the end, to provide much more robust management.

As a final note, the last example on the Open vSwitch wiki shows direct node interconnection using a ring topology; that way you don't need expensive switches, you just do direct-attach from node to node for the high-speed links necessary for Ceph when using SSDs. That didn't work out so well on Open vSwitch 2.3 (e.g. there were odd bugs in deciding which ports to put in forwarding mode), but 2.5 was released with proper RSTP support, so I'm hopeful they're resolved there ... we were planning on testing that ourselves in the next couple of weeks.
 
Hello again brad_mssw

I have done the direct connect that you are talking about in my home lab: 3 x dual-port 10GbE cards in a ring. It works great, but then obviously you can't increase the cluster size with more nodes at a later date if you want to. Unless you know a way to do that you'd like to share with me? :D

I am constrained on the switches sadly as they are under a managed contract in a datacenter, even though they will be our property. No good me getting switches that they cannot support. I had originally liked the look of the Brocade ICX-6610 because it has 8 x 10Gbe connections per switch. Not sure what they run like though. I'd be happy to go with the Juniper but the support company aren't.

So I guess when I said wading through treacle earlier I suppose I should have said "wading through treacle, blindfolded, with my shoe laces tied together" ;)
 
Well, ring and star topology are the same thing when you have 3 nodes :)

At 4 nodes, the topologies diverge: with ring topology each node is always connected to 2 other nodes, while with star topology it would be connected to every other node. Star topology really doesn't scale easily, and ring topology is usually OK from a risk standpoint. Ring topology can be bad if both the nodes to the left *and* right of you fail at the same time (unlikely), as that would break the loop. However, as long as each node is *also* connected to the core switches at lower link speeds, as the diagram shows (OK, yeah, it's not true ring topology, it's just a ring for the high-speed links), it wouldn't be a complete outage, as RSTP would just send the traffic over those lower-speed links instead.
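On the Open vSwitch side, the RSTP piece is just a couple of bridge settings (2.5 or newer); roughly, with a hypothetical bridge name of vmbr1 for the ring links:

    ovs-vsctl set bridge vmbr1 rstp_enable=true                    # turn RSTP on for the ring bridge
    ovs-vsctl set bridge vmbr1 other_config:rstp-priority=32768    # lower value = more likely to become the root bridge
    ovs-appctl rstp/show vmbr1                                     # see which ports ended up forwarding vs discarding

RSTP then blocks one segment of the ring until a link actually fails.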
 
