extra VLAN bridges on top of a bonded network in a PVE cluster

Any comments on this network question about extra VLAN bridges on top of a bonded network in a PVE cluster?
(Sorry for reposting here, but it seems more people are reading this forum :)

All I can say about that post is no, don't do that. Use Open vSwitch instead; don't use classic Linux bridges and bonds, as they simply don't have the feature set you'd want for a virtualized environment, which causes more management overhead. See the wiki I wrote on using Open vSwitch here: http://pve.proxmox.com/wiki/Open_vSwitch The second example is the one I use in production, and it uses LACP bonding as well.

I too use Juniper EX switches (EX4300, EX3300, EX2200 depending on environment ... Juniper SRX routers too) in chassis cluster/virtual chassis mode. You don't need a bridge per VLAN when you use Open vSwitch. Instead you have one bridge/switch configured in Open vSwitch that carries the tagged traffic, and you just assign each VM's vNIC to the VLAN you want it on.
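For illustration, a tagged vNIC is just an OVS port with a tag. Assuming a VM whose interface shows up as tap100i0 (Proxmox names these tap<vmid>i<n>; the VM ID here is made up), putting it on VLAN 20 by hand would look like:
Code:
ovs-vsctl set port tap100i0 tag=20
Normally you'd just set the VLAN tag on the vNIC in the Proxmox GUI and let it do this for you.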

Pay attention to the multicast wiki too; I've updated it with some Juniper-specific info.

-Brad
 
Thanks!

It just seems I still can't get traffic between my PVE cluster nodes. Should the interfaces all be listed as internal type, i.e. as a hypervisor-internal-only network? That is not what I want.
I want multiple VLANs of traffic across my physical bonded, closed network over the two EX2200 switches.

Do I need to configure the EX2200s differently to allow VLAN-tagged packets to pass?

I would think that the default EX2200 port access mode would pass all packets, tagged and untagged, i.e. just work as a media transfer/interconnect between my PVE cluster nodes/hypervisor nodes.

Basically the PVE hypervisor nodes only need to be able to access vlan3, meant for Ceph traffic; all other VLANs (20, 30, 40) are meant for VM inter-traffic only.

So what is wrong with the following config, please:

# ovs-vsctl show
8f44dec6-3b92-40c5-abd3-cb901537b9b9
    Bridge "vmbr1"
        Port "vlan3"
            tag: 3
            Interface "vlan3"
                type: internal
        Port "vlan20"
            tag: 20
            Interface "vlan20"
                type: internal
        Port "vlan40"
            tag: 40
            Interface "vlan40"
                type: internal
        Port "vlan30"
            tag: 30
            Interface "vlan30"
                type: internal
        Port "vmbr1"
            Interface "vmbr1"
                type: internal
        Port "bond1"
            Interface "eth2"
            Interface "eth1"
    ovs_version: "2.3.1"



# ifconfig
bond1     Link encap:Ethernet  HWaddr f2:d0:71:3f:56:5d
          inet6 addr: fe80::f0d0:71ff:fe3f:565d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:1379 errors:0 dropped:0 overruns:0 frame:0
          TX packets:105 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:102936 (100.5 KiB)  TX bytes:12570 (12.2 KiB)

eth1      Link encap:Ethernet  HWaddr 00:1c:c4:dd:79:70
          inet6 addr: fe80::21c:c4ff:fedd:7970/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:690 errors:0 dropped:0 overruns:0 frame:0
          TX packets:49 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:51500 (50.2 KiB)  TX bytes:5996 (5.8 KiB)

eth2      Link encap:Ethernet  HWaddr 00:1c:c4:dd:79:6e
          inet6 addr: fe80::21c:c4ff:fedd:796e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:689 errors:0 dropped:0 overruns:0 frame:0
          TX packets:49 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:51436 (50.2 KiB)  TX bytes:5996 (5.8 KiB)

vlan3     Link encap:Ethernet  HWaddr d2:94:6e:67:21:a9
          inet addr:10.0.3.6  Bcast:10.0.3.255  Mask:255.255.255.0
          inet6 addr: fe80::d094:6eff:fe67:21a9/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1524 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:64292 (62.7 KiB)

vlan20    Link encap:Ethernet  HWaddr 3e:cd:0d:5c:18:b1
          inet addr:10.20.0.6  Bcast:10.20.255.255  Mask:255.255.0.0
          inet6 addr: fe80::3ccd:dff:fe5c:18b1/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:578 (578.0 B)

vlan30    Link encap:Ethernet  HWaddr 66:8a:66:08:32:ae
          inet addr:10.30.0.6  Bcast:10.30.255.255  Mask:255.255.0.0
          inet6 addr: fe80::648a:66ff:fe08:32ae/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:578 (578.0 B)

vlan40    Link encap:Ethernet  HWaddr 52:00:42:a0:2f:cd
          inet addr:10.40.0.6  Bcast:10.40.255.255  Mask:255.255.0.0
          inet6 addr: fe80::5000:42ff:fea0:2fcd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:578 (578.0 B)

Built from these /etc/network/interfaces definitions:

# Open vSwitch configuration below:

# Bond eth1 and eth2 together
allow-vmbr1 bond1
iface bond1 inet manual
    ovs_bridge vmbr1
    ovs_type OVSBond
    ovs_bonds eth1 eth2
    # Force the MTU of the physical interfaces to be jumbo-frame capable.
    pre-up ( ifconfig eth1 mtu 9000 && ifconfig eth2 mtu 9000 )
    ovs_options bond_mode=balance-tcp lacp=active other_config:lacp-time=fast
    mtu 9000

# Bridge for our bond and vlan virtual interfaces (our VMs will
# also attach to this bridge)
auto vmbr1
allow-ovs vmbr1
iface vmbr1 inet manual
    ovs_type OVSBridge
    ovs_ports bond1 vlan1 vlan3 vlan20 vlan30 vlan40
    mtu 9000

# Ceph cluster communication vlan (jumbo frames)
allow-vmbr1 vlan3
iface vlan3 inet static
    ovs_type OVSIntPort
    ovs_bridge vmbr1
    ovs_options tag=3
    ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
    address 10.0.3.6
    netmask 255.255.255.0
    mtu 9000

# Application Internal vlans
allow-vmbr1 vlan20
iface vlan20 inet static
    ovs_type OVSIntPort
    ovs_bridge vmbr1
    ovs_options tag=20
    ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
    address 10.20.0.6
    netmask 255.255.0.0
    mtu 1500

allow-vmbr1 vlan30
iface vlan30 inet static
    ovs_type OVSIntPort
    ovs_bridge vmbr1
    ovs_options tag=30
    ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
    address 10.30.0.6
    netmask 255.255.0.0
    mtu 1500

allow-vmbr1 vlan40
iface vlan40 inet static
    ovs_type OVSIntPort
    ovs_bridge vmbr1
    ovs_options tag=40
    ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
    address 10.40.0.6
    netmask 255.255.0.0
    mtu 1500
 
"I would think that default ex2200 port access mode would parse all packets tagged and untagged, ie. just work as a media transfer/inter-connect between my PVE cluster nodes/hypervisor nodes."
I would not expect an access port to pass tagged packets, which leaves you with two options (a rough CLI sketch for option 2 follows below):
1) Keep the switch ports as access ports but assign the desired VLAN as the port's VLAN (PVID)
2) Make the switch ports trunk ports so that they are VLAN aware
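A rough non-ELS Junos sketch for option 2, assuming the VLANs are already defined under the vlans stanza and that ge-0/0/1 is one of the ports facing a PVE node (both the port and the VLAN names are illustrative, not from this thread):
Code:
set interfaces ge-0/0/1 unit 0 family ethernet-switching port-mode trunk
set interfaces ge-0/0/1 unit 0 family ethernet-switching vlan members [ vlan3 vlan20 vlan30 vlan40 ]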
 
Thanks!
I would think that the default EX2200 port access mode would pass all packets, tagged and untagged, i.e. just work as a media transfer/interconnect between my PVE cluster nodes/hypervisor nodes.

That is not my understanding of how access ports work. Access ports carry strictly _untagged_ traffic. Trunk ports can still have a 'native' VLAN that allows untagged traffic while other traffic is tagged, but access explicitly means there is no tagged traffic.
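If you want a native VLAN on a trunk, a rough non-ELS Junos sketch (the EX2200 is non-ELS; ELS platforms like my EX4300 use a different syntax, and the port name here is illustrative) would be:
Code:
set interfaces ge-0/0/1 unit 0 family ethernet-switching port-mode trunk
set interfaces ge-0/0/1 unit 0 family ethernet-switching native-vlan-id 1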

Here are the relevant sections of my Juniper EX4300 config (yes, using 10Gbit on mine):
Code:
interfaces {
    xe-0/2/0 {
        description "trunk port to ProxMox1 on ae4";
        ether-options {
            802.3ad ae4;
        }
    }
    xe-1/2/0 {
        description "trunk port to ProxMox1 on ae4";
        ether-options {
            802.3ad ae4;
        }
    }
    ae4 {
        description "lacp group5 to ProxMox1";
        mtu 9216;
        aggregated-ether-options {
            lacp {
                active;
            }
        }
        unit 0 {
            family ethernet-switching {
                interface-mode trunk;
                vlan {
                    members [ 1 10 20 30 40 50 55 ];
                }
            }
        }
    }
}
vlans {
    default {
        vlan-id 1;
    }
    vlan.10 {
        vlan-id 10;
    }
    vlan.20 {
        vlan-id 20;
    }
    vlan.30 {
        vlan-id 30;
    }
    vlan.40 {
        vlan-id 40;
    }
    vlan.50 {
        vlan-id 50;
    }
    vlan.55 {
        vlan-id 55;
    }
}
 
Right, so you think it is my EX2200s in default access mode that do not allow any tagged packets through?

Would I also need to set the EX2200 inter-connect ports to trunk mode, allowing the same VLAN IDs, for my bonding to allow failover+balancing?

Currently ports 0, 23, and 24 are cross-cabled between the two switches, while ports 1-7 on either switch are connected to my 7x PVE nodes' eth1 & eth2.
 
So I probably don't. How do I assign PVIDs to my various VLANs? Something like:

Code:
ovs-vsctl set port ??? trunks="[3 20 30 40]"
ovs-vsctl set port ??? pvid=x
ovs-vsctl set port ??? untag_pvid=true

Can OVS untag more than one VLAN ID before sending packets out on a physical port (my bond1), so I needn't change the default access port modes in the EX2200 switch?
But I assume it is hard to re-tag inbound untagged packets to more than one VLAN ID/PVID... so will this work at all, or do I need to configure the EX2200 in trunk mode after all...

So in general maybe it does not make sense at all to try and separate packets into different VLANs, as they are to be transferred between PVE nodes on the same wire anyway.

I just want to be able to use multiple LANs assigned to different NICs on various VMs, and use one network for PVE node Ceph traffic.
All VMs are for the same tenant, so I am not so concerned about separation for security reasons.
All the sharing is just to save physical networks and share one bonded link for all LANs :)
 
You're probably right: an access port only allows untagged packets through, thus I need to change the EX2200 ports to trunk mode and set the VLAN IDs to allow through.
Presumably also my inter-switch connection ports (0, 23, 24) with crossover cables.

Means I need to figure out how to access my default-fabric-configured switches ... (might be initially hard for a server-admin guy :)
 
Or, as I wrote before, keep the access port but make the port a member of the desired VLAN. An access port works by stripping off the VLAN tag and sending the packet as an untagged packet on the configured VLAN.
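For example, a rough non-ELS Junos sketch of that (port and VLAN name illustrative, not from this thread's switch config):
Code:
set interfaces ge-0/0/1 unit 0 family ethernet-switching port-mode access
set interfaces ge-0/0/1 unit 0 family ethernet-switching vlan members vlan20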
 
Okay, could you indicate a little more precisely how to do this and for which ports? Related CLI examples and/or URLs to examples would be nice. TIA
 
Oh, I thought it was a question of leaving the Junipers with their default access-mode ports and then fiddling with the Open vSwitch ports on my PVE hosts.

I assume this could be a starting point, but thanks anyway @mir
 
I think you will need to fiddle with the Junipers' port config, since all access ports default to being members of VLAN 1. A managed switch in its default config behaves as an unmanaged switch.
 
So I probably don't. How do I assign PVIDs to my various VLANs?
...
Can OVS untag more than one VLAN ID before sending packets out on a physical port (my bond1), so I needn't change the default access port modes in the EX2200 switch?

That question doesn't make sense at all. You won't ever untag multiple VLANs, as the other end wouldn't be able to put the VLANs back together. If you don't need the Proxmox boxes to be able to share VLANs, you could just tag the access port's traffic as it comes in by adding:
Code:
ovs_options vlan_mode=access tag=1
to your bond configuration, and at least traffic from your access-mode bond will make it into the bridge with a tag of 1. But your other VLANs simply won't work cross-host.

But the right solution is to modify your Juniper switch configuration to support tagging.

Regarding the ports you are using for the inter-switch connection, though: the way Juniper works when you are in chassis cluster/virtual chassis mode is that those ports are dedicated stacking ports, so you don't have explicit VLAN assignments on them at all. It causes the two switches to act as one big switch. You can't do cross-switch LACP bonds without them being in virtual chassis mode.

Now if for some reason you have no ability to change your switch config and you really need to share VLANs across the Proxmox hosts in your cluster, you can do GRE tunnels with Open vSwitch, which essentially allows you to create your own overlay network. Neat stuff, but it really adds a bit of complexity, and it is not anything I've ever messed with. It should really only be used if you have no control over the network layer.
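For illustration only, a minimal sketch of such a GRE port, assuming 10.0.3.7 is a peer node's reachable address (not taken from this thread's config); you'd add one such port per remote peer:
Code:
ovs-vsctl add-port vmbr1 gre0 -- set interface gre0 type=gre options:remote_ip=10.0.3.7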
 
Right, so I need Juniper switch ports 1-7 (to my PVE hosts) in trunk mode with membership of all the wanted VLANs,
the same on both switches.

How do I ensure the cross-switch ports are in virtual chassis mode... like described here
 
Right.

As for the virtual chassis stuff, it sounds like you didn't set up the switch in the first place, and you stated in your original request that you were "bonded over two NICs eth1+eth2 connected to two separate interconnected switches", and your config example showed you were using 802.3ad. The only way that would be possible is if your switches were already set up properly in Virtual Chassis mode.

The docs on configuring Virtual Chassis from Juniper are pretty good; there's not much assistance I could provide beyond what is there.
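For what it's worth, once you can reach the CLI, the standard Junos show commands will tell you whether the members and their VC ports are up:
Code:
show virtual-chassis
show virtual-chassis vc-port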
 
Okay, found that my old Junipers on JunOS v11.4R5.5 don't support virtual chassis :(. Anyway, it's just for a test lab, so for now I have done a config for a single switch with 7x dual-port LAGs like this:

Code:
chassis {
    aggregated-devices {
        ethernet {
            device-count 7;
        }
    }
}
interfaces {
    ge-0/0/0 {
        description "trunk port to ProxMox on ae0";
        ether-options {
            802.3ad ae0;
        }
    }
    ge-0/0/1 {
        description "trunk port to ProxMox on ae0";
        ether-options {
            802.3ad ae0;
        }
    }
    ge-0/0/2 {
        description "trunk port to ProxMox on ae1";
        ether-options {
            802.3ad ae1;
        }
    }
    ge-0/0/3 {
        description "trunk port to ProxMox on ae1";
        ether-options {
            802.3ad ae1;
        }
    }
    ge-0/0/4 {
        description "trunk port to ProxMox on ae2";
        ether-options {
            802.3ad ae2;
        }
    }
    ge-0/0/5 {
        description "trunk port to ProxMox on ae2";
        ether-options {
            802.3ad ae2;
        }
    }
    ge-0/0/6 {
        description "trunk port to ProxMox on ae3";
        ether-options {
            802.3ad ae3;
        }
    }
    ge-0/0/7 {
        description "trunk port to ProxMox on ae3";
        ether-options {
            802.3ad ae3;
        }
    }
    ge-0/0/8 {
        description "trunk port to ProxMox on ae4";
        ether-options {
            802.3ad ae4;
        }
    }
    ge-0/0/9 {
        description "trunk port to ProxMox on ae4";
        ether-options {
            802.3ad ae4;
        }
    }
    ge-0/0/10 {
        description "trunk port to ProxMox on ae5";
        ether-options {
            802.3ad ae5;
        }
    }
    ge-0/0/11 {
        description "trunk port to ProxMox on ae5";
        ether-options {
            802.3ad ae5;
        }
    }
    ge-0/0/12 {
        description "trunk port to ProxMox on ae6";
        ether-options {
            802.3ad ae6;
        }
    }
    ge-0/0/13 {
        description "trunk port to ProxMox on ae6";
        ether-options {
            802.3ad ae6;
        }
    }
    ae0 {
        description "Proxmox LACP Grp 0";
        mtu 9216;
        aggregated-ether-options {
            lacp {
                active;
            }
        }
        unit 0 {
            family ethernet-switching {
                port-mode trunk;
                vlan {
                    members 2-100;
                }
            }
        }
    }
    ae1 {
        description "Proxmox LACP Grp 1";
        mtu 9216;
        aggregated-ether-options {
            lacp {
                active;
            }
        }
        unit 0 {
            family ethernet-switching {
                port-mode trunk;
                vlan {
                    members 2-100;
                }
            }
        }
    }
    ae2 {
        description "Proxmox LACP Grp 2";
        mtu 9216;
        aggregated-ether-options {
            lacp {
                active;
            }
        }
        unit 0 {
            family ethernet-switching {
                port-mode trunk;
                vlan {
                    members 2-100;
                }
            }
        }
    }
    ae3 {
        description "Proxmox LACP Grp 3";
        mtu 9216;
        aggregated-ether-options {
            lacp {
                active;
            }
        }
        unit 0 {
            family ethernet-switching {
                port-mode trunk;
                vlan {
                    members 2-100;
                }
            }
        }
    }
    ae4 {
        description "Proxmox LACP Grp 4";
        mtu 9216;
        aggregated-ether-options {
            lacp {
                active;
            }
        }
        unit 0 {
            family ethernet-switching {
                port-mode trunk;
                vlan {
                    members 2-100;
                }
            }
        }
    }
    ae5 {
        description "Proxmox LACP Grp 5";
        mtu 9216;
        aggregated-ether-options {
            lacp {
                active;
            }
        }
        unit 0 {
            family ethernet-switching {
                port-mode trunk;
                vlan {
                    members 2-100;
                }
            }
        }
    }
    ae6 {
        description "Proxmox LACP Grp 6";
        mtu 9216;
        aggregated-ether-options {
            lacp {
                active;
            }
        }
        unit 0 {
            family ethernet-switching {
                port-mode trunk;
                vlan {
                    members 2-100;
                }
            }
        }
    }
}
vlans {
    proxmox.vlans {
        description "Vlans for Proxmox 2-100";
        vlan-range 2-100;
    }
}

Will test with this tomorrow...

I would like to aggregate bandwidth by load balancing rather than just the failover I understood LACP mode to provide; is this possible by any chance? No, it also provides bandwidth aggregation ... good!
 
LACP does load balancing based on a source/destination hash. So while you won't get double the bandwidth to a single destination, it does utilize both NICs in an even fashion, so it does double your overall bandwidth.
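If you want to see how OVS actually distributes flows across the bond members, the standard OVS command for that (bond1 being the bond name from the config above) is:
Code:
ovs-appctl bond/show bond1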
 
Yes, I found out about LACP & load balancing too.

BTW, I managed to upgrade my EX2200s today to JunOS 12.3R8.7, which now supports virtual chassis, so I also made a two-switch virtual chassis and expanded my 7x LAG groups over both switches successfully. Had no fiber, so I created 4x uplink GE ports as VCP ports; works fine.
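For reference, converting an uplink port to a VCP is done from Junos operational mode on each member with something like this (the pic-slot/port numbers are illustrative):
Code:
request virtual-chassis vc-port set pic-slot 1 port 0
show virtual-chassis vc-port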

Thanks to @brad_mss for sharing!

PS! Still haven't figured out how to mark a thread as solved...