SDN: To be VLAN aware or not? Or another problem?

lifeboy · Jun 6, 2022

I'm experimenting with the SDN service and have not been able to figure this one out:

If I create a Vnet (in my case 11) and tick "VLAN aware", then I cannot define a subnet.
If, on the other hand, I untick "VLAN aware", I can define a subnet.

What is the point of this and how does one use it in practice? Below is the problem I'm running into; I cannot figure out what is going wrong.

I'm battling to get an SDN with a VLAN and bridge to become visible from a KVM guest, so I'm wondering how one gets this to work. I gave the guest (a Win 10 machine) the IP 192.168.142.100, and my SDN is VLAN 11. I have pfSense listening on the VLAN11 bridge on the IP 192.168.142.254 (a CARP address shared between 2 instances of pfSense). The two firewalls are able to ping each other (192.168.142.252 and 192.168.142.253), and I can ping either one and the CARP address from my Proxmox nodes, so that indicates to me the network is working as expected. However, the Win10 machine cannot "see" past its own IP address.

I have attached the /etc/network/interfaces and /etc/network/interfaces.d/sdn files (I had to add .txt, since files without extensions are not allowed).

lifeboy · Jun 6, 2022

lifeboy said:
I have pfSense listening on the VLAN11 bridge on the IP 192.168.142.254 (a CARP address shared between 2 instances of pfSense). The two firewalls are able to ping each other (192.168.142.252 and 192.168.142.253), and I can ping either one and the CARP address from my Proxmox nodes, so that indicates to me the network is working as expected. However, the Win10 machine cannot "see" past its own IP address.

CORRECTION: On checking this again, here is what I actually get:

I can ping the first pfSense VLAN 11 address from any node. (192.168.142.252)
I cannot ping the backup pfSense VLAN 11 address from any node, nor from the first pfSense (192.168.142.253)
I can ping the CARP virtual IP from any node.

So it seems that there is an issue with the backup pfSense config (although I have not been able to find it yet).

However, regardless of the above, the Windows 10 KVM guest cannot ping anything but its own IP address.

spirit · Jun 7, 2022

The VLAN aware option on the vnet is for when you want to tag multiple VLANs inside your guest (some users requested this, for example to put a VLAN tag inside a VXLAN tag).
The normal usage is without VLAN aware (the VLAN tag is defined on the vnet).

The subnet is not used much yet (only for routed setups and BGP-EVPN, where the gateway of the subnet is deployed on the host).
In the future, the subnet will be used to automatically deploy IPs to VMs/CTs, but that is not implemented yet.

Now, if you want to define VLAN tags inside your pfSense, you can create a VLAN-aware vnet without any tag and define the tags inside pfSense.

lifeboy · Jun 7, 2022

Thanks for that clarification! So I should really just set up a non-VLAN aware Vnet and work with that for now.

So maybe I should rephrase my problem then: If I have set up a Vnet and Zone, it creates a bridge tagged with the VLAN. Then, when I add that bridge to a KVM guest machine, the networking must work for that VLAN. What is wrong with my configuration that causes this to not work as expected?

From the listing below, the VLAN exists and is "up".

Code:

# ip a
...
19: VLAN11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether de:73:5a:62:42:46 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::dc73:5aff:fe62:4246/64 scope link 
       valid_lft forever preferred_lft forever
...
49: ln_VLAN11@pr_VLAN11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master VLAN11 state UP group default qlen 1000
    link/ether de:73:5a:62:42:46 brd ff:ff:ff:ff:ff:ff
50: pr_VLAN11@ln_VLAN11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0v11 state UP group default qlen 1000
    link/ether 62:4b:0a:62:8a:a8 brd ff:ff:ff:ff:ff:ff
51: eth2.11@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0v11 state UP group default qlen 1000
    link/ether ac:1f:6b:ca:e3:e6 brd ff:ff:ff:ff:ff:ff
52: vmbr0v11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ac:1f:6b:ca:e3:e6 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::ae1f:6bff:feca:e3e6/64 scope link
       valid_lft forever preferred_lft forever
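
For anyone reading the listing above: the `master` field on each interface names the bridge that port is plugged into, which is how the SDN bridge VLAN11 ends up connected to eth2.11. Here is a minimal sketch that extracts those pairs from `ip a` output. The function name `bridge_ports` is made up for illustration and is not a Proxmox tool.

```shell
# Print "bridge <- port" for every interface that has a master.
# Reads `ip a` or `ip link` style output on stdin, e.g.:  ip a | bridge_ports
bridge_ports() {
    awk '
        /^[0-9]+: / {
            name = $2
            sub(/:$/, "", name)     # strip the trailing colon
            sub(/@.*/, "", name)    # strip "@peer" / "@parent" suffix
            for (i = 3; i < NF; i++)
                if ($i == "master") print $(i + 1) " <- " name
        }'
}
```

Fed the listing above, this should print `VLAN11 <- ln_VLAN11`, `vmbr0v11 <- pr_VLAN11` and `vmbr0v11 <- eth2.11`: the veth pair ln_VLAN11/pr_VLAN11 joins the two bridges, and eth2.11 carries the tagged traffic out of the host. `bridge link show` gives a similar view directly.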

This is from node4 (but the same on all the other nodes):

Code:

root@FT1-NodeD:~# ping 192.168.142.252
PING 192.168.142.252 (192.168.142.252) 56(84) bytes of data.
64 bytes from 192.168.142.252: icmp_seq=1 ttl=64 time=0.165 ms
64 bytes from 192.168.142.252: icmp_seq=2 ttl=64 time=0.182 ms
^C
--- 192.168.142.252 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1029ms
rtt min/avg/max/mdev = 0.165/0.173/0.182/0.008 ms
root@FT1-NodeD:~# ping 192.168.142.254
PING 192.168.142.254 (192.168.142.254) 56(84) bytes of data.
64 bytes from 192.168.142.254: icmp_seq=1 ttl=64 time=0.181 ms
64 bytes from 192.168.142.254: icmp_seq=2 ttl=64 time=0.119 ms
64 bytes from 192.168.142.254: icmp_seq=3 ttl=64 time=0.157 ms
^C
--- 192.168.142.254 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2032ms
rtt min/avg/max/mdev = 0.119/0.152/0.181/0.025 ms

The above two addresses are the ports on my pfSense router. I simply added the bridge to the VM config and added the ip in pfSense. The second one (.254) is a virtual ip.

However, the windows VM guest, where I did the same (added the bridge of the VLAN11), does not work.

Code:

root@FT1-NodeD:~# ping 192.168.142.100
PING 192.168.142.100 (192.168.142.100) 56(84) bytes of data.
^C
--- 192.168.142.100 ping statistics ---
7 packets transmitted, 0 received, 100% packet loss, time 6129ms

spirit · Jun 8, 2022

The config seems to be good.

ping 192.168.142.100

Is this the IP of your Windows VM? If yes, have you disabled the Windows firewall? (It blocks incoming ping by default.)

lifeboy · Jun 8, 2022

spirit said:
The config seems to be good.

Is this the IP of your Windows VM? If yes, have you disabled the Windows firewall? (It blocks incoming ping by default.)

If that was the case, then I would be able to ping at least the host node's address from the windows guest, but I can't.
I have now configured an Ubuntu LXC to test that.

The address is 192.168.142.91.

If I have the LXC running on the node where the pfSense guest is running, then I can ping the address on the VLAN (192.168.142.252) and the virtual CARP address (192.168.142.254). When I migrate the container to nodeD though, I can't ping anything from the container.

pfSense can be reached from nodeD:

Code:

FT1-NodeD:~# ping 192.168.142.252
PING 192.168.142.252 (192.168.142.252) 56(84) bytes of data.
64 bytes from 192.168.142.252: icmp_seq=1 ttl=64 time=0.150 ms
64 bytes from 192.168.142.252: icmp_seq=2 ttl=64 time=0.177 ms
^C
--- 192.168.142.252 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1029ms
rtt min/avg/max/mdev = 0.150/0.163/0.177/0.013 ms

The container cannot be reached.

Code:

FT1-NodeD:~# ping 192.168.142.91
PING 192.168.142.91 (192.168.142.91) 56(84) bytes of data.
^C
--- 192.168.142.91 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3051ms

From pfSense running on nodeA:

Cannot reach the container on nodeD:

Code:

--- 192.168.142.91 ping statistics ---
5 packets transmitted, 0 packets received, 100.0% packet loss

Can reach the Windows guest on nodeA:

Code:

[2.6.0-RELEASE][roland@fw-1A.fast.za.net]/home/roland: ping 192.168.142.100
PING 192.168.142.100 (192.168.142.100): 56 data bytes
64 bytes from 192.168.142.100: icmp_seq=0 ttl=128 time=0.713 ms
64 bytes from 192.168.142.100: icmp_seq=1 ttl=128 time=0.416 ms
64 bytes from 192.168.142.100: icmp_seq=2 ttl=128 time=0.422 ms
^C
--- 192.168.142.100 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.416/0.517/0.713/0.139 ms

From the container running on nodeD:

(I can only access the running container via the Proxmox console)

What is happening here? Some ideas I have:

Traffic between the guests and containers goes via pfSense. However, pfSense cannot reach any guest on nodes other than the node it is running on itself. This seems to point at the network somehow.
I can ping pfSense from NodeD and also the address that is part of VLAN 11 on pfSense.
From pfSense I can ping the VLAN 11 guest on the same node, but not on another node.

This to me looks like VLAN 11 is not functioning like it should. Traffic flows across the network, but not if it's VLAN 11 tagged.
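
One way to narrow this kind of symptom down is to check whether tagged frames actually leave one node and arrive on the other. The sketch below only prints a few standard commands for a given NIC and VLAN ID so they can be reviewed before running. `vlan_checks` is a hypothetical helper name, and the `<nic>.<vid>` sub-interface naming is assumed from the eth2.11 entry in the listing earlier in the thread.

```shell
# Hypothetical helper: print (not run) a few checks for one tagged VLAN.
# $1 = physical NIC (e.g. eth2), $2 = VLAN ID (e.g. 11)
vlan_checks() {
    nic="$1"; vid="$2"
    # Details of the VLAN sub-interface, including the 802.1Q ID it uses
    echo "ip -d link show ${nic}.${vid}"
    # Which ports are attached to which bridge
    echo "bridge link show"
    # Tagged frames on the wire: -e prints the link-level header, -n skips name lookups
    echo "tcpdump -e -n -i ${nic} vlan ${vid}"
}
```

Running the printed tcpdump command on both nodes while pinging from the guest should show where the frames stop: if they leave the first node's NIC with the tag but never show up on the second node, the loss is between the hosts; if they never leave, the problem is on the host side.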

My Mellanox switch is set up with Hybrid ports and VLAN 11 is allowed.

It's clear something is still missing, but what?

spirit · Jun 8, 2022

"The two firewalls are able to ping each other (192.168.142.252 and 192.168.142.253)"

Are they on 2 different nodes? If yes, that means the VLANs are working fine.

lifeboy · Jun 13, 2022

spirit said:
"The two firewalls are able to ping each other (192.168.142.252 and 192.168.142.253)"

Are they on 2 different nodes? If yes, that means the VLANs are working fine.

They can only ping each other when they're on the same node. I corrected myself after posting that in the first message, so apologies if that was not clear.

I have since found that there are messages on the switch that say the VLAN is being filtered, so I've taken it up with their tech support.

lifeboy · Jun 14, 2022

lifeboy said:
I have since found that there are messages on the switch that say the VLAN is being filtered, so I've taken it up with their tech support.

After a debug session with NVidia on the Mellanox switch, we have found that the switch is not the culprit: it is actually registering the Proxmox NIC MAC addresses and passing the traffic.

So it's back to the Proxmox config, it seems.

lifeboy · Jun 14, 2022

I have a few vlans set up manually as can be seen from my config here:

I have now established that VLAN 35, which I set up manually as eth2.35 and not VLAN aware, is actually working 100%.

Now for the SDN config...

I'll compare the generated config with the manual one.

lifeboy · Jun 14, 2022

This is weird.

I removed the bridge VLAN11 from the SDN and recreated it, then assigned the bridge to the applicable devices, and it works. The config is 100% the same as what I had before.

This was driving me mad and now it has just numbed me...

At least it's working now.

Some more feedback: I think this is what happened. Initially I set up the SDN VLANs as VLAN aware. I added the bridge (called VLAN11, for example) to the firewalls and guest machines. It didn't work, i.e. the machines could not communicate over the network.

I then did a lot of playing around with the configuration of the SDN and at some point realised I should not have the SDN zones VLAN aware, so I turned it off. Applying the changes changed the network and bridges, but it did not change the settings the guests had for the SDN bridge, so it only started working with the new settings after I removed the bridges from the guests and re-added them.

Does this make sense? Am I mistaken? I cannot think of any other substantial change that I made to switch from no comms via the VLAN to successful comms.

Proposal: Add a note to the SDN documentation that if you change the network from being VLAN aware to not VLAN aware, you have to remove and re-assign the guests' bridges.

spirit · Jun 14, 2022

Oh yes, switching between VLAN aware and non-aware does not work well for VMs that are currently running on the bridge.

(Same with classic vmbrX).

I'll look to add some doc about this.
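
As a rough illustration of the "remove and re-assign" workaround discussed above, here is a sketch that reads `qm config` output and prints the `qm set` commands to detach and re-attach every NIC on a given bridge. It prints the commands instead of running them. The helper name `replug_cmds` and the example VM ID are made up, and whether the delete takes effect on a running VM depends on network hotplug being enabled for that guest. A full stop and start of the guest may be the simpler route.

```shell
# Hypothetical helper: print (not run) the qm commands that detach and
# re-attach each NIC of a VM that sits on the given bridge.
# Usage: qm config <vmid> | replug_cmds <vmid> <bridge>
replug_cmds() {
    vmid="$1"; bridge="$2"
    awk -v vmid="$vmid" -v br="$bridge" '
        /^net[0-9]+: / {
            key = $1; sub(/:$/, "", key)   # "net0:" -> "net0"
            val = $2                       # e.g. "virtio=<MAC>,bridge=VLAN11"
            n = split(val, opts, ",")
            for (i = 1; i <= n; i++) {
                if (opts[i] == "bridge=" br) {
                    printf "qm set %s --delete %s\n", vmid, key
                    printf "qm set %s --%s %s\n", vmid, key, val
                }
            }
        }'
}
```

For example, `qm config 100 | replug_cmds 100 VLAN11` prints one delete and one re-add line per matching NIC, keeping the original MAC address and options. Containers have an equivalent `pct set` command, which this sketch does not cover.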
