[SOLVED] Networking problem in 4.1-33 (solved in 4.2-10)

PigLover

I just upgraded one node to 4.1-33 (no-subscription repo - not pvetest).

Love the new webui. Very nice.

Unfortunately, it appears that you can no longer set a VLAN tag for VLAN 1. My current network uses fully trunked VLANs (everything tagged, untagged traffic blocked). This has worked well in prior versions. The rest of the cluster is on 4.1-22, where you can still set VLAN 1 on network devices.

With 4.1-33 there is a parameter check on the network device form that only allows VLAN values of 2-4094. If you migrate a running VM with network adapters tagging VLAN 1 from a host running 4.1-22 to a host running 4.1-33, it will fail. If you attempt the same migration "offline", it simply deletes the network adapter.

I know tagging VLAN 1 is unusual - but it is not invalid.
 
VLAN 1 is used for untagged traffic on linux bridges, which is why we disabled setting it. Since fully trunked VLANs are (currently) the exception rather than the norm, this is erring on the side of caution. Is it not possible for you to migrate your VLAN 1 to another tag?
 
I think your statement should be revised to read "VLAN 1 is normally used for untagged traffic...".

I agree that this is the case by default, but it is not universally true. Linux networking, including bridges and OVS, fully supports tagged traffic on VLAN 1 and setting the PVID (untagged) VLAN to a value other than 1. Almost all network switches that support VLANs also allow this. It has valid (though somewhat rare) uses.

I am curious if you were trying to fix a specific problem or bug by "disabling" this use case. If not it seems odd to me that you would disable this just because it is not "normal".

With some significant work I could probably work around this change. But I think you've likely created havoc for many other users.
 
Fair enough. We'll see whether it is possible to support this non-default setup in a sane way (i.e., without an explosion in complexity). Opened bug #952 for tracking (feel free to provide more detailed information there).
 
I appreciate opening the bug report, but the approach is quite soft ("if possible..."). Please note that selecting VLAN 1 as a tagged VLAN was fully supported in 4.1-22 and quietly went away with the UI changes and audits in 4.1-33. Getting this put back the way it was is actually quite urgent.

This is a pretty significant change in behavior quietly introduced in a minor point-release update. Simply doing an apt-get upgrade (not even a dist-upgrade) will break working configurations.
 
I agree, even more so because to return to the old behaviour only three lines of code need to be changed:

/usr/share/perl5/PVE/QemuServer.pm line 566 -> change minimum to 1
/usr/share/perl5/PVE/LXC/Config.pm line 486 -> change minimum to 1
/usr/share/pve-manager/ext6/pvemanagerlib.js line 4817 -> change minimum to 1

I just changed the above code, and can confirm that everything works as I expect.
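
For readers who want to see what "change minimum to 1" amounts to, here is a minimal Python sketch of a schema-style range check. It is only an illustration of the idea and is not the Proxmox code; the names `TAG_SCHEMA_4_1_33`, `TAG_SCHEMA_PATCHED` and `check_vlan_tag` are made up for this example, and the bounds are the ones mentioned in this thread.

```python
# Hypothetical sketch of a schema-driven range check; not taken from Proxmox.
# The 2-4094 range is the one reported for 4.1-33 in this thread.
TAG_SCHEMA_4_1_33 = {"type": "integer", "minimum": 2, "maximum": 4094}
# The same schema with only the lower bound changed, as in the three edits.
TAG_SCHEMA_PATCHED = {"type": "integer", "minimum": 1, "maximum": 4094}


def check_vlan_tag(tag, schema):
    """Return True if tag is an integer inside the schema's inclusive range."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(tag, bool) or not isinstance(tag, int):
        return False
    return schema["minimum"] <= tag <= schema["maximum"]


if __name__ == "__main__":
    for tag in (1, 2, 4094, 4095):
        print(
            tag,
            check_vlan_tag(tag, TAG_SCHEMA_4_1_33),
            check_vlan_tag(tag, TAG_SCHEMA_PATCHED),
        )
```

The only difference between the two schemas is the lower bound, which is why a one-value change per file is enough to accept tag 1 again.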
 
@fabian - can we get the bug report updated with this info (I am not able to update it) and perhaps get the fix expedited into the next point release?
 
I implemented the three edits proposed by Lord_Gaav and can confirm that it completely restores the behavior from 4.1-22.

- The ability to set VLAN 1 as a tagged VLAN is restored
- The VM operates correctly with the VLAN 1 tagged.
- You can migrate a VM with VLAN 1 tagged to the fixed PVE host.
- You can migrate a VM with VLAN 1 tagged from the fixed PVE host.

All of this was the behavior from 4.1-22 that was modified (without notice) in 4.1-33.
 
I also ran into this problem and had to fix it quickly today using the three edits, so everything is running like before.
As others stated, this broke the network completely without notice, so this is a no-go. Please don't do this again ;-)

My interfaces file looks like this:

allow-ovs vmbr2
allow-vmbr2 vlan1
iface vlan1 inet static
    address 10.132.48.14
    netmask 255.255.255.0
    gateway 10.132.48.254
    ovs_type OVSIntPort
    ovs_bridge vmbr2
    ovs_options tag=1
    ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
    mtu 1500

iface eth0 inet manual

iface eth1 inet manual

allow-vmbr2 bond1
iface bond1 inet manual
    ovs_bonds eth0 eth1
    ovs_type OVSBond
    ovs_bridge vmbr2
    ovs_options tag=1 bond_mode=balance-tcp other_config:lacp-time=fast vlan_mode=native-untagged lacp=active

auto vmbr2
iface vmbr2 inet manual
    ovs_type OVSBridge
    ovs_ports vlan1 bond1
    mtu 1500

So what is the suggested config now for achieving this? I have VMs with a network interface using VLAN tag 1 connected to the bridge. It isn't working when I leave the VLAN tag empty. Or should it?

And the new design looks nice!
 
Yup - running trunks with VLAN 1 tagged is normal and supported by Linux, Server 20xx, and almost all VLAN-capable routers and switches. In many cases the major network vendors (Cisco, Juniper, etc.) actually recommend it. At least two examples of configs using VLAN 1 tagged are included in Proxmox's own example wiki (as noted above).

Though many consider it "unusual", it is normal.

This is a bug. It is a serious bug. It was introduced intentionally by a well-meaning but poorly informed decision on the part of the Proxmox staff. Worse - it was introduced in a point release and was not noted in any release notes or change summary.

The fix is trivial. I seriously don't understand why it's not fixed yet (or why the Proxmox team decided to classify the bugzilla as an "enhancement request").
 
Sadly, this is not so easy.

I understand your frustration that your current setup no longer works, but there are some problems with allowing VLAN tag 1.

First, even Wikipedia states (https://en.wikipedia.org/wiki/IEEE_802.1Q):
"On bridges, VID 0x001 (the default VLAN ID) is often reserved for a management VLAN; this is vendor-specific."

So it might work on your setup but not on others.

Second, on VLAN-aware Linux bridges, untagged traffic gets a tag of VLAN 1. This means that if you have a bridge which receives untagged traffic (e.g. for VLAN 2) as well as tagged VLAN 1 traffic, and you now connect a VM with tag=1 to it, the VM gets the traffic from VLAN 1 as well as the traffic from VLAN 2 (which clearly is not what you wanted). So now it "works", but you have created a possible security hole.
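
To make the scenario above concrete, here is a toy Python model of how a VLAN-aware bridge classifies frames. It is a simplified sketch under one assumption (an untagged frame is assigned the ingress port's PVID) and does not reproduce the Linux bridge implementation; `classify`, `delivered_to_vm`, the frame labels and the PVID value 4000 are all invented for the illustration.

```python
# Toy model of VLAN classification on a VLAN-aware bridge port.
# Assumption: frames arriving untagged are assigned the port's PVID.


def classify(frame_tag, pvid=1):
    """Return the VLAN a frame belongs to inside the bridge.

    frame_tag is None for an untagged frame, otherwise its 802.1Q VLAN ID.
    """
    return pvid if frame_tag is None else frame_tag


def delivered_to_vm(frames, vm_tag, pvid=1):
    """Return the labels of the frames a VM port with tag=vm_tag receives."""
    return [label for label, tag in frames if classify(tag, pvid) == vm_tag]


# Hypothetical uplink traffic: a tagged VLAN 1 frame, an untagged frame that
# the upstream switch treats as VLAN 2, and a tagged VLAN 3 frame.
FRAMES = [("tagged-vlan1", 1), ("untagged-vlan2", None), ("tagged-vlan3", 3)]

if __name__ == "__main__":
    # Default PVID of 1: untagged traffic lands in VLAN 1.
    print(delivered_to_vm(FRAMES, vm_tag=1))
    # PVID moved away from 1: untagged traffic no longer lands in VLAN 1.
    print(delivered_to_vm(FRAMES, vm_tag=1, pvid=4000))
```

In this model, a VM on tag 1 receives both the tagged VLAN 1 frame and the untagged frame while the PVID is 1, and only the tagged VLAN 1 frame once the PVID is set to another value.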
 
I'll resist the burning desire to mock using Wikipedia as a technical reference. One would have hoped that you'd quote the relevant IEEE standards documents - but I know those docs intimately and I know you wouldn't find support for your approach in them.

In any case, assuming the statement from Wikipedia is both valid and relevant, your application of it is contorted. Substantially all switches on the market today - including those from Cisco, Juniper, Brocade, ALU, Mikrotik, Netgear, D-Link and pretty much all others, both enterprise and consumer - do not suffer from this problem. Your argument boils down to "because some switches that will have a problem might exist, we will block it for all use cases". By extension, because non-VLAN switches still exist in the marketplace, your argument logically requires that you ban ALL use of VLAN tagging. Silly.

The mapping of tagged to non-tagged traffic that you note is well known and documented in the standard, and it is endemic to most switches - not just Linux bridges. It is an unavoidable side-effect of permitting backward compatibility between 802.1Q and earlier standards. In a perfect world ALL traffic would ALWAYS be tagged - but the IEEE realized backward compatibility was required and there would be a few minor issues. Preventing people from using a valid config because they might make a mistake in its use is not really a reasonable approach - a bit nanny-ish and ultimately unhelpful.
 
Wouldn't it be better to show a warning when someone tries to use VLAN 1 rather than disallowing it? It seems to me that this setup is definitely used sometimes, but normally by people who "know what they are doing".
 
If you follow the bug tracker, this is exactly what they have done: https://bugzilla.proxmox.com/show_bug.cgi?id=952. It's checked into Git - just waiting for them to get it into the repos in an upcoming update. Hopefully that won't be long now.
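
As a rough picture of the "warn instead of reject" idea, here is a small Python sketch. It is an assumption about the general approach only and is not the patch from the bug tracker; `validate_vlan_tag` and the warning text are invented for the example.

```python
import warnings


def validate_vlan_tag(tag):
    """Accept tags 1-4094, emitting a warning (not an error) for tag 1."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(tag, bool) or not isinstance(tag, int) or not 1 <= tag <= 4094:
        raise ValueError(f"invalid VLAN tag: {tag!r}")
    if tag == 1:
        # Hypothetical wording; the point is that the value is still accepted.
        warnings.warn(
            "VLAN 1 is commonly the untagged/default VLAN; "
            "make sure tagging it is intended",
            UserWarning,
            stacklevel=2,
        )
    return tag


if __name__ == "__main__":
    print(validate_vlan_tag(100))
    print(validate_vlan_tag(1))  # accepted, but a UserWarning is issued
```

Out-of-range values are still rejected outright, while tag 1 passes with a notice, so existing configurations keep working and newcomers get a hint about the untagged-traffic pitfall.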
 
Uhm, yes, I could have looked at the bug tracker, sorry. But hey, it seems my idea wasn't the worst, as it is the same.
Thank you!
 
No worries here. I didn't reference the bug tracker to criticize you for not looking - I did it just to show you that they were already working on the same idea.

Still anxious for it to get integrated into the repos. I've tested it from Git but I don't want to push it out until it's published.
 
Thanks. I was just thinking to myself: "Hey, you found the bug in the bug tracker before, so you could have solved the issue with the information from that thread. Since you knew about that entry, you really could have looked before posting instead of making extra work for others."
I try not to bother other people if possible, especially in open-source projects where people give so much for free.

But besides that, yes, hopefully it will be integrated soon, so I can do my updates.
 
