VLANs Stopped Working in 6.0-11 Update

I recently updated my 4-node Proxmox cluster to version 6.0-11, and after the update, machines that are on a VLAN are no longer able to access the internet, connect to any other machine, or even get an IP address. VMs and containers not on a VLAN continue to work just fine. No other changes were made to the environment besides the Proxmox update. Each of my 4 nodes has 4 NICs, and all share an identical network configuration. They are all Dell PowerEdge R610 servers.


Does anyone have any ideas as to how I can resolve this? Currently about 700 VMs/containers are unable to connect to the internet!
 
That is the first thing I tried, sorry, I forgot to mention that. The systems are currently running kernel 5.0.21-3.

Hmm, and it is still broken? If so, I'd now rather suspect the environment (network switches or similar?)..

Otherwise, you can check `less /var/log/apt/history.log` to see exactly which packages were included in the last upgrade. Maybe we can pin it down to a specific one.
 
Hmm, and it is still broken? If so, I'd now rather suspect the environment (network switches or similar?)..

I have double- and triple-checked the environment already. The firewall has not been modified, and I have disabled automatic updates there. The switches are also unchanged. Additionally, there are other devices on the same VLANs that some of the VMs are trying to reach, and those devices work without issue. Since the only thing that changed was the packages on Proxmox, and other devices still function on the same VLANs, I can only assume it is something with the Proxmox machines.


Otherwise, you can check `less /var/log/apt/history.log` to see exactly which packages were included in the last upgrade. Maybe we can pin it down to a specific one.

Here is the output from the log you mentioned. I made it a little more human-readable, but these are all the packages that were updated.
Code:
Start-Date: 2019-11-05  09:09:49
Commandline: apt upgrade
Install: pve-kernel-5.0.21-4-pve:amd64 (5.0.21-8, automatic)
Upgrade:
  - pve-kernel-5.0:amd64 (6.0-9, 6.0-10),
  - libpve-access-control:amd64 (6.0-2, 6.0-3),
  - pve-firmware:amd64 (3.0-2, 3.0-4),
  - pve-qemu-kvm:amd64 (4.0.1-3, 4.0.1-4),
  - pve-docs:amd64 (6.0-7, 6.0-8),
  - zfs-initramfs:amd64 (0.8.2-pve1, 0.8.2-pve2),
  - pve-container:amd64 (3.0-7, 3.0-10),
  - libspice-server1:amd64 (0.14.2-4~pve6, 0.14.2-4~pve6+1),
  - zfsutils-linux:amd64 (0.8.2-pve1, 0.8.2-pve2),
  - pve-manager:amd64 (6.0-9, 6.0-11),
  - spl:amd64 (0.8.2-pve1, 0.8.2-pve2),
  - libzfs2linux:amd64 (0.8.2-pve1, 0.8.2-pve2),
  - libpve-guest-common-perl:amd64 (3.0-1, 3.0-2),
  - libpve-common-perl:amd64 (6.0-5, 6.0-6),
  - lxc-pve:amd64 (3.1.0-65, 3.2.1-1),
  - qemu-server:amd64 (6.0-9, 6.0-13),
  - pve-kernel-helper:amd64 (6.0-9, 6.0-11),
  - libzpool2linux:amd64 (0.8.2-pve1, 0.8.2-pve2),
  - libnvpair1linux:amd64 (0.8.2-pve1, 0.8.2-pve2),
  - libuutil1linux:amd64 (0.8.2-pve1, 0.8.2-pve2)
End-Date: 2019-11-05  09:10:40
 
Okay, so in an attempt to get some of my services back online, I moved some VMs and containers off of the VLANs, and they were still not able to connect. With this discovery I did some experimenting: the guests are not able to connect if they are going over a bonded network. In this case I have bonded eno3 and eno4 together into bond0 and then created a bridge on top of bond0 called Guest Network.

Any guests using the Guest Network bridge are unable to connect to the internet, while if I connect them through either the vmbr0 or vmbr1 interface it works. Those two interfaces are not bonded and are directly linked to the network.

[Screenshot: node pve2 in the Proxmox web UI]

So in summary it would appear the issue is not with VLANs but with bonded interfaces.
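
For reference, the relevant part of such a setup would look roughly like this in /etc/network/interfaces (a sketch only; the bridge name vmbr3 for "Guest Network", the bond mode, and the VLAN-awareness settings are assumptions, not taken from the screenshot):
Code:
auto eno3
iface eno3 inet manual

auto eno4
iface eno4 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves eno3 eno4
        bond-miimon 100
        bond-mode 802.3ad        # assumption -- could also be active-backup, balance-rr, ...

# "Guest Network" bridge on top of the bond (vmbr3 is an assumed name)
auto vmbr3
iface vmbr3 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094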
 
Hmm - tricky one ...

Things you could try to rule out further components:
* remove 'bond0'
* create 'vmbr2' with 'eno3' as port (see the sketch below)
* boot a guest with a VLAN-tagged interface - does it work?
* same with 'eno4' instead of 'eno3'

if one of the tries does not work - it might indicate a problem with one of the bond-ports (eno3, eno4) - or their corresponding switchports
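
A temporary test bridge on a single bond member might look roughly like this (a sketch; the bridge number and the VLAN-awareness settings should mirror whatever the "Guest Network" bridge uses and are assumptions here):
Code:
# temporary test bridge in /etc/network/interfaces, using one former bond member
auto eno3
iface eno3 inet manual

auto vmbr2
iface vmbr2 inet manual
        bridge-ports eno3
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094

# apply with 'ifreload -a' if ifupdown2 is installed, otherwise reboot the node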

if both bond members work fine individually:
* check `dmesg` after a fresh boot - are there any messages indicating any bond/network related problems? (see the example commands after this list)
* check the switch logs and state for that bond
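
For the host-side checks, something like the following can help (the bridge name is assumed):
Code:
# the bonding driver's view of bond0 and its members (link state, LACP partner, ...)
cat /proc/net/bonding/bond0

# detailed link info for the bond and the bridge on top of it
ip -d link show bond0
ip -d link show vmbr3          # assumed name of the "Guest Network" bridge

# kernel messages related to bonding or the NICs after a fresh boot
dmesg | grep -i -e bond -e eno3 -e eno4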

as a last (but quite often very effective) resort I would start tcpdump on various points to see where the packets (or the responses) get lost:
* `tcpdump -envi vmbr2`
* `tcpdump -envi bond0`
* `tcpdump -envi eno3`
* `tcpdump -envi eno4`
* `tcpdump -envi tap<VMID>i<IFACEID>` (the tap-device on the host for the VM with id VMID and interface number IFACEID; see the example below)
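
For example, to trace a ping from a guest with (hypothetical) VMID 100, you could watch at each point where the ICMP packets stop showing up:
Code:
# find the guest's NIC entries (net0 -> tap100i0, net1 -> tap100i1, ...); 100 is an example VMID
qm config 100 | grep ^net

# then, while pinging a known-good address from inside the guest:
tcpdump -envi tap100i0 icmp
tcpdump -envi bond0 icmp
tcpdump -envi eno3 icmp      # and eno4 -- the bond may hash the flow onto either member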

That should give you a clearer picture.

I hope this helps!
 
