[SOLVED] As soon as all containers and VMs shut down, Proxmox loses all internet connectivity, including to the router. Is this a bug?

reckless

Well-Known Member
Feb 5, 2019
The scenario is as follows: I have one single container in total (LXC ID 117) running on my Proxmox host. It runs perfectly fine; both the Proxmox host and the container have full dual-stack internet, and everything has been working perfectly for months on end.
But as soon as I shut down that one container, leaving Proxmox with 0 containers and 0 VMs running, I immediately lose SSH access and the Proxmox host loses all internet connectivity.

Here's my /etc/network/interfaces:
Code:
auto lo
iface lo inet loopback
iface eno1 inet manual
iface eno2 inet manual
auto enp67s0
iface enp67s0 inet manual
    mtu 9216
#10G Mellanox Port
auto enp67s0d1
iface enp67s0d1 inet manual
    mtu 9216
#10G Mellanox Port2
auto enp1s0f0np0
iface enp1s0f0np0 inet manual
    mtu 9216
#100G port1
auto enp1s0f1np1
iface enp1s0f1np1 inet manual
    mtu 9216
#100G port 2
auto bond0
iface bond0 inet manual
    bond-slaves enp67s0 enp67s0d1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3
    mtu 9216
auto vmbr0
iface vmbr0 inet static
    address 192.168.3.3/24
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
auto vmbr2
iface vmbr2 inet static
    address 192.168.1.2/24
    gateway 192.168.1.1
    bridge-ports enp1s0f0np0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-400, 4040
    mtu 9216
iface vmbr2 inet6 static
    address 2602:1111:2222:3300::abcd/64
    gateway fe80::34ea:d7ff:fe16:9ede
    up echo 0 > /sys/class/net/vmbr2/bridge/multicast_router
    up echo 0 > /sys/class/net/vmbr2/bridge/multicast_snooping
iface vmbr1 inet manual
    bridge-ports eno2
    bridge-stp off
    bridge-fd 0
auto vmbr5
iface vmbr5 inet static
    address 192.168.5.11/24
    bridge-ports enp1s0f1np1
    bridge-stp off
    bridge-fd 0
    mtu 9216

And this is what happens according to dmesg -w the moment I shut down the last running container (pct stop 117):

Code:
[ 3725.131201] fwbr117i0: port 2(veth117i0) entered disabled state
[ 3725.133470] device veth117i0 left promiscuous mode
[ 3725.133474] fwbr117i0: port 2(veth117i0) entered disabled state
[ 3725.375655] kauditd_printk_skb: 8 callbacks suppressed
[ 3725.375658] audit: type=1400 audit(1686625689.959:2674): apparmor="STATUS" operation="profile_remove" profile="/usr/bin/lxc-start" name="lxc-117_</var/lib/lxc>" pid=248884 comm="apparmor_parser"
[ 3726.353216] fwbr117i0: port 1(fwln117i0) entered disabled state
[ 3726.353617] vmbr2: port 2(fwpr117p0) entered disabled state
[ 3726.355144] device fwln117i0 left promiscuous mode
[ 3726.355146] fwbr117i0: port 1(fwln117i0) entered disabled state
[ 3726.399856] device fwpr117p0 left promiscuous mode
[ 3726.399861] vmbr2: port 2(fwpr117p0) entered disabled state
[ 3726.494405] device enp1s0f0np0 left promiscuous mode

And poof, no more internet connection. Nothing works, not even a local ping to the router!
If I now restart the container (pct start 117), everything is suddenly back to normal. dmesg -w shows this once the container starts:

Code:
[ 3768.232666] audit: type=1400 audit(1686625732.812:2675): apparmor="STATUS" operation="profile_load" profile="/usr/bin/lxc-start" name="lxc-117_</var/lib/lxc>" pid=249462 comm="apparmor_parser"
[ 3768.754816] vmbr2: port 2(fwpr117p0) entered blocking state
[ 3768.754821] vmbr2: port 2(fwpr117p0) entered disabled state
[ 3768.755073] device fwpr117p0 entered promiscuous mode
[ 3768.755154] device enp1s0f0np0 entered promiscuous mode
[ 3768.755421] vmbr2: port 2(fwpr117p0) entered blocking state
[ 3768.755423] vmbr2: port 2(fwpr117p0) entered forwarding state
[ 3768.841388] fwbr117i0: port 1(fwln117i0) entered blocking state
[ 3768.841394] fwbr117i0: port 1(fwln117i0) entered disabled state
[ 3768.841542] device fwln117i0 entered promiscuous mode
[ 3768.841662] fwbr117i0: port 1(fwln117i0) entered blocking state
[ 3768.841664] fwbr117i0: port 1(fwln117i0) entered forwarding state
[ 3768.851512] fwbr117i0: port 2(veth117i0) entered blocking state
[ 3768.851516] fwbr117i0: port 2(veth117i0) entered disabled state
[ 3768.851706] device veth117i0 entered promiscuous mode
[ 3768.891878] eth0: renamed from vethmAd8rR
[ 3769.198167] audit: type=1400 audit(1686625733.780:2676): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-117_</var/lib/lxc>" name="/sys/kernel/config/" pid=249681 comm="mount" fstype="configfs" srcname="configfs" flags="rw, nosuid, nodev, noexec"
[ 3769.198174] audit: type=1400 audit(1686625733.780:2677): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-117_</var/lib/lxc>" name="/sys/kernel/config/" pid=249681 comm="mount" fstype="configfs" srcname="configfs" flags="ro, nosuid, nodev, noexec"
[ 3769.218990] audit: type=1400 audit(1686625733.800:2678): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-117_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=249687 comm="(networkd)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
[ 3769.223726] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 3769.223873] fwbr117i0: port 2(veth117i0) entered blocking state
[ 3769.223876] fwbr117i0: port 2(veth117i0) entered forwarding state
[ 3769.228602] audit: type=1400 audit(1686625733.808:2679): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-117_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=249707 comm="(networkd)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
[ 3769.237517] audit: type=1400 audit(1686625733.816:2680): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-117_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=249723 comm="(networkd)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
[ 3769.246755] audit: type=1400 audit(1686625733.828:2681): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-117_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=249731 comm="(networkd)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
[ 3769.262484] audit: type=1400 audit(1686625733.844:2682): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-117_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=249756 comm="(d-logind)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
[ 3769.263848] audit: type=1400 audit(1686625733.844:2683): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-117_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=249759 comm="(networkd)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
[ 3769.275962] audit: type=1400 audit(1686625733.856:2684): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-117_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=249773 comm="(d-logind)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"

For context, vmbr2 (192.168.1.2) carries the Proxmox host's IP and is what I SSH to. The NIC behind it is one of the 100G ports on the dual-port Mellanox ConnectX-6 Dx card, which otherwise works just fine.
I've already tried systemctl restart networking.service and ifreload -a, but neither fixes the issue while all containers are down.
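
In case it is useful, these are standard iproute2/bridge commands (nothing Proxmox-specific) for checking the state of vmbr2 and its ports at the moment the connection drops; the interface names are the ones from my config above:

Code:
# Does vmbr2 still have its addresses and carrier?
ip -br addr show vmbr2
ip -br link show vmbr2

# Which ports are still attached to the bridge?
bridge link show | grep vmbr2

# Are the default routes still pointing at vmbr2?
ip route show default
ip -6 route show default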

What is going on here? I also cannot upgrade from Proxmox 7 to 8, because that requires winding down all containers and VMs, and as soon as I do, I lose all connectivity... how do I solve this?
 
Seeing as no one else has responded, I will hazard a guess.

I do not think that VLAN-aware bridges should have IP addresses assigned to them directly, because which VLAN is the IP address getting attached to? I am guessing that the IP address gets attached to the first port (a VM or container) that is attached to the bridge? But like I said, totally guessing here.

I would suggest removing the IP address from vmbr2 and instead adding a VLAN interface on top of vmbr2 that carries the IP address.

In the GUI, use System -> Network -> Create -> Linux VLAN, or add something along the lines of the following to your interfaces file (setting the vlan-id as appropriate):

Code:
auto Management
iface Management inet static
    address 192.168.1.2/24
    gateway 192.168.1.1
    mtu 9216
    vlan-id ??
    vlan-raw-device vmbr2
#PVE Management Interface

This will create an interface called "Management". This is how my systems are set up.
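
For completeness, vmbr2 itself would then carry no address. Based on the config posted earlier, it would look roughly like this (address and gateway removed, the IPv6 address would move to the VLAN interface in the same way; I believe ifupdown2 expects bridge-vids as a space-separated list):

Code:
auto vmbr2
iface vmbr2 inet manual
    bridge-ports enp1s0f0np0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-400 4040
    mtu 9216
#VLAN-aware bridge - the management IP now lives on the Management VLAN interface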
 
The management interface should always be on a dedicated network that VMs / containers don't reside on. That way, if you ever bork your network interfaces, you won't lose the connection to the PVE host. This is also true for any cluster environment.

Depending on the server I usually pick the onboard NIC as it doesn't have to be anything fancy.
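
As a rough sketch of what that can look like with the config posted above (eno1 is currently unused there; the 192.168.10.0/24 subnet is just a placeholder):

Code:
auto eno1
iface eno1 inet static
    address 192.168.10.2/24
#Dedicated management port - no VMs or containers use this NIC,
#so it stays up regardless of guest state. The default gateway
#stays on the main network as before.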
 
I solved this using the above advice, creating a separate management subnet on a dedicated LAN port. This allows me to connect to the Proxmox host even while all containers are down.
 
