Network issues with new LXC containers in single VLAN

K7kZJKovXupagy5Y

Mar 9, 2023
Hi,
I’m not sure how, but I seem to have damaged the networking on my cluster.

Network Overview

  • 192.168.50.0/24: LAN (VLAN 50)
  • 192.168.60.0/24: WiFi (VLAN 60)
Devices:
  • Prox01: 192.168.50.11
  • Prox02: 192.168.50.12
  • Pi Zero 2 (Quorum device): 192.168.50.13
  • Main workstation: 192.168.50.50
OPNsense VM:
  • Runs on Prox01 with 2 passthrough NICs
    • 192.168.50.1
    • 192.168.60.1
LXCs:
  • Old LXC: 192.168.50.130
  • New Test LXC1: 192.168.50.151 & 192.168.60.151
  • New Test LXC2: 192.168.50.152 & 192.168.60.152

The Problem

I rebooted the server on Sunday morning, and since then I cannot open a TCP connection on any port (SSH, HTTP, etc.) to any newly created LXC on the 192.168.50.0/24 network. Ping still works.

Examples:

Workstation -> Old LXC
  • 192.168.50.50 pinging 192.168.50.130: Works
  • 192.168.50.50 SSH to 192.168.50.130: Works
  • 192.168.50.50 HTTP to 192.168.50.130: Works
Workstation -> Old LXC with a newly created NIC
  • 192.168.50.50 pinging 192.168.50.170: Works
  • 192.168.50.50 SSH to 192.168.50.170: Works
  • 192.168.50.50 HTTP to 192.168.50.170: Works
Workstation -> New LXC in 192.168.50.0/24
  • 192.168.50.50 pinging 192.168.50.151: Works
  • 192.168.50.50 SSH to 192.168.50.151: Timeout
  • 192.168.50.50 HTTP to 192.168.50.151: Timeout
Workstation -> New LXC in 192.168.60.0/24
  • 192.168.50.50 pinging 192.168.60.151: Works
  • 192.168.50.50 SSH to 192.168.60.151: Works
  • 192.168.50.50 HTTP to 192.168.60.151: Works
New Test LXC1 -> New Test LXC2 (Both in 192.168.50.0/24)
  • 192.168.50.151 pinging 192.168.50.152: Works
  • 192.168.50.151 SSH to 192.168.50.152: Timeout
  • 192.168.50.151 HTTP to 192.168.50.152: Timeout
New Test LXC1 -> New Test LXC2 (Both in 192.168.60.0/24)
  • 192.168.60.151 pinging 192.168.60.152: Works
  • 192.168.60.151 SSH to 192.168.60.152: Works
  • 192.168.60.151 HTTP to 192.168.60.152: Works
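
To rerun the SSH/HTTP part of these tests in one go from the workstation, a small Python sketch like the one below could help. It only checks whether a TCP handshake completes (ICMP ping needs raw sockets, so it is left out). The TARGETS dict, the standard ports 22/80 and the 3-second timeout are assumptions for illustration, not part of the original setup.

```python
import socket

# Hypothetical targets mirroring the tests above; adjust to your own hosts.
TARGETS = {
    "Old LXC": "192.168.50.130",
    "New Test LXC1 (VLAN 50)": "192.168.50.151",
    "New Test LXC1 (VLAN 60)": "192.168.60.151",
}
PORTS = {"SSH": 22, "HTTP": 80}


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to host:port completes within timeout."""
    try:
        # create_connection returns only after the three-way handshake succeeds.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and unreachable hosts all land here.
        return False


def run_matrix(targets: dict, ports: dict, timeout: float = 3.0) -> dict:
    """Return {(target_name, service_name): bool} for every combination."""
    return {
        (name, service): tcp_port_open(addr, port, timeout)
        for name, addr in targets.items()
        for service, port in ports.items()
    }


def print_matrix(targets: dict, ports: dict, timeout: float = 3.0) -> None:
    """Print one 'Works' / 'Timeout/closed' line per target and service."""
    for (name, service), ok in run_matrix(targets, ports, timeout).items():
        print(f"{name:26} {service:5} {'Works' if ok else 'Timeout/closed'}")
```

Calling `print_matrix(TARGETS, PORTS)` prints one line per target and service; "Timeout/closed" for the VLAN 50 addresses only would match the pattern in the examples.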

Strange Behavior

If I move my workstation to VLAN 60 (WiFi):

Workstation in VLAN 60 -> New LXC in 192.168.50.0/24
  • 192.168.60.50 pinging 192.168.50.151: Works
  • 192.168.60.50 SSH to 192.168.50.151: Works
  • 192.168.60.50 HTTP to 192.168.50.151: Works

Additional Notes

  1. The issue persists for LXCs created on both nodes (Prox01 and Prox02).
  2. Prox01 has a free NIC, but using it for the LXC yields the same results.
  3. There are no SSH log entries on the affected LXCs when the connection times out.
  4. Restoring an LXC from backup works fine.
  5. Creating a new VM also works without issues.
  6. Firewalls on the workstation, LXC, Proxmox, and Datacenter are disabled.
  7. There are no firewall rules on OPNsense.
  8. Webconsole over Proxmox still works.
  9. I have no problems reaching web pages in 192.168.50.0/24 that are not hosted on Proxmox.


Proxmox Version

Code:
pve-manager/8.3.3/f157a38b211595d6 (running kernel: 6.8.12-7-pve)

/etc/network/interfaces

Code:
auto lo
iface lo inet loopback


iface ethlan0 inet manual


auto vmbr0
iface vmbr0 inet manual
    bridge-ports ethlan0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4000
#Motherboard NIC


auto vmbr0.50
iface vmbr0.50 inet static
    address 192.168.50.11/24
    gateway 192.168.50.1


auto vmbr0.60
iface vmbr0.60 inet static
    address 192.168.60.11/24


auto vmbr0.70
iface vmbr0.70 inet static
    address 192.168.70.11/24


iface ethlan1 inet manual


iface ethlan2 inet manual


iface ethlan3 inet manual


source /etc/network/interfaces.d/*

Old LXC Configuration
Code:
arch: amd64
cores: 2
features: nesting=1
hostname: Homepage
memory: 512
net0: name=eth0,bridge=vmbr0,firewall=0,gw=192.168.50.1,hwaddr=E6:B2:DD:7F:68:23,ip=192.168.50.130/24,tag=50,type=veth
onboot: 1
ostype: ubuntu
rootfs: local-lvm:vm-130-disk-0,size=8G
startup: order=3
swap: 1024
tags: Docker
unprivileged: 1
lxc.idmap: u 0 100000 1000
lxc.idmap: u 1000 1000 1
lxc.idmap: u 1001 101001 64534
lxc.idmap: g 0 100000 1000
lxc.idmap: g 1000 1000 1
lxc.idmap: g 1001 101001 64534

New LXC Configuration
Code:
arch: amd64
cores: 1
features: nesting=1
hostname: Test1
memory: 512
net1: name=net1,bridge=vmbr0,firewall=0,gw=192.168.50.1,hwaddr=BC:24:11:BC:15:C2,ip=192.168.50.150/32,tag=50,type=veth
ostype: ubuntu
rootfs: Prox01-Local:150/vm-150-disk-0.raw,size=8G
swap: 512
unprivileged: 1
lxc.idmap: u 0 100000 1000
lxc.idmap: u 1000 1000 1
lxc.idmap: u 1001 101001 64534
lxc.idmap: g 0 100000 1000
lxc.idmap: g 1000 1000 1
lxc.idmap: g 1001 101001 64534
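
Proxmox keeps each container's settings in /etc/pve/lxc/<CTID>.conf, and the netX entries are comma-separated key=value options. Comparing the two netX entries above field by field makes the difference easier to spot. The sketch below is a simplified, hypothetical parser (not a Proxmox API) that flags a static IPv4 address with a /32 prefix.

```python
import ipaddress
from typing import Optional

# The two netX lines below are copied from the configurations above.
OLD_NET = ("net0: name=eth0,bridge=vmbr0,firewall=0,gw=192.168.50.1,"
           "hwaddr=E6:B2:DD:7F:68:23,ip=192.168.50.130/24,tag=50,type=veth")
NEW_NET = ("net1: name=net1,bridge=vmbr0,firewall=0,gw=192.168.50.1,"
           "hwaddr=BC:24:11:BC:15:C2,ip=192.168.50.150/32,tag=50,type=veth")


def parse_net_line(line: str) -> dict:
    """Split 'netX: key=value,key=value,...' into a dict of options."""
    # partition() splits at the first ':' only, so MAC addresses stay intact.
    key, _, options = line.partition(":")
    opts = dict(item.split("=", 1) for item in options.strip().split(","))
    opts["_key"] = key.strip()
    return opts


def prefix_warning(line: str) -> Optional[str]:
    """Return a warning if the static IPv4 address is a /32, else None."""
    ip = parse_net_line(line).get("ip")
    if not ip or ip in ("dhcp", "manual"):
        return None
    if ipaddress.ip_interface(ip).network.prefixlen == 32:
        return f"{ip}: /32 prefix, no other host on the VLAN is on-link"
    return None


for net_line in (OLD_NET, NEW_NET):
    print(parse_net_line(net_line)["_key"], prefix_warning(net_line))
```

The same check could be pointed at the output of `pct config <CTID>` for every container instead of the two hard-coded lines.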

I'm not sure how to start fixing this. Since the problem affects both nodes, my next step would be to rebuild the entire cluster, but I'd really prefer to find another solution. If I'm missing anything, such as specific logs, please let me know what you need.

Thanks. :)
 
I don't even know what to say. I guess my brain wasn’t working today.
I spent 4 hours troubleshooting, and 5 minutes after posting, I saw it.
I had been creating the LXCs with a /32 prefix on the NIC address (instead of /24) the whole time. Truly a big-brain moment.

The problem is solved.
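
One plausible reading of the symptoms: with a /32 address the container has no on-link route for the rest of 192.168.50.0/24, so (assuming it can still reach its gateway, which the working ping suggests) every reply to 192.168.50.50 is sent to OPNsense at 192.168.50.1, while the workstation's own packets reach the container directly across the bridge. A stateful packet filter such as OPNsense's only sees the reply half of each TCP connection and will likely drop it. That would explain why ping works but SSH and HTTP time out, why sshd logs nothing, and why everything works from VLAN 60, where both directions pass through OPNsense. The sketch below models only the host's next-hop decision with Python's ipaddress module; `next_hop` is a hypothetical helper, not a real routing table lookup.

```python
import ipaddress


def next_hop(iface_cidr: str, gateway: str, dest: str) -> str:
    """Return 'on-link' if dest lies inside the interface's prefix, else the gateway.

    Models a host with one connected route plus a default route; any other
    routes on a real system are ignored.
    """
    iface = ipaddress.ip_interface(iface_cidr)
    if ipaddress.ip_address(dest) in iface.network:
        return "on-link"
    return gateway


WORKSTATION = "192.168.50.50"
GATEWAY = "192.168.50.1"  # OPNsense

# New LXC as created (/32): the reply to the workstation goes to OPNsense.
print(next_hop("192.168.50.150/32", GATEWAY, WORKSTATION))  # -> 192.168.50.1
# Corrected LXC (/24): the reply goes straight to the workstation.
print(next_hop("192.168.50.150/24", GATEWAY, WORKSTATION))  # -> on-link
```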
 
K7kZJKovXupagy5Y said: I spent 4 hours troubleshooting, and 5 minutes after posting, I saw it.
This happens to all of us :-)


Please edit the title to tag it "Solved" - there is an "Edit Thread" button at the top right.