Hi,
I’m not sure how, but I seem to have damaged the networking on my cluster.
Network Overview
- 192.168.50.0/24: LAN (VLAN 50)
- 192.168.60.0/24: WiFi (VLAN 60)
- Prox01: 192.168.50.11
- Prox02: 192.168.50.12
- Pi Zero 2 (Quorum device): 192.168.50.13
- Main workstation: 192.168.50.50
- OPNsense: runs on Prox01 with 2 passthrough NICs
  - 192.168.50.1
  - 192.168.60.1
- Old LXC: 192.168.50.130
- New Test LXC1: 192.168.50.151 & 192.168.60.151
- New Test LXC2: 192.168.50.152 & 192.168.60.152
The Problem
I rebooted the server on Sunday morning, and since then I cannot connect to any newly created LXC on the 192.168.50.0/24 network via any port. Examples:
Workstation -> Old LXC
- 192.168.50.50 pinging 192.168.50.130: Works
- 192.168.50.50 SSH to 192.168.50.130: Works
- 192.168.50.50 HTTP to 192.168.50.130: Works
Workstation -> 192.168.50.170
- 192.168.50.50 pinging 192.168.50.170: Works
- 192.168.50.50 SSH to 192.168.50.170: Works
- 192.168.50.50 HTTP to 192.168.50.170: Works
Workstation -> New Test LXC1
- 192.168.50.50 pinging 192.168.50.151: Works
- 192.168.50.50 SSH to 192.168.50.151: Timeout
- 192.168.50.50 HTTP to 192.168.50.151: Timeout
- 192.168.50.50 pinging 192.168.60.151: Works
- 192.168.50.50 SSH to 192.168.60.151: Works
- 192.168.50.50 HTTP to 192.168.60.151: Works
Workstation -> New Test LXC2
- 192.168.50.50 pinging 192.168.50.152: Works
- 192.168.50.50 SSH to 192.168.50.152: Timeout
- 192.168.50.50 HTTP to 192.168.50.152: Timeout
- 192.168.50.50 pinging 192.168.60.152: Works
- 192.168.50.50 SSH to 192.168.60.152: Works
- 192.168.50.50 HTTP to 192.168.60.152: Works
Strange Behavior
If I move my workstation to VLAN 60 (WiFi):
Workstation in VLAN 60 -> New LXC in 192.168.50.0/24
- 192.168.60.50 pinging 192.168.50.151: Works
- 192.168.60.50 SSH to 192.168.50.151: Works
- 192.168.60.50 HTTP to 192.168.50.151: Works
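The SSH and HTTP checks listed above could be repeated in one pass with a small script. This is only a minimal sketch, assuming Python 3 on the workstation; `tcp_check` and `run_matrix` are hypothetical helper names, the target list is copied from this post, and ping is left out because ICMP needs raw sockets.

```python
import socket

def tcp_check(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connection and report 'Works', 'Timeout', or 'Error: <type>'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "Works"
    except socket.timeout:
        # No answer at all within the timeout (matches the "Timeout" results above).
        return "Timeout"
    except OSError as exc:
        # Covers e.g. connection refused or network unreachable.
        return f"Error: {exc.__class__.__name__}"

# Addresses taken from this post; 22 = SSH, 80 = HTTP.
TARGETS = [
    "192.168.50.130",
    "192.168.50.151", "192.168.60.151",
    "192.168.50.152", "192.168.60.152",
]

def run_matrix(targets=TARGETS, ports=(22, 80), timeout=3.0) -> dict:
    """Map every (host, port) pair to its tcp_check result."""
    return {(h, p): tcp_check(h, p, timeout) for h in targets for p in ports}
```

Calling `run_matrix()` once from 192.168.50.50 and once from 192.168.60.50, then printing the returned dictionary, would give the same comparison as the lists above. A "Timeout" result (no reply at all) and a refused connection are reported separately, which may help tell dropped packets apart from a service that is simply not listening.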
Additional Notes
- The issue persists for LXCs created on both nodes (Prox01 and Prox02).
- Prox01 has a free NIC, but using it for the LXC yields the same results.
- There are no SSH log entries on the affected LXCs when the connection times out.
- Restoring an LXC from backup works fine.
- Creating a new VM also works without issues.
- Firewalls on the workstation, LXC, Proxmox, and Datacenter are disabled.
- There are no firewall rules on OPNsense.
- The web console through Proxmox still works.
- I have no problems reaching other webpages in 192.168.50.0/24 that are not hosted on Proxmox.
Proxmox Version
Code:
pve-manager/8.3.3/f157a38b211595d6 (running kernel: 6.8.12-7-pve)
/etc/network/interfaces
Code:
auto lo
iface lo inet loopback

iface ethlan0 inet manual

auto vmbr0
iface vmbr0 inet manual
        bridge-ports ethlan0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4000
#Motherboard NIC

auto vmbr0.50
iface vmbr0.50 inet static
        address 192.168.50.11/24
        gateway 192.168.50.1

auto vmbr0.60
iface vmbr0.60 inet static
        address 192.168.60.11/24

auto vmbr0.70
iface vmbr0.70 inet static
        address 192.168.70.11/24

iface ethlan1 inet manual

iface ethlan2 inet manual

iface ethlan3 inet manual

source /etc/network/interfaces.d/*
Old LXC Configuration
Code:
arch: amd64
cores: 2
features: nesting=1
hostname: Homepage
memory: 512
net0: name=eth0,bridge=vmbr0,firewall=0,gw=192.168.50.1,hwaddr=E6:B2:DD:7F:68:23,ip=192.168.50.130/24,tag=50,type=veth
onboot: 1
ostype: ubuntu
rootfs: local-lvm:vm-130-disk-0,size=8G
startup: order=3
swap: 1024
tags: Docker
unprivileged: 1
lxc.idmap: u 0 100000 1000
lxc.idmap: u 1000 1000 1
lxc.idmap: u 1001 101001 64534
lxc.idmap: g 0 100000 1000
lxc.idmap: g 1000 1000 1
lxc.idmap: g 1001 101001 64534
New LXC Configuration
Code:
arch: amd64
cores: 1
features: nesting=1
hostname: Test1
memory: 512
net1: name=net1,bridge=vmbr0,firewall=0,gw=192.168.50.1,hwaddr=BC:24:11:BC:15:C2,ip=192.168.50.150/32,tag=50,type=veth
ostype: ubuntu
rootfs: Prox01-Local:150/vm-150-disk-0.raw,size=8G
swap: 512
unprivileged: 1
lxc.idmap: u 0 100000 1000
lxc.idmap: u 1000 1000 1
lxc.idmap: u 1001 101001 64534
lxc.idmap: g 0 100000 1000
lxc.idmap: g 1000 1000 1
lxc.idmap: g 1001 101001 64534
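To make the two configurations easier to compare, here is a minimal sketch that splits the `net0`/`net1` lines into fields and lists the ones that differ. It assumes Python 3 with only the standard library; `parse_net` and `on_link` are hypothetical helper names, and the two strings are copied from the configurations above. Whether any of the differences is related to the timeouts is an open question, not something this sketch proves.

```python
import ipaddress

def parse_net(line: str) -> dict:
    """Split a 'netN: key=value,key=value,...' line into a dict of its fields."""
    _, _, body = line.partition(": ")
    return dict(field.split("=", 1) for field in body.split(","))

def on_link(container_ip: str, peer: str) -> bool:
    """True if peer lies inside the network implied by the container's ADDR/PREFIX."""
    network = ipaddress.ip_interface(container_ip).network
    return ipaddress.ip_address(peer) in network

# Copied verbatim from the old and new LXC configurations in this post.
OLD = ("net0: name=eth0,bridge=vmbr0,firewall=0,gw=192.168.50.1,"
       "hwaddr=E6:B2:DD:7F:68:23,ip=192.168.50.130/24,tag=50,type=veth")
NEW = ("net1: name=net1,bridge=vmbr0,firewall=0,gw=192.168.50.1,"
       "hwaddr=BC:24:11:BC:15:C2,ip=192.168.50.150/32,tag=50,type=veth")

old, new = parse_net(OLD), parse_net(NEW)

# Fields whose values are not identical in both containers.
differing = sorted(k for k in old if old[k] != new.get(k))
print("differing fields:", differing)

# Is the workstation (192.168.50.50) inside each container's own subnet?
print("old LXC sees workstation on-link:", on_link(old["ip"], "192.168.50.50"))
print("new LXC sees workstation on-link:", on_link(new["ip"], "192.168.50.50"))
```

Besides the interface name and MAC address, the only differing field is `ip`: the old container uses a /24 prefix and the new one a /32. With /32 the container's subnet contains only its own address, so the workstation at 192.168.50.50 is not on-link from the container's point of view. That would presumably send the container's replies through the gateway at 192.168.50.1 rather than directly, which may be worth checking.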
I'm not sure how to start fixing this. Since it happens at the datacenter level, my next step would be to rebuild the entire cluster—but I'd really prefer to find another solution. If I'm missing anything, like specific logs, please let me know what you need.
Thanks.