Alright, been struggling with this for a while.
I have an established environment with two VMs and 6 or so CTs spanning 4 VLANs. The mgmt interface is standalone and works fine. I have a 4-port Intel NIC with all 4 ports in a bond and a bridge assigned on top of that. The bridge has no IP and is VLAN aware.
All the workloads that I have set up so far work great. The problem comes when I create a new CT workload, and it happens regardless of the guest OS. No matter what VLAN tag I assign it, I'll have networking for a while, then it's like the guest just falls off the network. I'll still have outbound connectivity if I pop a console in the guest, but I can't reach the guest from any other host, including my firewall, where all the routing happens. In fact, the guest's ARP entry will even disappear after some time, as you'd expect once ARP entries expire. At random it will come back, then drop again. No errors are logged on guest or host. The weird part is that it's only a problem for new workloads.
I have the 4 ports bonded with 802.3ad, LACP rate fast, and layer2+3 hashing. On the other end is a MikroTik CSS326, which appears to be just fine with the LAG.
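For anyone trying to picture it, a bond and bridge like the one described above would look roughly like this in /etc/network/interfaces. This is only a sketch and not a paste from my host: the NIC names (enp1s0f0 through enp1s0f3), bond0, vmbr0, the miimon value, and the VLAN range are all placeholder assumptions.

```
# Sketch only: interface names, miimon, and bridge-vids are assumed placeholders
auto bond0
iface bond0 inet manual
    bond-slaves enp1s0f0 enp1s0f1 enp1s0f2 enp1s0f3
    bond-miimon 100
    bond-mode 802.3ad
    bond-lacp-rate fast
    bond-xmit-hash-policy layer2+3

# VLAN-aware bridge on top of the bond, no IP assigned
auto vmbr0
iface vmbr0 inet manual
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
```

Each guest then gets its VLAN tag set on its own network device rather than on the bridge.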
PCAPs from the firewall and the guest OS don't show me anything. There aren't any strange errors, no flood of TCP retransmissions, nothing. It's just there, then it's not. Most of the time when it comes back up, my SSH sessions will persist and I'll be off to the races for another couple of minutes before it drops again.
I'm at my wit's end. I appreciate any help!
Code (output of pveversion -v):
proxmox-ve: 7.1-1 (running kernel: 5.15.30-1-pve)
pve-manager: 7.1-12 (running version: 7.1-12/b3c09de3)
pve-kernel-5.15: 7.1-14
pve-kernel-helper: 7.1-14
pve-kernel-5.13: 7.1-9
pve-kernel-5.11: 7.0-10
pve-kernel-5.15.30-1-pve: 5.15.30-1
pve-kernel-5.15.27-1-pve: 5.15.27-1
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.11.22-7-pve: 5.11.22-12
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-7
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-5
libpve-guest-common-perl: 4.1-1
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-2
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-2
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-7
pve-cluster: 7.1-3
pve-container: 4.1-4
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-6
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.2.0-2
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1