I've been at this for a week and can't figure it out. Sorry for the information dump, but I'm hoping someone can look at the details below, both to help solve my problem and to sanity-check that I'm using VLANs and routing properly in my network and within Proxmox itself.
The problem:
When accessing servers on different VLANs, connections are dropped after 10-40 seconds.
For example: from my desktop on the default VLAN 1 (IP 10.10.1.103), I SSH into the DNS server at 10.10.30.100 (its interface on the private VLAN). After roughly 10-40 seconds, the connection freezes and drops. I am able to open a new one in another SSH session window.
During that time, pings to the same IP continue without a single failure.
If I SSH into the same server on the SAME subnet via its default/LAN IP 10.10.1.100, this does not happen, and the connection is fine.
This does not only apply to SSH, but to web connections as well. For example, Nextcloud will often drop with "Connection to server lost", and the page has to be refreshed. (Nextcloud is on VLAN 30, accessed via a reverse proxy on VLAN 20.)
This applies both to servers with a single NIC and to servers with more than one NIC for several VLANs tagged within Proxmox.
Preface:
- All VLAN interfaces are open to each other via allow any/any rules on the pfSense VM until this issue is fixed.
- All Proxmox hypervisors also have an interface on each VLAN so they can be managed until this is fixed.
- All the individual VMs have their network interfaces tagged with the required VLAN within Proxmox, except for pfSense, which has the bridge as its network interface and handles its VLANs internally.
- No firewalls are enabled in Proxmox.
- All servers have a single NIC.
Topology:
pfSense is running on Server1 with two virtual NICs, both on vmbr0 from Proxmox with no VLAN tagging.
Within pfSense:
NIC 1 has 1 VLAN, 777, and is set as the WAN interface.
NIC 2 has 4 interfaces: nic2 (for default/LAN), VLAN10, VLAN20, VLAN30.
Default LAN:
10.10.1.1/24
Three VLANS:
10-Management (10.10.10.0/24)
20-Public (10.10.20.0/24)
30-Private (10.10.30.0/24)
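As a quick sanity check of the addressing above, here is a minimal Python sketch that maps an IP to its network and reports whether traffic between two hosts has to be routed (through pfSense) or stays on the same layer-2 segment. The subnets are taken from this post; the names `NETWORKS`, `network_of`, and `needs_routing` are just illustrative helpers, not part of any tool.

```python
import ipaddress

# Subnets as listed in this post (the default LAN is the untagged VLAN 1).
NETWORKS = {
    "LAN (VLAN 1)": ipaddress.ip_network("10.10.1.0/24"),
    "VLAN 10 Management": ipaddress.ip_network("10.10.10.0/24"),
    "VLAN 20 Public": ipaddress.ip_network("10.10.20.0/24"),
    "VLAN 30 Private": ipaddress.ip_network("10.10.30.0/24"),
}


def network_of(ip):
    """Return the name of the listed network containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    for name, net in NETWORKS.items():
        if addr in net:
            return name
    return None


def needs_routing(src, dst):
    """True if src and dst sit in different networks, so a router must forward the traffic."""
    return network_of(src) != network_of(dst)


if __name__ == "__main__":
    # Desktop -> DNS server on the private VLAN: crosses subnets.
    print(needs_routing("10.10.1.103", "10.10.30.100"))  # True
    # Desktop -> DNS server on its LAN address: same subnet.
    print(needs_routing("10.10.1.103", "10.10.1.100"))   # False
```

The failing case (10.10.1.103 to 10.10.30.100) is the routed one, and the working case (10.10.1.103 to 10.10.1.100) never touches the router.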
Individual Configuration as follows:
Server1:
vmbr0 - IP: 10.10.1.11/24 Gateway: 10.10.1.1
vmbr.10 - IP: 10.10.10.11/24
vmbr.20 - IP: 10.10.20.11/24
vmbr.30 - IP: 10.10.30.11/24
Server2:
vmbr0 - IP: 10.10.1.12/24 Gateway: 10.10.1.1
vmbr.10 - IP: 10.10.10.12/24
vmbr.20 - IP: 10.10.20.12/24
vmbr.30 - IP: 10.10.30.12/24
Server3:
vmbr0 - IP: 10.10.1.13/24 Gateway: 10.10.1.1
vmbr.10 - IP: 10.10.10.13/24
vmbr.20 - IP: 10.10.20.13/24
vmbr.30 - IP: 10.10.30.13/24
The Switch:
Port 1: PVID: 777, Untagged 777 -> Modem WAN
Port 2: Tagged 10, 20, 30, 777 / Untagged 1 -> Server1
Port 3: Tagged 10, 20, 30 / Untagged 1 -> Server2
Port 4: Tagged 10, 20, 30 / Untagged 1 -> Server3
Port 4: Untagged 1 -> Desktop
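To double-check the port layout, the membership above can be written down as data and queried per VLAN. This is only a sketch: the ports are keyed by the attached device rather than by port number, and `PORTS` and `members` are hypothetical names used for illustration.

```python
# Switch port VLAN membership as listed above, keyed by attached device.
PORTS = {
    "Modem (WAN)": {"untagged": 777, "tagged": set()},
    "Server1": {"untagged": 1, "tagged": {10, 20, 30, 777}},
    "Server2": {"untagged": 1, "tagged": {10, 20, 30}},
    "Server3": {"untagged": 1, "tagged": {10, 20, 30}},
    "Desktop": {"untagged": 1, "tagged": set()},
}


def members(vlan):
    """Return the devices whose switch port carries the given VLAN, tagged or untagged."""
    return sorted(
        device
        for device, port in PORTS.items()
        if vlan == port["untagged"] or vlan in port["tagged"]
    )


if __name__ == "__main__":
    for vlan in (1, 10, 20, 30, 777):
        print(vlan, members(vlan))
```

With this layout, VLAN 777 only exists between the modem and Server1 (where pfSense runs), and VLANs 10, 20, and 30 reach all three servers but not the desktop port.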
What I've tried:
- Set allow all/all as a floating rule on the pfSense firewall.
- Increased the state table size.
- System > Advanced > Firewall & NAT > Bypass firewall rules for traffic on the same interface
- System > Advanced > Miscellaneous > Gateway Monitoring: State Killing on Gateway Failure is on; "Skip rules when gateway is down" is not checked.
- I thought it could be an asymmetric routing issue, so I disabled all but one NIC on the DNS server and tried to SSH in via an interface other than the default one, but the issue kept happening.
- Using a router running OpenWrt with all interfaces as tagged VLANs solved this issue, but I want to use pfSense within the VM and get rid of the router.
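To make the asymmetric-routing idea from the list above concrete, here is a small Python sketch of the situation it describes: the request to a multi-homed server is routed, but the server also has an interface in the client's own subnet, so its reply can go out directly and skip the router. This is an illustration of the hypothesis, not a diagnosis. The interface names `eth0`/`eth1` and the function `reply_bypasses_router` are assumptions made for the example, and it assumes the server picks the outgoing interface by the destination address alone (a connected route wins over the default gateway).

```python
import ipaddress


def reply_bypasses_router(host_ifaces, target_ip, client_ip):
    """Return True if the request to target_ip had to be routed while the
    reply can leave through an interface directly connected to the client."""
    client = ipaddress.ip_address(client_ip)
    target = ipaddress.ip_address(target_ip)
    networks = [ipaddress.ip_interface(cidr) for cidr in host_ifaces.values()]

    # The request is routed when the client is outside the subnet of the
    # address it connected to.
    request_routed = any(
        iface.ip == target and client not in iface.network for iface in networks
    )
    # The reply goes direct when any interface is connected to the client's subnet.
    reply_direct = any(client in iface.network for iface in networks)
    return request_routed and reply_direct


if __name__ == "__main__":
    # DNS server with both addresses from this post (interface names are hypothetical).
    dns = {"eth0": "10.10.1.100/24", "eth1": "10.10.30.100/24"}
    print(reply_bypasses_router(dns, "10.10.30.100", "10.10.1.103"))  # True
    print(reply_bypasses_router(dns, "10.10.1.100", "10.10.1.103"))   # False
    # Same server with only the VLAN 30 interface enabled.
    single = {"eth1": "10.10.30.100/24"}
    print(reply_bypasses_router(single, "10.10.30.100", "10.10.1.103"))  # False
```

When the first case applies, a stateful firewall in the middle only sees one direction of the TCP session, while ICMP pings are not affected in the same way. The third case corresponds to the single-NIC test described above, where the reply has no direct path back.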