Simple 2 node cluster - weird problem

exebat

Sep 15, 2024
Hi all,

I have two independent servers at Hetzner, both running PVE 8.2.4. I decided to create a cluster just for management and offline VM migration, so I created a simple vSwitch with VLAN 4000:

Node A
-----------------
auto enp7s0.4000
iface enp7s0.4000 inet manual
        mtu 1400

auto vmbr3
iface vmbr3 inet static
        address 192.168.200.1/26
        bridge-ports enp7s0.4000
        bridge-stp off
        bridge-fd 0
        mtu 1400


Node B
-----------------
auto enp5s0.4000
iface enp5s0.4000 inet manual
        mtu 1400

auto vmbr4000
iface vmbr4000 inet static
        address 192.168.200.2/26
        bridge-ports enp5s0.4000
        bridge-stp off
        bridge-fd 0
        mtu 1400


I can ping A > B and B > A

I can run "curl https://192.168.200.2:8006 -vI" on A and get B's response, but when I run "curl https://192.168.200.1:8006 -vI" on B against A, it times out. Likewise, I can SSH from A to B but not from B to A.
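Since ping only proves that ICMP gets through, a plain TCP probe run from B (or C) against A's VLAN IP helps tell apart a refusal (A answers with a reset, so nothing is listening or a rule rejects the connection) from a timeout (the SYN or the reply is silently dropped somewhere). This is a minimal sketch using only Python's standard library; `probe_tcp` is a hypothetical helper name, not part of any Proxmox tooling.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP handshake and classify the result.

    "open"    - the handshake completed (something is listening)
    "refused" - the peer answered with a reset
    "timeout" - no answer within `timeout` seconds (packets dropped)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. "No route to host"
        return f"error: {exc.strerror or exc}"
```

For example, running `probe_tcp("192.168.200.1", 22)` and `probe_tcp("192.168.200.1", 8006)` on B should, given the symptoms described, come back as "timeout" rather than "refused", which points at packets being dropped rather than at a service that is not listening.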


The web interface is working on A (tested from a Windows VM at 192.168.200.10 against 192.168.200.1:8006, and also via the public IP), and it is listening on all addresses.


I tried changing the address to 192.168.200.1/32, 192.168.200.1/16, 192.168.200.1/24 and 192.168.200.1/18, but it still doesn't work. I also tried recreating the vSwitch, disabling IPv6, restarting the server and disabling both firewalls...


What might be the problem?
 
So I added a third server, C.


A, B and C can all ping each other.

A can SSH to B and C.

B and C can SSH to each other, but neither can SSH to A on its VLAN IP...

B and C can SSH to A on its public IP.

I don't know what to try next.
 
Where is the gateway in your config? How are they connected to each other and to the internet?

They communicate only with each other, over the Hetzner vSwitch with a VLAN tag. On server A I have an OPNSense VM that routes internet traffic for the other VMs, and that is working OK.

The only problem is that something blocks incoming connections to services on server A, but only on that interface, not on the others. Outgoing traffic for the same services on the same interface works just fine.
 
Well, then the obvious questions: do you have a firewall enabled? Does Hetzner implement a firewall? You said they can ping each other; if that is true, they can communicate with each other. I'm assuming this is all over dedicated interfaces, i.e. enp5s0 and enp7s0 are the correct interfaces and are dedicated to this VLAN?
 

The firewall is turned off on all servers. I also tried with the firewall turned on and rules set up, with the same results.

I also reinstalled servers B and C. They can still only ping A, while A connects to all of them (SSH) and to itself. Really weird.

SSH to A from B and C times out on the VLAN interface but connects instantly on the public interface. The Proxmox web panel on A's VLAN IP (port 8006) is also unreachable, but from A I can open the web panels of B and C.

All servers have the same network settings, just different VLAN IPs (.1, .2, .3).
 
So B and C can talk to each other but cannot talk to A, even though ping to A's IP address works; A, however, can connect to B and C.

So a potential firewall rule set that would explain this:
ALLOW: "established,related" (open connection) packets
ALLOW: ICMP
DENY: Incoming on interface/IP A
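The guessed rule set can be written out as a toy model to check that it reproduces every symptom: ping to A succeeds, A's own outbound connections succeed because the replies count as established, new connections to A's VLAN IP are dropped, and connections to A's public IP still work. This only illustrates the reasoning; it is not a real nftables/iptables configuration, and the `Packet` type, `hypothetical_filter` function, the ephemeral port and the final default-allow rule are all assumptions.

```python
from dataclasses import dataclass

A_VLAN_IP = "192.168.200.1"  # node A's VLAN address from the thread


@dataclass(frozen=True)
class Packet:
    proto: str                 # "icmp" or "tcp"
    src: str
    dst: str
    dport: int = 0
    established: bool = False  # True for packets belonging to an existing flow


def hypothetical_filter(pkt: Packet) -> str:
    """Toy model of the guessed rules, applied to packets arriving at A.

    Rules are checked top to bottom; the final default-allow is an assumption.
    """
    if pkt.established:
        return "ALLOW"  # established,related
    if pkt.proto == "icmp":
        return "ALLOW"  # ICMP
    if pkt.dst == A_VLAN_IP:
        return "DENY"   # new incoming connection on A's VLAN IP
    return "ALLOW"      # assumed default policy


B = "192.168.200.2"
print(hypothetical_filter(Packet("icmp", B, A_VLAN_IP)))              # ALLOW: B pings A
print(hypothetical_filter(Packet("tcp", B, A_VLAN_IP, 22)))           # DENY: B opens SSH to A
print(hypothetical_filter(Packet("tcp", B, A_VLAN_IP, 50000, True)))  # ALLOW: B answers A's SSH
print(hypothetical_filter(Packet("tcp", B, "A-public-IP", 22)))       # ALLOW: placeholder public address
```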

Are you sure the SSH/web server is actually active and responding on that IP address (netstat -antp)?
Are you sure you are not accidentally passing the interface to your OpnSense? How does traffic from either the VMs or B and C reach OpnSense to be routed to the Internet? I'm assuming you have it on its own IP address (range) on another bridge?
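When reading `netstat -antp` output, the column to check is the Local Address of the LISTEN lines: `0.0.0.0:22` or `:::8006` means every address, while something like `127.0.0.1:85` is loopback only. Below is a small filter sketch, assuming the usual Linux column order and that an IPv6 wildcard listener also accepts IPv4 (the Linux default, `net.ipv6.bindv6only=0`); `listeners_covering` is a hypothetical name.

```python
def listeners_covering(netstat_output: str, ip: str, port: int) -> list[str]:
    """Return the LISTEN lines whose socket would accept connections to ip:port.

    Assumes `netstat -antp` columns: Proto, Recv-Q, Send-Q, Local Address,
    Foreign Address, State, PID/Program name. Header lines are skipped.
    """
    hits = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp") or fields[5] != "LISTEN":
            continue
        # "0.0.0.0:22" -> ("0.0.0.0", "22"); ":::8006" -> ("::", "8006")
        host, _, lport = fields[3].rpartition(":")
        if lport != str(port):
            continue
        # Wildcard listeners (IPv4 or dual-stack IPv6) or an exact address match
        if host in ("0.0.0.0", "::", ip):
            hits.append(line)
    return hits
```

An empty result for port 22 or 8006 with A's VLAN IP would mean the service is not bound there; a non-empty result means the listener is fine and the problem lies in the path in front of it.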
 

Exactly.

I set the firewall to No at both the Node and Datacenter level on all servers.

netstat shows that the services are running:

[Screenshot: netstat output on server A]


Here is the current network configuration of the interfaces on server A. I tried every combination I found on the internet, including deleting everything and using just enp7s0 and the VLAN.

[Screenshot: network interface configuration on server A]

OpnSense uses vmbr0 for WAN and vmbr1 for the LAN to the other VMs. Below is the interfaces file:

Code:
source /etc/network/interfaces.d/*

auto lo
iface lo inet loopback

iface lo inet6 loopback

auto enp7s0
iface enp7s0 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves enp7s0
        bond-miimon 100
        bond-mode active-backup

auto bond0.4000
iface bond0.4000 inet manual
        mtu 1400

auto vmbr0
iface vmbr0 inet static
        address xxx.xxx.xxx.61/26
        gateway xxx.xxx.xxx.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 1
        bridge-vlan-aware yes
        bridge-vids 2-4094
        bridge-hw enp7s0
        pointopoint xxx.xxx.xxx.1

auto vmbr1
iface vmbr1 inet manual
        bridge-ports none
        bridge-stp off
        bridge-fd 0

auto vmbr2
iface vmbr2 inet static
        address 192.168.2.1/24
        bridge-ports bond0.4000
        bridge-stp off
        bridge-fd 0
        mtu 1400
 
How is OpnSense using vmbr0 for WAN? Does it have its own IP address? Is it passed through? I ask because OpnSense runs the interface in promiscuous mode when certain software such as Suricata is enabled, in which case its IP stack could potentially respond to those packets.

Right now you have everything going over one physical interface. You also have vmbr0 encompassing the same VLAN as bond0.4000: vmbr0 is VLAN-aware with bridge-vids 2-4094 on bond0, which includes VLAN 4000.
 

OpnSense has its own IP address, which it gets via a separate MAC address. I didn't mention that I did all the testing with all the other VMs (including OpnSense) shut down, so I don't think OpnSense is the issue.
 
You still have the issue of the same VLAN on two interfaces, so when a tagged packet comes in, which bridge should it go to?
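To make that point concrete, here is a small sketch that reads an ifupdown-style interfaces file and reports VLAN IDs that are configured on more than one bridge for the same port. It is a hypothetical helper (`parse_stanzas`, `parse_vids` and `vlan_overlaps` are made-up names), it only understands the options used in this thread (`bridge-ports`, `bridge-vlan-aware`, `bridge-vids`), it does not follow `source` includes, and it only flags the overlap; it does not try to predict which bridge actually ends up receiving the frames.

```python
from collections import defaultdict


def parse_stanzas(text: str) -> dict[str, dict[str, str]]:
    """Map each `iface NAME` stanza to its options (first word -> rest of line)."""
    stanzas: dict[str, dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if key == "iface" and value:
            current = stanzas.setdefault(value.split()[0], {})
        elif key in ("auto", "allow-hotplug", "source"):
            current = None  # a new top-level statement ends the stanza
        elif current is not None:
            current[key] = value
    return stanzas


def parse_vids(spec: str) -> set[int]:
    """Expand a bridge-vids value such as "2-4094" or "10 20 30-35"."""
    vids: set[int] = set()
    for part in spec.split():
        first, _, last = part.partition("-")
        vids.update(range(int(first), int(last or first) + 1))
    return vids


def vlan_overlaps(text: str) -> dict[tuple[str, int], list[str]]:
    """Return (port, VLAN id) pairs that more than one bridge is configured for."""
    claims = defaultdict(list)
    for bridge, opts in parse_stanzas(text).items():
        ports = opts.get("bridge-ports", "none").split()
        vlan_aware = opts.get("bridge-vlan-aware") == "yes"
        # Without an explicit bridge-vids line we assume no tagged VIDs
        vids = parse_vids(opts.get("bridge-vids", "")) if vlan_aware else set()
        for port in ports:
            if port == "none":
                continue
            parent, dot, tag = port.partition(".")
            if dot and tag.isdigit():
                # VLAN sub-interface as bridge port: attached to VLAN <tag> of <parent>
                claims[(parent, int(tag))].append(bridge)
            else:
                # VLAN-aware bridge: configured to carry every listed VID on this port
                for vid in vids:
                    claims[(port, vid)].append(bridge)
    return {key: names for key, names in claims.items() if len(names) > 1}
```

Run against the interfaces file posted above (for example `vlan_overlaps(open("/etc/network/interfaces").read())`), this would report `("bond0", 4000)` claimed by both vmbr0 and vmbr2, while node A's original config with only vmbr3 on enp7s0.4000 reports nothing.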
 
