ESXi to Proxmox - Network issue - Proxmox bridge prohibits communication

Urbanovits

Hi,

I found an interesting symptom.

Have a 3-node cluster (upgraded from 7.x to the latest 8.2.x), and one of the nodes (let's call it host A) is hosting my ESXi host as a VM.
Any Proxmox VM on host A is unable to ping (or otherwise communicate with) the ESXi VM on host A, but VMs on nodes B and C can. On the other hand, Proxmox VMs on host A (or on any other host) are able to ping and communicate with all of the VMs hosted inside ESXi.

host A, any VM <=> host A ESXi: failed
host A, any VM <=> [host A ESXi <=> its hosted VMs]: succeeded

It seems to be a problem around the host A bridge (or let's say the PMX/Debian bridge), which prevents this communication.

The ESXi VM has CPU type host and a vmxnet3 NIC, so that cannot be the root cause of this problem.

Is it some kind of MAC address flooding prevention, or what? Any idea how to reveal it?
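If it is the bridge's MAC learning/flooding behaviour, I guess the per-port flags and the learned MAC table on the PVE host could be checked with something like this (just a sketch, vmbr0 is my bridge):

# per-port bridge details, including learning/flood flags
bridge -d link show
# MAC addresses the bridge has learned, and on which port
bridge fdb show br vmbr0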

Thanks
 
I could not reproduce the issue you described with an ESXi 8.0 host running as a VM on a Proxmox VE 8.2.4 node, trying to:

- ping an ESXi VM from a VM running on the same node,
- ping an ESXi VM from a VM running on a different node in the cluster,
- ping a PVE VM on the same node from an ESXi VM, and
- ping a PVE VM on a different node in the cluster from an ESXi VM.

How have you configured the network for the VMs running on the same PVE node? What is the local network configuration on node A? Please post the output of /etc/network/interfaces and tell us which bridges are used for the ESXi host and for the VM that cannot reach the ESXi VMs.
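For example (the VMID below is only a placeholder), the relevant NIC lines of a guest can be pulled on the node with:

# show the virtual NICs of a VM, including the bridge each one is attached to
qm config <vmid> | grep '^net'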
 
Hi,

The ESXi version is too old, running as embedded (ESXi 6.0 Update 2, build ). No option to upgrade.

My results:
- ping an ESXi VM from a VM running on the same node: **FAILED**
- ping an ESXi VM from a VM running on a different node in the cluster: **OK**
- ping a PVE VM on the same node from an ESXi VM: **FAILED**
- ping a PVE VM on a different node in the cluster from an ESXi VM: **OK**

ESXi

[screenshots attached]

Node

[screenshot attached]


cat /etc/network/interfaces

auto lo
iface lo inet loopback

auto eno1
iface eno1 inet manual

iface eno2 inet manual

iface eno3 inet manual

iface eno4 inet manual

auto eno49
iface eno49 inet manual
mtu 9000
#10G LAN

auto eno50
iface eno50 inet manual
mtu 9000
#10G LAN

iface ens3f0 inet manual

iface ens3f1 inet manual

iface ens3f2 inet manual

iface ens3f3 inet manual

auto vmbr0
iface vmbr0 inet static
address 192.168.100.181/24
gateway 192.168.100.253
bridge-ports eno1
bridge-stp off
bridge-fd 0

auto vmbr200
iface vmbr200 inet static
address 192.168.200.181/24
bridge-ports eno49
bridge-stp off
bridge-fd 0
mtu 9000
#10G LAN vlan200
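
The runtime bridge membership can be cross-checked with (commands only, for reference):

# interfaces currently enslaved to vmbr0
ip link show master vmbr0
# per-port overview of all bridges
bridge link show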



 
You haven't sent me information about the network interface that is configured on the PVE VM that cannot reach the ESXi VMs. But as far as I can see, since the ESXi VMs can be pinged from the outside (that is, from the rest of your cluster), the ICMP packets can reach the ESXi and PVE VMs just fine, they simply cannot reach each other.

This leads me to believe that the ESXi host VM is configured to use the vmbr0 bridge (from your screenshots), while the other VMs on the PVE node are configured to use the vmbr200 bridge. Since those are on different subnets (192.168.100.0/24 and 192.168.200.0/24 respectively), they cannot communicate with each other directly. If I'm correct, you'd need to either put them on the same subnet or add a route between the bridges.
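As a quick sanity check, a brief address overview per bridge can be printed on the node with:

# compact list of interfaces/bridges and their addresses
ip -br addr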
 
It bothers me a little that you think I would make such a silly mistake.

All VMs and the PMX hosts sit on vmbr0 on all hosts. vmbr200 is a storage network, not used yet on these hosts and VMs.
By the way, vmbr200 and vmbr0 are fully isolated at the switch level. So vmbr0 is the main link on all hosts and VMs for now.
This has nothing to do with routing.

Where ping fails, on host A:
ESXi => vmbr0
PMX host => vmbr0
Other VMs on host A => vmbr0
ESXi-hosted VMs => ESXi vSwitch => vmbr0

Where ping and all other communication succeed, from host B to whatever X:
Other PMX hosts => vmbr0
Other VMs on those non-A hosts => vmbr0
Other VMs on a non-A host to the VMs hosted on ESXi on the non-working host A => vmbr0

Summing it all up: the problematic part is ESXi vSwitch <=> vmbr0 on host A.
The ESXi vSwitch cannot be the root cause, because I can ping everything behind it from a non-A host.

So I ended up at the host A switch, aka vmbr0.
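One more thing I plan to try (the tap name below is only an example, the real one is tap<VMID>i0 for the ESXi VM on the PVE host): capture ARP on the ESXi VM's tap port and on vmbr0 at the same time, to see where the frames disappear.

# on host A, in two separate terminals
tcpdump -eni tap101i0 arp
tcpdump -eni vmbr0 arp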
 
I'm sorry that it came across that way; I'm just trying to find a solution to your problem and I haven't fully understood your configuration yet, which led me to a false conclusion. As far as I understand now, you cannot ping the ESXi host and the VMs on the ESXi host from another VM running on the same PVE node, but it works from other cluster nodes.

I still have some questions for you:
  1. Do you have a firewall set up, and if so, which one? Could there be any rules that prohibit traffic between VMs on the same bridge? If you have set up the PVE firewall, it would be helpful if you could post anonymized output of nft list ruleset or iptables-save -c (depending on whether you use nftables or iptables) and the contents of /etc/pve/firewall/cluster.fw and /etc/pve/local/host.fw (see the example commands after this list).
  2. What is the output of tcpdump -envi vmbr0 arp or icmp on host A while you're trying to ping between VMs on host A?
  3. What are the local network configurations on the VMs themselves?
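For question 1, something along these lines should be enough (commands as on a default PVE 8 installation):

# overall PVE firewall state
pve-firewall status
# currently active ruleset (nftables or iptables, depending on your setup)
nft list ruleset
iptables-save -c
# PVE firewall configuration files
cat /etc/pve/firewall/cluster.fw
cat /etc/pve/local/host.fw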
 
No firewall is applied, it is a test system.

The problem is at the ARP level, OSI layer 2.

The PMX host bridge is preventing ARP broadcast traffic from the local VMs and from the host itself to the locally hosted ESXi VM only.
ESXi host
[screenshot of the ARP table on the ESXi host attached]
* 192.168.100.14 is the IP address of another VM on host A => incomplete
* 192.168.100.1 is a VM hosted by ESXi
* 192.168.100.182 and .183 are other PMX hosts

PMX host A
[screenshot of the ARP table on PMX host A attached]
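In case the screenshots are hard to read, the same neighbour/ARP information can be dumped as text (interface name is from my setup):

# on PMX host A
ip neigh show dev vmbr0
# on the ESXi host
esxcli network ip neighbor list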

Output of cat on the ESXi configuration:

[screenshot attached]

ESXi switch parameters
[screenshot attached]

ESXi firewall settings:
[root@localhost:~] esxcli network firewall ruleset list
Name Enabled
------------------------ -------
sshServer true
sshClient false
nfsClient false
nfs41Client false
dhcp true
dns true
snmp true
ntpClient false
CIMHttpServer true
CIMHttpsServer true
CIMSLP true
iSCSI false
vpxHeartbeats true
updateManager true
faultTolerance true
webAccess true
vMotion true
vSphereClient true
activeDirectoryAll false
NFC true
HBR true
ftpClient false
httpClient false
gdbserver false
DVFilter false
DHCPv6 false
DVSSync true
syslog false
IKED false
WOL true
vSPC false
remoteSerialPort false
vprobeServer false
rdt true
cmmds true
vsanvp true
rabbitmqproxy true
ipfam false
vvold false
iofiltervp false
esxupdate false
vit false
vsanhealth-multicasttest false



The rest of the tests are unnecessary, because without ARP resolution no other kind of traffic will work anyway.

OSI layer 3

ICMP (host 192.168.100.181 => ESXi 192.168.100.165)
Only echo requests are seen, no replies:
root@pve02:~# tcpdump -envi vmbr0 icmp
tcpdump: listening on vmbr0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
11:37:27.554354 94:57:a5:6b:20:3c > 86:8b:01:5d:2e:7d, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 54705, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.100.181 > 192.168.100.165: ICMP echo request, id 61436, seq 1, length 64
11:37:28.558443 94:57:a5:6b:20:3c > 86:8b:01:5d:2e:7d, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 55158, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.100.181 > 192.168.100.165: ICMP echo request, id 61436, seq 2, length 64
11:37:30.707025 94:57:a5:6b:20:3c > 86:8b:01:5d:2e:7d, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 56095, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.100.181 > 192.168.100.165: ICMP echo request, id 61436, seq 3, length 64
11:37:30.707034 94:57:a5:6b:20:3c > 86:8b:01:5d:2e:7d, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 56475, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.100.181 > 192.168.100.165: ICMP echo request, id 61436, seq 4, length 64
11:37:31.630469 94:57:a5:6b:20:3c > 86:8b:01:5d:2e:7d, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 57399, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.100.181 > 192.168.100.165: ICMP echo request, id 61436, seq 5, length 64
11:37:32.654483 94:57:a5:6b:20:3c > 86:8b:01:5d:2e:7d, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 57446, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.100.181 > 192.168.100.165: ICMP echo request, id 61436, seq 6, length 64
11:37:33.678479 94:57:a5:6b:20:3c > 86:8b:01:5d:2e:7d, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 58174, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.100.181 > 192.168.100.165: ICMP echo request, id 61436, seq 7, length 64
11:37:34.702436 94:57:a5:6b:20:3c > 86:8b:01:5d:2e:7d, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 58441, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.100.181 > 192.168.100.165: ICMP echo request, id 61436, seq 8, length 64
11:37:35.726472 94:57:a5:6b:20:3c > 86:8b:01:5d:2e:7d, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 58863, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.100.181 > 192.168.100.165: ICMP echo request, id 61436, seq 9, length 64
11:37:36.750468 94:57:a5:6b:20:3c > 86:8b:01:5d:2e:7d, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 58969, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.100.181 > 192.168.100.165: ICMP echo request, id 61436, seq 10, length 64

Ping from ESXi (192.168.100.165) to PMX Host A (192.168.100.181)
(SSH session)
PING 192.168.100.181 (192.168.100.181): 56 data bytes
sendto() failed (Host is down)

Ping from ESXi (192.168.100.165) to an ESXi hosted VM
[root@localhost:~] ping 192.168.100.182
PING 192.168.100.182 (192.168.100.182): 56 data bytes
64 bytes from 192.168.100.182: icmp_seq=0 ttl=64 time=0.350 ms
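To dig further, I would also check whether host A's bridge has learned the ESXi VM's MAC at all (86:8b:01:5d:2e:7d from the capture above) and on which port:

# look up the ESXi VM's MAC in the bridge forwarding database on host A
bridge fdb show br vmbr0 | grep -i 86:8b:01:5d:2e:7d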
 

