Environment
- Proxmox VE cluster with 2 nodes (node94: 10.129.56.94, node107: 10.129.56.107)
- Ceph cluster running on the Proxmox nodes (public_network: 10.129.56.0/24)
- Proxmox SDN EVPN zone (madp) for VM networking
- VMs are on the EVPN overlay network: 172.16.0.0/16
- Goal: Configure Rook External Ceph so Kubernetes VMs can access Ceph
Network topology
VMs (172.16.0.x) ── EVPN overlay (madp) ── Proxmox nodes (10.129.56.x) ── Ceph
Problem
VMs can reach the Proxmox nodes (underlay) through the EVPN gateway, but Ceph cannot reach back to the VMs.
My guess:
- The EVPN zone (madp) isolates underlay ↔ overlay traffic by design (pinging from inside the VRF fails as well)
- Return routing from Ceph (10.129.56.x) to the VMs (172.16.0.x) is blocked by the VRF isolation in EVPN
Code:
ubuntu@k8s-dev-101:~$ ping 10.129.56.107
PING 10.129.56.107 (10.129.56.107) 56(84) bytes of data.
64 bytes from 10.129.56.107: icmp_seq=1 ttl=63 time=0.259 ms
^C
--- 10.129.56.107 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.259/0.259/0.259/0.000 ms
ubuntu@k8s-dev-101:~$ ping 10.129.56.94
PING 10.129.56.94 (10.129.56.94) 56(84) bytes of data.
64 bytes from 10.129.56.94: icmp_seq=1 ttl=62 time=0.312 ms
^C
--- 10.129.56.94 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.312/0.312/0.312/0.000 ms
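The failing direction, for completeness: from a node, neither a plain ping nor one from inside the zone's VRF reaches a VM. A minimal check, assuming the VRF device is named vrf_madp (Proxmox's usual vrf_<zone> naming) and 172.16.0.101 is one of the VM IPs; both of these fail in my setup:
Code:
# plain ping from the underlay: no route to the overlay
root@node107:~# ping -c 1 172.16.0.101
# ping from inside the EVPN VRF (VRF name is an assumption)
root@node107:~# ip vrf exec vrf_madp ping -c 1 172.16.0.101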
What I have tried
- Adding 172.16.0.0/16 to the Ceph public_network
  - Added via ceph config set global public_network "10.129.56.0/24, 172.16.0.0/16"
  - Failed because the nodes have no 172.16.0.x address, causing the OSDs to fail to bind
- Adding an overlay IP to the nodes (ip addr add 172.16.0.107/16 dev vmbr0)
  - Not possible due to network constraints
- NAT/MASQUERADE on the nodes (full rule set sketched after this list)
  - iptables -t nat -A POSTROUTING -s 172.16.0.0/16 -o vmbr0 -j MASQUERADE
  - Did not work
- Creating a separate VLAN bridge (vmbr20) for node ↔ VM communication (interfaces fragment sketched after this list)
  - Works within the same node, but inter-node VM communication does not
  - Physical switch ports are in access mode (cannot be changed), so a VLAN trunk is not possible
- SDN EVPN gateway IP approach
  - Assigned the subnet gateway IP on the SDN VNet and added it to the Ceph public_network
  - Did not work
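The NAT attempt in full. This is a sketch of what I ran, assuming vmbr0 is the underlay bridge; the FORWARD rules and ip_forward are the usual companions to MASQUERADE, so I don't think a missing piece here explains the failure:
Code:
# masquerade VM-sourced traffic leaving toward the underlay
iptables -t nat -A POSTROUTING -s 172.16.0.0/16 -o vmbr0 -j MASQUERADE
# permit forwarding for those flows and their replies
iptables -A FORWARD -s 172.16.0.0/16 -o vmbr0 -j ACCEPT
iptables -A FORWARD -d 172.16.0.0/16 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# routing between interfaces must be enabled on the node
sysctl -w net.ipv4.ip_forward=1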
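For reference, the vmbr20 attempt looked roughly like this (a sketch, not the exact config; eno1 and the 172.16.20.x addressing are assumptions). Since the switch ports are access mode, the VLAN 20 tagged frames never cross between nodes, which matches the symptom of node-local connectivity only:
Code:
# /etc/network/interfaces fragment on node107
auto vmbr20
iface vmbr20 inet static
        address 172.16.20.107/24
        bridge-ports eno1.20
        bridge-stp off
        bridge-fd 0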
Questions
- What is the recommended way to allow Ceph (on the underlay) to communicate with VMs (on the EVPN overlay) for Rook External Ceph?
- Is there a way to configure the EVPN VRF to allow specific underlay ↔ overlay routing without breaking the isolation entirely?
- Is there a way to add a routable IP on the nodes that bridges both the underlay and overlay networks so Ceph can reach VMs?
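On the second question: is FRR's VRF route leaking the right direction? A sketch of the kind of config I mean, untested, with vrf_madp and AS 65000 assumed from my setup; I also don't know whether Proxmox SDN would overwrite hand edits to its generated frr.conf:
Code:
! leak underlay routes into the EVPN VRF, and vice versa
router bgp 65000 vrf vrf_madp
 address-family ipv4 unicast
  import vrf default
 exit-address-family
!
router bgp 65000
 address-family ipv4 unicast
  import vrf vrf_madp
 exit-address-family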