Give VMs client access to 3-node full mesh Ceph

Jan 13, 2021
Hi,

I have set up a fully meshed 3-node cluster with Ceph (as described here).
Now I would like my VMs to have access to CephFS.

By adding a bridge with an IP and connecting my VM to it, I could ping the local Ceph IP, but not the remote ones. I have ip_forwarding enabled. It seems like I'm missing some routes.
Is that approach feasible? What would be the best (as in stable/performant) option here?
 
@converged, from a networking standpoint, this is 100% feasible. This sounds like a basic networking issue, but I'd need far more details.

Can you post the following, so I can get an idea of your configuration:
/etc/network/interfaces
vmid.conf of the VM
network configuration of the guest VM


Also, ip_forwarding in the kernel shouldn't need to be manually set for this.
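
If you want to double-check what the kernel currently has set, sysctl will show it:
Code:
# show the current forwarding setting (1 = enabled, 0 = disabled)
sysctl net.ipv4.ip_forward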
 
Here is the relevant excerpt of /etc/network/interfaces on node1. Nodes 2 and 3 have their IPs changed accordingly.
Code:
# connected to node2 (.12.2)
auto enp129s0f0
iface enp129s0f0 inet static
        address 10.0.12.1/24
        mtu 9000
        up ip route add 10.0.12.2/32 dev enp129s0f0
        down ip route del 10.0.12.2/32

# connected to node3 (.12.3)
auto enp129s0f1
iface enp129s0f1 inet static
        address 10.0.12.1/24
        mtu 9000
        up ip route add 10.0.12.3/32 dev enp129s0f1
        down ip route del 10.0.12.3/32

auto vmbr1
iface vmbr1 inet static
        address 10.0.13.1/24
        bridge-ports none
        bridge-stp off
        bridge-fd 0

enp129s0f0 and enp129s0f1 are the interfaces used for the mesh.
vmbr1 is intended as an isolated bridge used for accessing the Ceph cluster.
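
For reference, bridge membership can be checked on the node like this (plain iproute2; with bridge-ports none the first command prints nothing):
Code:
# list interfaces currently enslaved to vmbr1
ip link show master vmbr1

# show all bridge port memberships on this node
bridge link show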

excerpt of ip route list on node1
Code:
default via 10.0.0.1 dev vmbr0 proto kernel onlink
10.0.0.0/24 dev vmbr0 proto kernel scope link src 10.0.0.154
10.0.12.0/24 dev enp129s0f0 proto kernel scope link src 10.0.12.1
10.0.12.0/24 dev enp129s0f1 proto kernel scope link src 10.0.12.1
10.0.12.2 dev enp129s0f0 scope link
10.0.12.3 dev enp129s0f1 scope link
10.0.13.0/24 dev vmbr1 proto kernel scope link src 10.0.13.1

network config in vmid.conf
net0: virtio=EA:AD:56:AD:02:C4,bridge=vmbr1


The guest has a static IP of 10.0.13.10 with default gateway 10.0.13.1.
ifconfig ens18
Code:
ens18: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.13.10  netmask 255.255.255.0  broadcast 10.0.13.255
        inet6 fe80::e8ad:56ff:fead:2c4  prefixlen 64  scopeid 0x20<link>
        ether ea:ad:56:ad:02:c4  txqueuelen 1000  (Ethernet)
        RX packets 107  bytes 12962 (12.9 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 111  bytes 18148 (18.1 KB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

route
Code:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         _gateway        0.0.0.0         UG    0      0        0 ens18
10.0.13.0       0.0.0.0         255.255.255.0   U     0      0        0 ens18
 
In that case, enp129s0f1 (node3) and enp129s0f0 (node2) should be slaves of vmbr1, and the IP should be configured on vmbr1.

Right now, your bridge vmbr1 is not connected to any network interface. How could it communicate with anything outside of your server?
In this situation, VMa (on node2) and VMb (on node3), both connected to their local vmbr1, cannot communicate.
 

I already tried this, but it leads to another strange phenomenon.

The interfaces file now looks like this:
Code:
# connected to node2 (.12.2)
auto enp129s0f0
iface enp129s0f0 inet static
#        address 10.0.12.1/24
        mtu 9000
        up ip route add 10.0.12.2/32 dev enp129s0f0
        down ip route del 10.0.12.2/32

# connected to node3 (.12.3)
auto enp129s0f1
iface enp129s0f1 inet static
#        address 10.0.12.1/24
        mtu 9000
        up ip route add 10.0.12.3/32 dev enp129s0f1
        down ip route del 10.0.12.3/32

auto vmbr1
iface vmbr1 inet static
        address 10.0.12.1/24
        mtu 9000
        bridge-ports enp129s0f0 enp129s0f1
        bridge-stp off
        bridge-fd 0

The IP of the VM is 10.0.12.10.

On node1 I can ping node2, node3, and the VM successfully.

On the VM I can ping node1 (10.0.12.1) and node3 (10.0.12.3), but not node2 (10.0.12.2).
Can somebody explain this?
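
One way to narrow this down would be to watch the mesh link while pinging from the VM. A rough sketch, assuming tcpdump is installed and that the port on node2 facing node1 is also called enp129s0f0 there:
Code:
# on node2: watch for the VM's ICMP echo requests arriving over the mesh port
tcpdump -ni enp129s0f0 icmp

# on the VM: check whether a neighbour (ARP) entry for node2 ever resolves
ip neigh show to 10.0.12.2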

I'd prefer to keep the mesh as in the first example and have a separate network for the VMs that gets routed to the Ceph nodes.
 
There seems to be a basic misunderstanding of networking going on here.

First, you can't have multiple interfaces in the same broadcast domain with the same IP. It breaks layers 2 and 3 of networking. That's why the config you posted above doesn't work.

Lokytech is half right: vmbr1 has no interface attached to it, and so it will never be able to pass traffic directly out of the Proxmox node it sits on; it will only be able to route traffic internally to another interface. That is what you're trying to do, but this brings me to the point below.

Even if you did have an interface attached to vmbr1, you would still have routing issues. See this post for an explanation, but essentially you have multiple routes to the same subnet with no weight attached to them, and three routes each to 10.0.12.{2,3}. Even if you did have weights attached, traffic for a subnet can only be sent via one route at a time. Period.
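
You can ask the kernel which of those overlapping routes it would actually pick for a given destination, for example on node1:
Code:
# which route wins for node2's mesh IP (the /32 beats both /24s)
ip route get 10.0.12.2

# which route wins for an address only covered by the overlapping /24 routes
ip route get 10.0.12.200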

If this is the layer 1 config you absolutely need to work, you need to change the CIDRs of enp129s0f1 and enp129s0f0 to 10.0.12.x/32 and delete the lines where you add additional routes manually. Your Proxmox node (and VMs) will only be able to communicate with 3 exterior IPs: 10.0.12.{1-3}. This means no access to the rest of your network, no access to the internet, and unless you do a lot more manual routing and subnetting, VMs on different nodes will not be able to communicate with each other.
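
A rough, untested sketch of non-overlapping /32 addressing on node1 is below. Note that the per-peer up/down route lines are kept here, since with a /32 address and no explicit routes the kernel would otherwise have no path to the peers at all:
Code:
# node1 mesh ports with /32 addressing (no connected /24 routes)
auto enp129s0f0
iface enp129s0f0 inet static
        address 10.0.12.1/32
        mtu 9000
        up ip route add 10.0.12.2/32 dev enp129s0f0
        down ip route del 10.0.12.2/32

auto enp129s0f1
iface enp129s0f1 inet static
        address 10.0.12.1/32
        mtu 9000
        up ip route add 10.0.12.3/32 dev enp129s0f1
        down ip route del 10.0.12.3/32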

If you have a managed switch, or one or two unmanaged switches, we can make this much easier for you by using trunks, VLANs, and VLAN interfaces. This is exactly what switches are for.
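
For illustration only (the port name and VLAN ID are made up): with a switch, a single tagged uplink into a VLAN-aware bridge could carry the Ceph network, roughly like this in /etc/network/interfaces:
Code:
# hypothetical: one uplink, VLAN-aware bridge, Ceph traffic tagged as VLAN 13
auto vmbr1
iface vmbr1 inet manual
        bridge-ports enp129s0f0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094

# the node's address for the Ceph VLAN lives on a VLAN interface of the bridge
auto vmbr1.13
iface vmbr1.13 inet static
        address 10.0.13.1/24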
 
The config in the first post does work for the Proxmox nodes; please take a look at: https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server

I have now figured out a way to give the VMs access to the Ceph IPs.

Adding vmbr1 with 10.0.13.1/24 and the following rules on the nodes did the trick:
Code:
echo 1 > /proc/sys/net/ipv4/ip_forward

iptables -t nat -A POSTROUTING -o enp129s0f0 -j MASQUERADE
iptables -t nat -A POSTROUTING -o enp129s0f1 -j MASQUERADE

I don't know whether this is an ideal solution, but it works.
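
If I stick with this approach, one way to make it survive a reboot would be to hang the same commands off the vmbr1 stanza with post-up/post-down, similar to the masquerading example in the Proxmox network documentation (untested for this exact setup):
Code:
auto vmbr1
iface vmbr1 inet static
        address 10.0.13.1/24
        bridge-ports none
        bridge-stp off
        bridge-fd 0
        post-up   echo 1 > /proc/sys/net/ipv4/ip_forward
        post-up   iptables -t nat -A POSTROUTING -o enp129s0f0 -j MASQUERADE
        post-up   iptables -t nat -A POSTROUTING -o enp129s0f1 -j MASQUERADE
        post-down iptables -t nat -D POSTROUTING -o enp129s0f0 -j MASQUERADE
        post-down iptables -t nat -D POSTROUTING -o enp129s0f1 -j MASQUERADE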
 
Glad you got it to work!

I didn't understand why the routing worked, though, until I realized I had made a stupid mistake. Routing always uses the more specific path first. Remember that, folks!
 