Giving VM access to Ceph public network

Oct 28, 2024
Hello,

this whole shebang has already taken me nearly a week to figure out, and I'm still stuck.

Background:
I want to spin up a 5-VM Docker Swarm cluster. The VMs are meant to use keepalived for HA/failover over a simple SDN. That part seems to be working OK.
Now I "only" need persistent storage. Originally the plan was to create a sixth VM with an extra-large disk and Cockpit to manage NFS shares. As I don't want to keep yet another VM up to date, the new plan is to create a CephFS and mount it into the VMs.

Our network on the pve-nodes looks like this:
Proxmox_Network.png

The public Ceph network runs on bond900. According to the documentation our (now former) colleague wrote, this network uses VLAN tag 2160.
So I created a Linux VLAN (bond900.2160) and then a bridge (vmbr900) on top of it.

Here is the configuration from the /etc/network/interfaces file on one of our Proxmox nodes:
Code:
auto lo
iface lo inet loopback

iface eno3 inet manual

auto enp134s0f0
iface enp134s0f0 inet manual
#bond900

auto enp134s0f1
iface enp134s0f1 inet manual
#bond901

auto enp175s0f0
iface enp175s0f0 inet manual
#bond900

auto enp175s0f1
iface enp175s0f1 inet manual
#bond901

auto enp216s0f0
iface enp216s0f0 inet manual
#bond0

auto enp216s0f1
iface enp216s0f1 inet static
    address 10.16.1.201/24
#Corosync Ring1

iface eno1 inet manual

iface eno2 inet manual

auto enp24s0f0
iface enp24s0f0 inet manual
#bond0

auto enp24s0f1
iface enp24s0f1 inet static
    address 10.16.0.201/24
#Corosync Ring0

iface eno4 inet manual

iface eno5 inet manual

iface eno6 inet manual

auto bond0
iface bond0 inet manual
    bond-slaves enp216s0f0 enp24s0f0
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2
#  Net

auto bond900
iface bond900 inet static
    address 10.16.10.201/24
    bond-slaves enp134s0f0 enp175s0f0
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
#CEPH public

auto bond901
iface bond901 inet static
    address 10.16.11.201/24
    bond-slaves enp134s0f1 enp175s0f1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
#CEPH private

auto bond0.15
iface bond0.15 inet manual
#  Gast

auto bond0.22
iface bond0.22 inet manual
#  VOIP FN

auto bond0.921
iface bond0.921 inet manual
#  DMZ

auto bond900.2160
iface bond900.2160 inet manual

auto vmbr0
iface vmbr0 inet static
    address 172.16.15.201/16
    gateway 172.16.0.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
#  Intern

auto vmbr1
iface vmbr1 inet manual
    bridge-ports bond0.15
    bridge-stp off
    bridge-fd 0
#  Gast

auto vmbr2
iface vmbr2 inet manual
    bridge-ports bond0.22
    bridge-stp off
    bridge-fd 0
#  VOIP FN

auto vmbr3
iface vmbr3 inet manual
    bridge-ports bond0.921
    bridge-stp off
    bridge-fd 0
#  DMZ

auto vmbr900
iface vmbr900 inet manual
    bridge-ports bond900.2160
    bridge-stp off
    bridge-fd 0
#CEPH Public

source /etc/network/interfaces.d/*


Here is the SDN config file:
Code:
#version:12

auto keepnet
iface keepnet
    address 192.168.0.1/29
    bridge_ports none
    bridge_stp off
    bridge_fd 0
    alias DockerKeepaliveD
    ip-forward on


And, lastly, the /etc/ceph/ceph.conf file:
Code:
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 10.16.11.201/24
     fsid = aea7d06a-18ce-4e6c-9381-ad31953d6717
     mon_allow_pool_delete = true
     mon_host = 10.16.10.201 10.16.10.202 10.16.10.203
     ms_bind_ipv4 = true
     ms_bind_ipv6 = false
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 10.16.10.201/24

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
     keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.pvefn01-dockerCephFS]
     host = pvefn01
     mds_standby_for_name = pve

[mds.pvefn02-dockerCephFS]
     host = pvefn02
     mds_standby_for_name = pve

[mds.pvefn03-dockerCephFS]
     host = pvefn03
     mds_standby_for_name = pve

[mon.pvefn01]
     public_addr = 10.16.10.201

[mon.pvefn02]
     public_addr = 10.16.10.202

[mon.pvefn03]
     public_addr = 10.16.10.203

Now I did what I found in other threads: I copied the key file onto the Docker node and tried to mount the CephFS with
Code:
mount.ceph admin@<storage-id>.cephfs=/ /mnt/dockerFS -o 'secretfile=/etc/ceph/admin.keyring,mon_addr=10.16.10.201:6789/10.16.10.202:6789/10.16.10.203:6789'

I get the error: mount error: no mds (Metadata Server) is up. The cluster might be laggy, or you may not be authorized

Ping to the public cluster address also doesn't work ("From 10.16.10.101 [...] Destination Host Unreachable")

Strangely enough, I can ping 10.16.10.201 when I specifically use the device of the SDN network:
Code:
~$ ping 10.16.10.201 -I ens19
PING 10.16.10.201 (10.16.10.201) from 192.168.0.2 ens19: 56(84) bytes of data.
64 bytes from 10.16.10.201: icmp_seq=10 ttl=64 time=0.264 ms
64 bytes from 10.16.10.201: icmp_seq=11 ttl=64 time=0.137 ms
64 bytes from 10.16.10.201: icmp_seq=12 ttl=64 time=0.123 ms
64 bytes from 10.16.10.201: icmp_seq=13 ttl=64 time=0.202 ms
^C
--- 10.16.10.201 ping statistics ---
13 packets transmitted, 4 received, 69.2308% packet loss, time 12281ms
rtt min/avg/max/mdev = 0.123/0.181/0.264/0.056 ms
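
For completeness, the kind of basic checks I'm running from inside the VM look roughly like this (ensXX stands in for whichever NIC is attached to vmbr900, i.e. the one carrying the 10.16.10.101 address; 3300/6789 are the standard Ceph monitor ports, probed with netcat just as a sketch):
Code:
ip -br addr show ensXX         # NIC attached to vmbr900, should carry 10.16.10.101/24
ip route get 10.16.10.201      # should go out via that NIC, not the default route
nc -zvw3 10.16.10.201 3300     # Ceph monitor, msgr2 port
nc -zvw3 10.16.10.201 6789     # Ceph monitor, msgr1 port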

At this point I'm out of ideas. Does anyone here have one?
 
Hi max.w!

I get the error: mount error: no mds (Metadata Server) is up. The cluster might be laggy, or you may not be authorized

This might mean pretty much what it says (e.g. the provided client key is not valid for using the CephFS). As Ceph permissions can be tricky, it can be helpful to use a step-by-step approach for debugging.

From a PVE node -- make sure that the desired Ceph filesystem exists (the following should return a line containing `name: $NAME_OF_YOUR_CEPHFS [...]`):
Code:
ceph fs ls

From (each) client VM (it might be a good idea to check that ceph-common matches the version of Ceph you are running on your cluster [0]): get the minimal cluster config (basically just the fsid of your Ceph cluster and the addresses of the monitors) and a dedicated client key that is only authorized to access the CephFS you want to hand over to the Swarm VMs (read up here [1] for details). Putting these files into the usual places will shorten the ceph commands you have to issue!

Code:
sudo mkdir -p /etc/ceph
sudo chmod 755 /etc/ceph
ssh {user}@{mon-host} "ceph config generate-minimal-conf" | sudo tee /etc/ceph/ceph.conf
sudo chmod 644 /etc/ceph/ceph.conf
ssh {user}@{mon-host} "ceph fs authorize $NAME_OF_YOUR_CEPHFS client.$NAME_FOR_YOUR_CLIENT / rw" | sudo tee /etc/ceph/ceph.client.$NAME_FOR_YOUR_CLIENT.keyring

At this point you should be able to run the following commands from your client VM:
Code:
ceph status --id $NAME_FOR_YOUR_CLIENT
ceph fs ls --id $NAME_FOR_YOUR_CLIENT

If these succeed with the expected results, you should be able to mount without any additional parameters:

Code:
mount.ceph $NAME_FOR_YOUR_CLIENT@.$NAME_OF_YOUR_CEPHFS=/ $YOUR_MOUNTPOINT
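
If you later want the mount to survive reboots, an fstab entry along these lines should do (same placeholders as above; it's a sketch, so test it with `mount -a` before relying on it):
Code:
$NAME_FOR_YOUR_CLIENT@.$NAME_OF_YOUR_CEPHFS=/    $YOUR_MOUNTPOINT    ceph    _netdev,noatime    0 0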

[0] https://docs.ceph.com/en/latest/install/get-packages/#install-packages-with-cephadm
[1] https://docs.ceph.com/en/squid/cephfs/mount-prerequisites/#which-cephfs-client
 
Hi dherzig, thanks for the reply!

I already did most of that before, but just to be sure I tried it again, to no avail. Same error as before.

Is it problematic if the ceph-common packages of the VMs are newer (19.2.0-0ubuntu0.24.04.1) than the ceph of the host (17.2.7-pve3)?

I'm at the point where I'm thinking of either hooking one of the onboard ports (en*) up to our PVE switch and trying it with that, or taking the IP address away from bond900 so that I can configure the network like bond0.
Would there be any problems if I do the latter? I haven't really found anything on that. I'd rather not have downtime on the Proxmox cluster, as this is our production system...
 
I do not think that connecting a Squid ceph-common client to a Quincy Ceph cluster is a problem (just tried it here; it seems to work the same as Squid to Squid).

So it's likely networking or permissions. Just for a double check -- could you please post the output of `ceph status` and `ceph fs ls` from your client machine?
 
Sorry for the late answer, I was on a short vacation.

@dherzig

If you mean the VM:

ceph status just hangs; it most likely can't connect to the cluster. Same with ceph fs ls:
Code:
parsed_args: Namespace(completion=False, help=False, cephconf=None, input_file=None, output_file=None, setuser=None, setgroup=None, client_id=None, client_name=None, cluster=None, admin_socket=None, status=False, watch=False, watch_debug=False, watch_info=False, watch_sec=False, watch_warn=False, watch_error=False, watch_channel=None, version=False, verbose=True, output_format=None, cluster_timeout=None, block=False, period=1), childargs: ['status']
2024-11-04T08:37:19.103+0000 7c01df4006c0 0 monclient(hunting): authenticate timed out after 300
[errno 110] RADOS timed out (error connecting to the cluster)

On the pve-hosts, of course, everything works:
Code:
 cluster:
    id:     aea7d06a-18ce-4e6c-9381-ad31953d6717
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum pvefn01,pvefn02,pvefn03 (age 4d)
    mgr: pvefn03(active, since 10M), standbys: pvefn02, pvefn01
    mds: 1/1 daemons up, 2 standby
    osd: 39 osds: 39 up (since 7w), 39 in (since 7w)
 
  data:
    volumes: 1/1 healthy
    pools:   4 pools, 1073 pgs
    objects: 4.75M objects, 18 TiB
    usage:   51 TiB used, 85 TiB / 136 TiB avail
    pgs:     1072 active+clean
             1    active+clean+scrubbing+deep

name: dockerCephFS, metadata pool: dockerCephFS_metadata, data pools: [dockerCephFS_data ]


@waltar:
The private and public ceph networks have been divided up. The cluster network address points to the private ceph network.
 
Hi!

Does it help if you add vmbr900 as an additional network interface to your VMs (and assign them unused IPs in that network)?
 
That's... what I have been doing from the beginning? I think I forgot to write that in my OP.
I thought that was how it should be done anyway: create a bridge (vmbr900) on the VLAN on the bond (bond900.2160) and add it as a new network device to the VM.


Network_devices_docker_vm.png

Ip_address_on_docker_vm.png


Maybe (one of) the problem(s) is that the VLAN tag is applied on the physical switch connecting the different PVE hosts, and not in Proxmox itself?
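
One way I could probably check that (a rough sketch, run on a PVE node one command at a time while the VM pings 10.16.10.201):
Code:
tcpdump -eni vmbr900 icmp or arp    # what the VM puts onto the bridge
tcpdump -eni bond900 vlan 2160      # whether it leaves the bond tagged with VLAN 2160

If the second capture stays empty, or the tagged frames go out but nothing comes back, then the tag handling between bond900.2160 and the switch would be my suspect.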
 
OK -- thanks for the clarification, and sorry for my sloppy reading!

Regarding your two options:
I'm at the point where I'm thinking of either hooking one of the onboard ports (en*) up to our PVE switch and trying it with that, or taking the IP address away from bond900 so that I can configure the network like bond0.
I guess the second one might just do the trick: remove the bond900.2160 Linux VLAN, point vmbr900's bridge-ports at bond900 instead of the tagged VLAN, and move the IP address from bond900 to vmbr900. Just give it a try on a single host and check that it doesn't affect other services before applying the updated network configuration to the remaining PVE hosts!
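
For illustration, the relevant stanzas in /etc/network/interfaces would then look roughly like this (a sketch based on the config posted above, not a tested drop-in; the bond900.2160 stanza gets dropped entirely, and the change is best applied with ifupdown2's `ifreload -a` from a console session in case connectivity drops):
Code:
auto bond900
iface bond900 inet manual
    bond-slaves enp134s0f0 enp175s0f0
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
#CEPH public (address moved to vmbr900 below)

auto vmbr900
iface vmbr900 inet static
    address 10.16.10.201/24
    bridge-ports bond900
    bridge-stp off
    bridge-fd 0
#CEPH Public -- VMs attach here untagged

The monitors keep listening on 10.16.10.201; the address just lives on the bridge instead of directly on the bond. But as said: verify on a single host first.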